Data types are the foundation of Exploratory Data Analysis, and yet they are often put aside and forgotten. Every day, data is captured from sources such as websites, cell phones, and soon IoT devices. This raw data then needs to be transformed into structured data so that we can apply statistical concepts and machine learning algorithms. Being able to tell data types apart is a must for doing so, which is why I propose a refresher on the topic in this article.
Types of Structured Data
There are 2 basic types of structured data: numeric and categorical.
When we think of numeric data we of course think of numbers. Numeric data comes in 2 forms: continuous and discrete.
Continuous data can take any value in an interval. Measurements of time and amounts of money are usually treated as continuous.
Discrete data also takes values in an interval, but not all of them: it can only take integer values. Think of counts, such as the number of tickets sold or the number of people who showed up to your party.
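To make the distinction concrete, here is a minimal sketch in plain Python. It assumes a simple rule of thumb: a column holding only integers is treated as discrete, and anything with fractional values as continuous. The function name numeric_kind and the sample lists are made up for illustration.

```python
from numbers import Integral, Real


def numeric_kind(values):
    """Label a list of numbers as 'discrete' or 'continuous' (rule of thumb)."""
    # Reject non-numeric entries; booleans are excluded on purpose.
    if not all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        raise TypeError("expected only numeric values")
    # Integer-only columns are treated as counts, everything else as measurements.
    if all(isinstance(v, Integral) for v in values):
        return "discrete"
    return "continuous"


tickets_sold = [120, 98, 143]        # hypothetical counts
race_times_s = [12.4, 11.9, 13.05]   # hypothetical measurements in seconds

print(numeric_kind(tickets_sold))    # discrete
print(numeric_kind(race_times_s))    # continuous
```

A rule like this is only a starting point: a continuous measurement that happens to be recorded in whole numbers would be labeled discrete, so the final call still belongs to whoever knows the data.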
Categorical data can only take values from a fixed set of possible choices. Examples include the category of books that people read and the flavor of ice cream that customers buy.
There are 2 important sub-types of categorical data: Binary and Ordinal.
Binary data has exactly two possible values, such as yes/no or 1/0. For example, you can think of “dead or alive” and “rain or no-rain” situations.
Ordinal data consists of categories that have a rank. For example, we can think of a race podium (1st, 2nd, 3rd).
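The two sub-types can be sketched with Python's standard library. This is one possible representation, not the only one: the Podium class and the sample values are hypothetical, with an IntEnum standing in for ordinal data and plain booleans for binary data.

```python
from enum import IntEnum


class Podium(IntEnum):
    """Ordinal categories: the integer value encodes the rank."""
    GOLD = 1
    SILVER = 2
    BRONZE = 3


# Ordinal values can be compared and sorted by rank.
results = [Podium.BRONZE, Podium.GOLD, Podium.SILVER]
ranked = sorted(results)
print([p.name for p in ranked])        # ['GOLD', 'SILVER', 'BRONZE']
print(Podium.GOLD < Podium.BRONZE)     # True

# Binary values map naturally to booleans (rain / no-rain).
rained = [True, False, True]
rainy_days = sum(rained)               # True counts as 1
print(rainy_days)                      # 2
```

The ranks support comparisons, but the gap between two ranks carries no meaning, which is what separates ordinal data from discrete numeric data.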
Why Data Types Are Important
- Many R and Python packages use data types to improve computational speed. Some functions won’t even execute if your data types are not set properly. Clean data always has well-defined data types.
- Data types will also help you to choose the appropriate charts. Visualization in general is improved when data types are well defined.
- In the case of a categorical variable you can limit an input to the available categories to avoid errors.
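The last point, limiting an input to the available categories, can be sketched with a standard-library Enum. The Flavor class, the parse_flavor helper, and the sample orders are all hypothetical. Data-analysis packages offer their own categorical types, so treat this only as an illustration of the idea.

```python
from enum import Enum


class Flavor(Enum):
    """The only ice cream flavors accepted as valid categories."""
    VANILLA = "vanilla"
    CHOCOLATE = "chocolate"
    STRAWBERRY = "strawberry"


def parse_flavor(raw):
    """Return the matching Flavor, or None if the input is not a known category."""
    try:
        # Normalize case and whitespace before looking up the category.
        return Flavor(raw.strip().lower())
    except ValueError:
        return None


orders = ["Vanilla", "chocolate ", "choclate"]   # the last one is a typo
parsed = [parse_flavor(o) for o in orders]
invalid = [o for o, p in zip(orders, parsed) if p is None]
print(invalid)   # ['choclate']
```

Because unknown values are caught at the moment they are read, typos surface during cleaning rather than later as a spurious extra category in a chart or a model.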
Data can be of two types: Numeric and Categorical. Numeric data can either be continuous or discrete. Two important sub-types of Categorical data are Ordinal (rank) and binary (1-0). The next time you’re cleaning your data or doing EDA don’t forget about Data types!