The more conversant you are with data, the better decisions you’ll be able to make.
But data can be complex and unruly to work with. It’s important to understand the different types of data available and use them to your advantage.
One such type is categorical data.
In this guide, we’ll dive deep into what categorical data is, its advantages and disadvantages, common use cases, and several examples.
Let’s get started.
What is categorical data?
Categorical data definition: Data that’s separated into multiple categories based on qualitative characteristics.
To put it another way, it’s information that has been grouped together because of qualities its members share, and it’s usually expressed in words rather than numbers. For example, people can be grouped categorically by age group, the city they live in, ethnicity, religion, etc.
It’s important to note that categorical data is grouped qualitatively, not quantitatively. That’s not to say categorical data can’t contain numbers; rather, numbers aren’t what determine the grouping.
For example, you may group people by postal code (a proxy for geography). Although a postal code may be written as a number, it has no meaning beyond identifying a location. You can count how many people share a postal code, but performing arithmetic on the codes themselves, such as averaging them, won’t produce a meaningful result.
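Here is a minimal Python sketch of the postal-code point, using a handful of hypothetical ZIP codes. It assumes the codes are stored as strings, since they are labels rather than quantities. Counting how often each code appears is a meaningful operation. Averaging them as numbers runs without error, but the result isn’t a real place.

```python
from collections import Counter

# Hypothetical survey responses: postal codes stored as strings, because they
# label places rather than measure anything.
postal_codes = ["10001", "94103", "10001", "60614", "94103", "10001"]

# Counting category frequencies is meaningful for categorical data.
counts = Counter(postal_codes)
most_common_code, frequency = counts.most_common(1)[0]
print(most_common_code, frequency)  # 10001 3

# Averaging the codes "works" mechanically, but the result is not a location
# and carries no geographic meaning.
meaningless_average = sum(int(code) for code in postal_codes) / len(postal_codes)
print(meaningless_average)
```

The string type is a deliberate choice. It keeps anyone from accidentally treating the codes as quantities.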
Categorical data is an umbrella term that covers two subtypes:
- Nominal data – Information that is used to name a variable but doesn’t have an inherent numerical value or order. For example, a profession could be named scientist but there is no numerical value attached to the term scientist. Things like hair color, gender, degree program, etc. are all examples of nominal data. This is the most common type of categorical data and is sometimes used interchangeably with it.
- Ordinal data – This type of data has a specific order based on the position it occupies on a scale. With that being said, it still doesn’t have an inherent numerical value. An example of an ordinal scale would be a Likert scale with options such as strongly agree, agree, neutral, disagree, and strongly disagree. Even though the values have a clear sequence, they cannot be used for standard mathematical calculations.
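To show how the two subtypes behave differently, here is a small Python sketch. The hair-color categories are hypothetical, and the Likert options are the ones listed above. Nominal values support only equality checks. Ordinal values can also be ranked by their position on the scale, but that position is not a measured quantity.

```python
from enum import Enum

# Nominal: hair colors are distinct labels with no inherent order.
class HairColor(Enum):
    BLACK = "black"
    BROWN = "brown"
    BLONDE = "blonde"
    RED = "red"

# Ordinal: Likert options listed from lowest to highest agreement.
# Only their position in this list carries meaning.
LIKERT_SCALE = [
    "strongly disagree",
    "disagree",
    "neutral",
    "agree",
    "strongly agree",
]

def likert_position(response: str) -> int:
    """Return a response's position on the scale (0 = lowest agreement)."""
    return LIKERT_SCALE.index(response)

# Nominal values support equality checks...
print(HairColor("brown") == HairColor.BROWN)  # True

# ...but not ordering: comparing two plain Enum members raises TypeError.
try:
    _ = HairColor.BLACK < HairColor.RED
except TypeError:
    print("Hair colors have no order")

# Ordinal values can be ranked by position, but the gap between positions is
# not a measured amount: "agree" isn't "one unit" more than "neutral".
print(likert_position("agree") > likert_position("neutral"))  # True
```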
Categorical data vs numerical data
The counterpart to categorical data is numerical data: information expressed as numbers rather than words or natural language.
As the name implies, it’s collected in numerical form and remains in numerical form. Age, height, and weight are all numerical. You can perform mathematical operations on them, such as subtraction, division, and averaging, and get meaningful results.
There are two types of numerical data:
- Discrete numerical data – Data that can be counted and takes only distinct, separate values. The number of possible values can be finite or infinite. Simple examples include the number of pools in a neighborhood or the number of senators in a country, and both are finite. The number of grains of sand on a country’s beaches is also finite, even though it would be impractical to count. Discrete data can also be countably infinite. For example, the number of coin flips needed before getting heads has no upper limit.
- Continuous numerical data – Data that can take any value within a range, so it’s measured rather than counted and is represented by real numbers. Examples include weight, temperature, and length. Continuous data can be further divided into:
  - Ratio data – The intervals between values are equal, and there is an absolute zero point that represents the absence of the quantity. Weight is an example: 0 kg means no weight, and 10 kg is twice as heavy as 5 kg.
  - Interval data – The intervals between values are equal, but there is no absolute zero point. These scales may include a zero, but it doesn’t indicate an absence of the quantity being measured. For example, 0 °C is a point on the Celsius scale, not an absence of temperature.
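The interval/ratio distinction shows up as soon as you try to divide. The Python sketch below uses made-up temperatures and weights, together with the standard Celsius-to-Kelvin offset and an approximate kilogram-to-pound factor. It compares the ratio of two Celsius temperatures with the ratio of the same temperatures in Kelvin, a scale whose zero is absolute zero. It then shows that a ratio of weights stays the same when you change units.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Celsius is an interval scale; Kelvin is a ratio scale (0 K is absolute zero)."""
    return celsius + 273.15


KG_TO_LB = 2.20462  # approximate pounds per kilogram

# Interval data: differences are meaningful, ratios are not.
warm_c, cool_c = 20.0, 10.0
difference_c = warm_c - cool_c  # 10.0: "10 degrees warmer" is a valid statement
naive_ratio = warm_c / cool_c   # 2.0, yet 20 °C is not "twice as hot" as 10 °C

# Measured from absolute zero, the same temperatures give a very different
# ratio. The Celsius ratio only reflected where that scale puts its zero.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

# Ratio data: because 0 kg means "no weight", ratios survive a change of units.
heavy_kg, light_kg = 10.0, 5.0
ratio_kg = heavy_kg / light_kg                            # 2.0
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)  # also 2.0

print(difference_c, naive_ratio, round(kelvin_ratio, 3), ratio_kg, ratio_lb)
```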
Remember, qualitative data can sometimes be collected in numerical form, but performing arithmetic on it isn’t meaningful.
In short, categorical data is qualitative information divided into groups. Numerical data is expressed as numbers, and arithmetic performed on it is meaningful.
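The distinctions covered so far can be condensed into a few yes/no questions. The Python function below is a hypothetical helper written for this guide, not part of any library, and it is only a sketch of that decision process. It ignores the discrete/continuous split and assumes you can answer each question for your variable.

```python
def classify_variable(arithmetic_is_meaningful: bool,
                      has_natural_order: bool = False,
                      has_true_zero: bool = False) -> str:
    """Classify a variable as nominal, ordinal, interval, or ratio.

    arithmetic_is_meaningful: can you add, subtract, or average the values?
    has_natural_order: only consulted for categorical data
    has_true_zero: only consulted for numerical data; does 0 mean "none"?
    """
    if not arithmetic_is_meaningful:
        # Categorical data: ordinal if the categories have a ranking.
        return "ordinal" if has_natural_order else "nominal"
    # Numerical data: ratio if zero is absolute, otherwise interval.
    return "ratio" if has_true_zero else "interval"


print(classify_variable(False))                          # nominal, e.g. postal code
print(classify_variable(False, has_natural_order=True))  # ordinal, e.g. Likert response
print(classify_variable(True))                           # interval, e.g. temperature in °C
print(classify_variable(True, has_true_zero=True))       # ratio, e.g. weight in kg
```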
Each type has its own advantages and disadvantages. Let’s look at those of categorical data.
Advantages and disadvantages of categorical data
Advantages:
- By its very nature, categorical data can be grouped by common criteria, which aids analysis.
- It can reveal patterns that would be missed without a deeper understanding of the data.
- It can provide deeper insight into complex topics that simple closed-ended survey questions can’t easily uncover.
- Open-ended survey questions let respondents lead you in directions you hadn’t considered, which may produce a better outcome for you or your organization.
Disadvantages:
- The data can be more difficult and expensive to collect in the first place.
- It’s often harder to interpret than numerical data.
- Much of the data collected may be irrelevant, so sorting through it takes more time.
- It may produce false positives if the sample isn’t large enough or isn’t representative of the population.
These are just a few of the advantages and disadvantages of categorical data. As you work with it more often, you’ll gain more insight into how it can be used and where it’s not as effective. Let’s take a look at a few examples of categorical data so that you’ll have a better understanding of how to apply it.
Examples of categorical data
Categorical data is all over the place if you know what to look for. Here are a few examples that will make it easier to understand what it is and isn’t.
What is your hair color?
This survey question collects nominal categorical data.
What is your ethnicity?
- African American
This question is seeking to collect nominal categorical data.
What is the highest level of education you’ve obtained?
- Less than high school
- High school
- Some post-secondary education
- Post-secondary degree, diploma, or certificate
- Postgraduate degree (master’s or doctorate)
Unlike the previous examples, this question collects ordinal categorical data: the options follow a clear order from the least to the most education.
What is your religion?
What is your favorite sneaker brand?
Both of these questions collect nominal categorical data.
How would you rate your satisfaction with our event?
- Very satisfied
- Satisfied
- Neither satisfied nor dissatisfied
- Dissatisfied
- Very dissatisfied
Unlike the nominal questions above, this question gathers ordinal categorical data. The value of each option depends on its position on the scale, yet it can’t be quantified numerically.
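Here is one way to summarize answers to an ordinal question like this in Python. It assumes the usual five-point version of the satisfaction scale, and the responses are made up for illustration. The sketch prints a frequency table in scale order rather than alphabetical order. It also reports the median response, which depends only on order, and the share of respondents in the top two options.

```python
from collections import Counter
from statistics import median_low

# Assumed five-point scale, ordered from least to most satisfied.
SCALE = [
    "Very dissatisfied",
    "Dissatisfied",
    "Neither satisfied nor dissatisfied",
    "Satisfied",
    "Very satisfied",
]

# Hypothetical responses to "How would you rate your satisfaction with our event?"
responses = [
    "Satisfied", "Very satisfied", "Neither satisfied nor dissatisfied",
    "Satisfied", "Dissatisfied", "Very satisfied", "Satisfied",
]

# Frequency table, printed in scale order.
counts = Counter(responses)
for option in SCALE:
    print(f"{option:<36} {counts[option]}")

# The median only relies on order, so it is meaningful for ordinal data.
positions = [SCALE.index(r) for r in responses]
median_response = SCALE[median_low(positions)]
print("Median response:", median_response)  # Satisfied

# Share of respondents who chose one of the top two options.
top_two_share = sum(p >= SCALE.index("Satisfied") for p in positions) / len(positions)
print(f"Satisfied or better: {top_two_share:.0%}")
```

The sketch avoids averaging the positions. An average would assume the options are evenly spaced, which ordinal data doesn’t guarantee. `median_low` is used so the result is always one of the actual options.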
Common use cases of categorical data
Like other forms of data, categorical data can be put to work in many ways. Below are a few of them, though this is by no means an exhaustive list.
Deeper insights – Due to the nature of categorical data, one of the most important use cases is developing deeper insights about a topic or situation. Since the questions are often open-ended, the respondents can share more information. When used properly, categorical data can lead to different lines of inquiry and produce a higher ROI.
For example, suppose you run a customer satisfaction survey and ask someone why they were dissatisfied. They may point to an issue you had no idea about, which lets you take appropriate action.
Identifying patterns – Again, categorical data allows folks to share more about a topic. Even if you’re using observation to collect data, you’ll record it qualitatively and not quantitatively. This will allow you to review the data you have and categorize it based on multiple criteria that matter to you.
Once the data has been categorized, it can reveal patterns you might otherwise have missed. For example, if you send out a customer research survey asking customers about their biggest obstacle to getting in shape, the answers may be broadly categorized as:
- Don’t know the right foods to eat
- Don’t know what exercises to do
- Don’t have the motivation to continue after some time
- Don’t find it important
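As a rough illustration, the Python sketch below assigns open-ended answers to these categories and tallies them. It uses simple keyword matching, which is a substitute chosen for this example and not a method the guide prescribes. In practice, qualitative coding is usually done, or at least reviewed, by a person. The keyword lists and answers are hypothetical.

```python
from collections import Counter

# Hypothetical keyword rules mapping free-text answers to the categories above.
# Substring matching is crude (e.g. "eat" also matches "great"), so treat this
# as a first pass to be checked by hand.
CATEGORY_KEYWORDS = {
    "Don't know the right foods to eat": ["food", "diet", "eat"],
    "Don't know what exercises to do": ["exercise", "workout", "routine"],
    "Don't have the motivation to continue": ["motivat", "give up", "quit"],
    "Don't find it important": ["not important", "don't care", "no point"],
}

def categorize(answer: str) -> str:
    """Return the first category whose keywords appear in the answer."""
    text = answer.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Uncategorized"

answers = [
    "I never know what to eat for dinner",
    "Healthy food is too expensive",
    "I never know which exercises to do",
    "I start a plan and give up after two weeks",
    "I lose motivation quickly",
    "Honestly it's not important to me",
]

# Tally the categories to see which problems come up most often.
tally = Counter(categorize(a) for a in answers)
for category, count in tally.most_common():
    print(f"{count}  {category}")
```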
When you drill deeper into one of these issues, such as not knowing the right foods to eat, you might realize that it isn’t the real problem. The real problem is the perception of what constitutes healthy food and the cost of maintaining a healthy diet.
Going forward, you may debunk these misconceptions in your content or create products that directly alleviate this pain point.
Categorical data, like any other form of data, is just a tool, and how effective it is depends on how you use it. This guide has defined what categorical data is, explained how it differs from numerical data, covered its advantages and disadvantages, and given examples.
It’s your turn. Start collecting qualitative data and organizing it into categorical data. Whatever your goal, doing so can help you create better outcomes.