In mathematical and statistical analysis, data is defined as a collected group of information. Information, in this case, could be anything that may be used to support or refute a hypothesis during an experiment.
Data collected may be age, name, a person’s opinion, type of pet, hair colour etc. Although there is no restriction on the form this data may take, it is classified into two main categories depending on its nature: categorical and numerical data.
Categorical data, as the name implies, is grouped into one or more categories. Numerical data, by contrast, deals with numeric variables.
What is Categorical Data?
Categorical data is a collection of information that is divided into groups. For example, if an organisation or agency collects the biodata of its employees, the resulting data is referred to as categorical. It is called categorical because it may be grouped according to the variables present in the biodata, such as sex, state of residence, etc.
Categorical data can take on numerical values (such as “1” indicating Yes and “2” indicating No), but those numbers don’t have mathematical meaning. One can neither add them together nor subtract them from each other.
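To make this concrete, here is a minimal Python sketch. The coding scheme and the six responses are made up for illustration. It shows that the useful operation on coded answers is counting, while an average of the codes can be computed but corresponds to no category.

```python
from collections import Counter

# Hypothetical coding scheme for a Yes/No question: the numbers are labels only.
CODE_LABELS = {1: "Yes", 2: "No"}

# Hypothetical coded responses from six respondents.
coded_responses = [1, 2, 1, 1, 2, 1]

# Decode the numbers back to labels, then count each category.
labels = [CODE_LABELS[code] for code in coded_responses]
counts = Counter(labels)
print(counts["Yes"], counts["No"])  # 4 2

# The mean of the codes can be computed (about 1.33), but it matches no
# category in CODE_LABELS, so it tells us nothing about the responses.
meaningless_mean = sum(coded_responses) / len(coded_responses)
```

Swapping the codes (say, “7” for Yes and “0” for No) would change the mean but not the counts, which is why counts are the safer summary.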
Types of Categorical Data
There are two types of categorical data: nominal and ordinal data.
1. Nominal Data
This is a type of data used to name variables without providing any numerical value. The term comes from the Latin “nomen” (meaning name), and this data type is a subcategory of categorical data.
Nominal data is sometimes called “labelled” or “named” data. Examples of nominal data include name, hair colour, sex etc.
Mostly collected using surveys or questionnaires, this data type is descriptive, as it sometimes allows respondents the freedom to type in responses. Although this characteristic helps in arriving at better conclusions, it sometimes poses problems for researchers as they have to deal with so much irrelevant data.
2. Ordinal Data
This is a data type with a set order or scale to it. However, the order has no standard scale, so the difference between adjacent values is not measured and need not be equal.
Although mostly classified as categorical data, it exhibits both categorical and numerical characteristics, which places it between the two. It is classified under categorical data because it exhibits more of the categorical characteristics.
Some ordinal data examples include the Likert scale, age brackets, bug severity, customer satisfaction survey data etc. Each of these examples may have different collection and analysis techniques, but they are all ordinal data.
General Characteristics/Features of Categorical Data
- Categories
Categorical data falls into two categories: nominal data and ordinal data. Nominal data, also known as named data, is used to name variables, while ordinal data has a scale or order to it.
- Qualitativeness
Categorical data is qualitative. That is, it describes an event using a string of words rather than numbers.
- Analysis
Categorical data is summarised using the mode and the median: nominal data is analysed with the mode, while ordinal data can use both. In some cases, ordinal data may also be analysed using univariate statistics, bivariate statistics, regression applications, linear trends and classification methods.
- Graphical analysis
It can also be analysed graphically using a bar chart or a pie chart. A bar chart is mostly used to show frequencies, while a pie chart shows percentages. This is done after grouping the data into a frequency table.
- Interval scale
In the case of ordinal data, which has a given order or scale, the scale does not have a standardised interval. This is not applicable for nominal data.
- Numeric values
Although categorical data is qualitative, it may sometimes take numerical values. However, these values do not exhibit quantitative characteristics, and arithmetic operations cannot meaningfully be performed on them.
- Nature
Categorical data may also be classified into binary and non-binary depending on its nature. A given question with options “Yes” or “No” is classified as binary because it has two options while adding “Maybe” to the given options will make it non-binary.
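The analysis features listed above (a frequency table, the mode, and the median for ordered categories) can be sketched in a few lines of Python. This is only an illustration: the responses are invented, `frequency_table` is a hypothetical helper name, and a text bar stands in for a real chart.

```python
from collections import Counter
from statistics import median_low

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

# Nominal data: hypothetical answers to a soap-brand question.
brands = ["Lux", "Dove", "Lux", "Olay", "Dove", "Lux", "Lux", "Dove", "Olay", "Lux"]
brand_table = frequency_table(brands)
mode_brand = brand_table[0][0]  # the first row holds the most frequent category

# Text stand-in for a bar chart (counts) with the pie-chart shares alongside.
for category, count, percent in brand_table:
    print(f"{category:<5} {'#' * count:<6} {count:>2}  {percent:.0f}%")

# Ordinal data: hypothetical satisfaction ratings on an ordered five-level scale.
LEVELS = ["Very poor", "Poor", "Neutral", "Good", "Very good"]
RANK = {level: position for position, level in enumerate(LEVELS)}
ratings = ["Good", "Neutral", "Very good", "Good", "Poor", "Good", "Neutral"]

# median_low always returns a value present in the data, so the result
# maps back to a real category instead of a point between two levels.
median_rating = LEVELS[median_low([RANK[r] for r in ratings])]
print(mode_brand, median_rating)  # Lux Good
```

The ranks are used only to order the ratings. The gaps between them are not assumed to be equal, which is why the sketch reports a median category and not a mean score.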
Categorical Data Examples
1. Household Income: Categorical data is mostly used by businesses when investigating the spending power of their target audience, to conclude on an affordable price for their products. For example:
What is your household income?
- Below $30,001
- $30,001 – $40,000
- $40,001 – $50,000
- $50,001 and above
This is a closed-ended ordinal data example, because the income brackets have a natural order.
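If exact incomes are already on record, they can be mapped onto the same ordered brackets. The sketch below is one way to do it in Python, assuming whole-dollar incomes. The function name and the sample incomes are hypothetical.

```python
from bisect import bisect_left

# Highest whole-dollar income that still belongs to each of the first three brackets.
UPPER_BOUNDS = [30_000, 40_000, 50_000]
BRACKETS = [
    "Below $30,001",
    "$30,001 – $40,000",
    "$40,001 – $50,000",
    "$50,001 and above",
]

def income_bracket(income: int) -> str:
    """Return the survey bracket for a whole-dollar household income."""
    # bisect_left counts how many upper bounds are strictly below the income,
    # which is the index of the bracket the income falls into.
    return BRACKETS[bisect_left(UPPER_BOUNDS, income)]

# Hypothetical incomes, including values that sit on the bracket edges.
incomes = [28_500, 30_000, 30_001, 47_250, 50_001]
labels = [income_bracket(x) for x in incomes]
print(labels)
```

Once incomes are bracketed, the exact figures are no longer available to the analysis, so only the order of the brackets can be used.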
2. Education Level: A respondent’s level of education may be requested when filling forms for job applications, admission, training etc. This is used to assess their qualification for a specific role. Consider the example below:
What is your highest level of education?
- School SAT
- High School
- BSc.
- MSc.
- PhD
This is also a closed-ended ordinal data example, since the levels run from lowest to highest.
3. Gender: Respondents are asked for their gender when filling out a biodata. This is mostly categorised as male or female, but may also be nonbinary. For example:
What is your gender?
- Male
- Female
This is a binary and closed-ended nominal data example.
What is your gender? (Others specify)
- Male
- Female
- Others _____
This is a non-binary, partially open-ended nominal data example: it combines fixed options with a free-text field.
4. Customer satisfaction: After rendering service to customers, businesses like to get feedback on that service so they can improve it. For example:
Kindly rate your customer service experience with us
- Very poor
- Poor
- Neutral
- Good
- Very good
The above is an example of an ordinal data collection process. The responses have a specific order to them, listed in ascending order.
5. Brand of soaps: When doing competitive analysis research, a soap brand may want to study the popularity of its competitors among its target audience. In this case, we have something of this nature:
Which of the following soap brands are you familiar with?
- Lux
- Dove
- Olay
This is a multiple-choice nominal data collection example.
6. Hair colour
This is a key categorical data example used in profiling a respondent. Although not accurate, a person’s hair colour together with some racially prominent traits may be used to predict whether the person is Black, Caucasian, Hispanic, etc. For example:
What is your hair colour?
- Blonde
- Brunette
- Brown
- Black
- Red
This is a closed-ended example of nominal data.
7. Surveys or Questionnaires: Online surveys are commonly used to carry out investigations on certain topics. The data gathered in some cases are categorical. For example:
How many siblings do you have?
The above is an example of an open-ended data collection form. The response is a number, but it can be treated as categorical once the answers are grouped (for instance, into respondents with and without siblings).
8. Happiness level: This example may be used by a therapist or psychologist when examining a patient for mental illness. It is usually collected together with some important data that may affect a person’s mental health.
Rate your happiness level on a scale of 1-5.
- 1
- 2
- 3
- 4
- 5
This is an ordinal data example.
9. Motives for employees to work better: Companies who want to improve employee productivity may use this method to discover what motivates employees to work better. For example:
What motivates you to work better? (Others specify)
- Peer motivation
- Recognition
- Professional growth opportunities
- Friendly work culture
- Others _____
This is a partially open-ended nominal data collection example: fixed options plus a free-text “Others” field.
10. Motives for travelling: Travel and tourism companies ask their customers or target audience this question to inform marketing strategies.
What are your motives for travelling? (Others specify)
- Business
- Leisure
- Family
- Study
- Health
- Others _____
This is a partially open-ended nominal data collection example: fixed options plus a free-text “Others” field.
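Free-text answers to questions like this one usually need tidying before they can be counted. The Python sketch below shows one possible approach, not a prescribed method: answers that match a listed option (ignoring case and stray spaces) are mapped to it, and everything else is grouped under “Others”. The function name and the raw answers are hypothetical.

```python
# Fixed options from the question above.
KNOWN_MOTIVES = ["Business", "Leisure", "Family", "Study", "Health"]

# Case-insensitive lookup from a cleaned answer to its official label.
_LOOKUP = {motive.casefold(): motive for motive in KNOWN_MOTIVES}

def normalise_motive(raw: str) -> str:
    """Map a raw answer to a listed option, or to "Others" if none matches."""
    cleaned = " ".join(raw.split()).casefold()  # trim and collapse whitespace
    return _LOOKUP.get(cleaned, "Others")

# Hypothetical raw answers as respondents might type them.
raw_responses = ["business", " Leisure ", "HEALTH", "visiting a festival", "study"]
normalised = [normalise_motive(r) for r in raw_responses]
print(normalised)  # ['Business', 'Leisure', 'Health', 'Others', 'Study']
```

In a real survey the answers grouped under “Others” would still be worth reading, since a frequent free-text answer may deserve its own option next time.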
11. Age bracket: An event planning company may group ages into intervals to get the demographics of attendees of a particular event. Instagram and Facebook also use age brackets to give audience insights. For example:
In which of the following age brackets do you fall?
- Below 21 years
- 21 to 35 years
- 36 to 58 years
- 59 years and above
This is an example of ordinal data collection.
12. Checking account location: Some timesheet calculator tools collect real-time employee location so that employers can know which employees are at work and which are not. Location data is also used in several other cases. For example:
When a user gives Instagram access to his or her location, it uses this data to give insights using a bar chart. E.g. 50% of followers are from Texas, 30% from a second state and 20% from Colorado.
13. Bug severity: When software companies perform quality assurance testing to discover bugs in the software, the bugs are treated according to their severity level.
When a bug bounty hunter submits a bug to a company, it is given a severity level like critical, medium or low. This is an example of ordinal data.
14. Likert scale: A Likert scale is a point scale used by researchers to take surveys and get people’s opinions on a subject matter. Consider this example:
How will you rate the dessert served tonight?
- Very good
- Good
- Neutral
- Bad
- Very bad
This is a 5-point Likert scale, a common example of ordinal data.
15. Proficiency level: Employers measure a job applicant’s proficiency level in the skills required to perform well in the job. This helps in choosing the best applicant for the job. For example:
What is your proficiency level in Excel?
- Advanced
- Intermediate
- Novice
This is a simple example of ordinal data.
Categorical Data Variables
A categorical variable is a variable type with two or more categories. Sometimes called a qualitative variable, it is mainly classified into two types (nominal and ordinal).
For example, if a restaurant is trying to collect data on the amount of pizza ordered in a day according to type, we regard this as categorical data. When gathering the data, the restaurant will group the number of orders according to the type of pizza (e.g. pepperoni, chicken etc.) ordered.
In this case, the type of pizza ordered is the categorical variable. Categorical data variables are divided into two: the nominal variable and the ordinal variable.
A nominal variable is a categorical variable with no intrinsic ordering to its categories. For example, marital status is a categorical variable having two categories (single and married) with no intrinsic ordering to the categories.
There are two main categories of nominal data variables: matched and unmatched categories. Below are the tests carried out on each category:
Matched Category in Nominal Data Variables
- McNemar Test: This is a distribution-free test for paired nominal data (2 groups).
- Cochran’s Q Test: This is a test for 3 or more matched groups with binary (two-category) responses.
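As an illustration of a matched-pairs test, the McNemar statistic can be computed by hand from the pairs whose answer changed: with b pairs switching one way and c the other, the statistic is (b − c)² / (b + c). The Python sketch below uses invented before-and-after answers and a hypothetical function name. It leaves out the continuity correction and the exact version of the test, which are often preferred when few pairs change.

```python
def mcnemar_statistic(pairs, positive="Yes"):
    """Return (b, c, statistic) for paired two-category responses.

    b counts pairs that moved from the positive answer to the other one,
    c counts pairs that moved the other way. Unchanged pairs are ignored.
    """
    b = sum(1 for before, after in pairs if before == positive and after != positive)
    c = sum(1 for before, after in pairs if before != positive and after == positive)
    if b + c == 0:
        return b, c, 0.0  # nobody changed their answer, so there is nothing to test
    return b, c, (b - c) ** 2 / (b + c)

# Hypothetical data: the same 30 respondents asked before and after a campaign.
pairs = (
    [("Yes", "Yes")] * 10
    + [("Yes", "No")] * 3
    + [("No", "Yes")] * 9
    + [("No", "No")] * 8
)
b, c, statistic = mcnemar_statistic(pairs)
print(b, c, statistic)  # 3 9 3.0
```

The statistic is compared with a chi-square distribution with 1 degree of freedom. Here 3.0 is below the 5% critical value of 3.841, so this made-up shift would not count as significant at that level.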
Unmatched Category in Nominal Data Variables
- Fisher’s Exact Test: This test is used when an expected cell frequency is less than 5.
- Chi-Square Test: This test is used when all expected cell frequencies are 5 or more.
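The rule of thumb above depends on expected frequencies, which come from the row and column totals of a contingency table. The Python sketch below computes them, applies the “less than 5” rule, and calculates the chi-square statistic. The function names and the two tables are hypothetical, and the code assumes every row and column has at least one observation.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

def suggested_test(table):
    """Apply the rule of thumb: any expected count below 5 points to Fisher's test."""
    smallest = min(min(row) for row in expected_counts(table))
    return "Fisher's exact test" if smallest < 5 else "Chi-square test"

# Hypothetical 2x2 tables: rows are two unmatched groups, columns are Yes/No answers.
large_sample = [[20, 30], [30, 20]]
small_sample = [[3, 1], [1, 3]]

print(suggested_test(large_sample), chi_square_statistic(large_sample))  # Chi-square test 4.0
print(suggested_test(small_sample))  # Fisher's exact test
```

The sketch stops at the statistic. Turning it into a p-value needs the chi-square distribution, which a statistics library would normally supply.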
An ordinal variable is a categorical variable with an intrinsic ordering to its categories. For example, when studying the severity of a bug in software, severity is a categorical variable with ordered categories: critical, medium and low.
There are two main categories of ordinal data variables: matched and unmatched categories. Below are the tests carried out on each category:
Matched Category in Ordinal Data Variables
- Wilcoxon signed-rank test: This is a test used to assess the differences between 2 groups of matched samples.
- Friedman 2-way ANOVA: This is used to find differences in matched sets of 3 or more groups.
Unmatched Category in Ordinal Data Variables
- Wilcoxon rank-sum test: Also known as the Mann-Whitney U test, this is used to compare 2 groups of independent samples.
- Kruskal-Wallis 1-way test: This is used to compare 3 or more groups of independent samples.
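The rank-based tests above all start by replacing the ordered categories with ranks. As a rough Python sketch, the Kruskal-Wallis H statistic can be computed from rank sums as H = 12 / (N(N + 1)) × Σ(R²/n) − 3(N + 1), where R is a group’s rank sum, n its size and N the total sample size. The function names, the severity coding (1 = low, 2 = medium, 3 = critical) and the team data are assumptions for illustration. The sketch omits the tie correction, so with heavily tied ordinal data a statistics library (for example SciPy’s `scipy.stats.kruskal`) would be the safer choice.

```python
def mid_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy 1-based positions i+1 .. j; use their average.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, without the correction for ties."""
    pooled = [value for group in groups for value in group]
    ranks = mid_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical bug severities reported by three independent teams
# (assumed coding: 1 = low, 2 = medium, 3 = critical).
team_a = [1, 1, 2, 2, 3]
team_b = [2, 2, 3, 3, 3]
team_c = [1, 1, 1, 2, 2]
h_statistic = kruskal_wallis_h([team_a, team_b, team_c])
print(round(h_statistic, 3))
```

A larger H means the groups’ rank sums differ more than chance alone would suggest. For three groups it is compared with a chi-square distribution with 2 degrees of freedom.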
Uses of Categorical Data
- Job Application
When people apply for jobs, employers collect both nominal and ordinal data. This includes the job seeker’s biodata and a combination of relevant skills and experience. Employers do this to determine the best candidate for the job.
- E-commerce
When placing an order for a product or service on an e-commerce website, one is required to input some details which are regarded as categorical data. The data collected in this case is nominal.
- Online Dating
Users of online dating platforms are usually required to input a set of categorical data to match them with the right person. This data may include personal information and partner preferences.
- Customer Service
Organisations or companies use this after selling their product or service to a customer. This is used to know how the customer feels about the company’s service to improve the overall customer experience.
- Surveys & Questionnaires
Categorical data is used to gather information from both online and offline surveys or questionnaires as the case may be. The type of categorical data used may differ depending on the aim of data collection.
- Personality Tests
A personality test is a common test used to investigate the personality traits a respondent possesses. Companies use it to check whether a respondent’s personality traits are compatible with the company’s work culture.
Disadvantages of Categorical Data
- There is a limit to the kind of statistical analysis that can be performed on categorical data.
- The options in categorical data do not have a standardised interval scale. Therefore, respondents are not able to effectively gauge their options before responding.
- Quantitative analysis cannot be performed on categorical data, so numerical or arithmetic operations on it are not meaningful.
What is the Best Tool For Collecting Categorical Data?
Categorical data may easily be collected through various collection techniques using Formplus form builder. This online form builder provides effective categorical data gathering and management.
Formplus not only provides easy data collection through its customisable form features but also creates data analytics that help drive easy and proper decision-making. It also contains useful statistical data analysis features, making it the best tool for collecting categorical data.
Differences Between Categorical and Numerical Data
Categorical and Numerical data are the main types of data. These data types may have the same number of subcategories, with two each, but they have many differences. These differences give them unique attributes which are equally useful in statistical analysis.
Numerical data are quantitative data types. For example, weight, temperature, height, GPA, annual income, etc. are classified under numerical or quantitative data.
In comparison, categorical data are qualitative data types. Some examples include: name, hair colour, qualification etc.
Categorical Vs Continuous Data
Unlike categorical data, which deals with groups and categories, continuous data focuses on numerical values. Continuous data are numerical variables that can take an infinite number of values within a range. This could be a number, date or time, for example the date payment is received for a transaction.
Another difference is that categorical data might not have a logical order, as with gender or hair colour, while continuous data has a logical order, like the duration of a video.
Conclusion
As you can see, many categorical data examples can be given to better understand the meaning and purpose of qualitative data, and the list above is not exhaustive. When working with data management, it’s crucial to clearly understand the main terms, including quantitative and categorical data, and what their roles are.
The distinction between categorical and quantitative variables is crucial for deciding which types of data analysis methods to use. The first step towards selecting the right data analysis method today is understanding categorical data.
Quantitative data are analyzed using descriptive statistics, time series, linear regression models, and much more. Categorical data are typically analyzed with graphical and descriptive methods, together with tests such as those listed earlier in this article.