Types of Variables in Statistical Analysis

Distinguish between categorical, numerical, continuous, and discrete variables.

Certain plots and visualizations suit only certain types of variables. To see which visualizations we can create with the seaborn library, let’s first discuss the types of variables commonly used in statistical analysis and the visualizations associated with them.

Categorical variables

Categorical variables are variables that take one category from a finite set of categories. For example, when a customer shops online, the checkout page asks whether they will pay for their items by credit card or in cash. The customer chooses one option from a finite set of possibilities, so the mode of payment is a categorical variable.

Let’s say a student is filling out a job application form that asks about their educational background. They must choose the relevant option from a finite set, such as high school diploma, bachelor’s degree, master’s degree, or PhD. A person’s education level is therefore a categorical variable.

Similarly, if a customer visits an ice cream parlor and orders an ice cream, the ice cream flavor is an example of a categorical variable, because the customer is choosing a flavor from a finite set of available flavors. Other real-life examples of categorical variables are a car’s manufacturer, a person’s hometown, a person’s restaurant order, and so on.
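A minimal Python sketch of how a categorical variable behaves in practice, using made-up ice cream orders (the flavor names and the order list are hypothetical). Every value must come from the finite set of allowed categories, and counting how often each category occurs is a natural summary, whereas arithmetic such as averaging flavors has no meaning.

```python
from collections import Counter

# Hypothetical menu: the finite set of categories a flavor can take.
FLAVORS = {"vanilla", "chocolate", "strawberry"}

# Hypothetical orders; each one is a value of the categorical variable "flavor".
orders = ["vanilla", "chocolate", "vanilla", "strawberry", "chocolate", "vanilla"]

# Every observation must be one of the allowed categories.
assert all(order in FLAVORS for order in orders)

# Counting occurrences per category is meaningful for categorical data.
flavor_counts = Counter(orders)
print(flavor_counts.most_common())  # [('vanilla', 3), ('chocolate', 2), ('strawberry', 1)]
```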

Numerical variables

Numerical variables, also known as quantitative variables, express quantifiable or measurable values, represented as numbers. Another feature that distinguishes them from other types of variables is that we can perform meaningful arithmetic operations on them.

For example, a person’s age is a numerical variable we can quantify. We can also perform arithmetic operations on it, such as calculating the average age of a class by adding all of the students’ ages together and dividing by the total number of students.

Likewise, an employee’s wage is a numerical variable that can be measured and used in mathematical operations, such as adding up the salaries of all employees in a specific department to calculate the department’s annual wage budget.
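The two arithmetic examples above can be sketched in a few lines of Python. The ages and salaries below are made-up values for illustration only, and the salaries are assumed to be annual figures.

```python
from statistics import mean

# Hypothetical ages (in years) of the students in one class.
ages = [19, 21, 20, 22, 18]

# Average age: add all ages together, then divide by the number of students.
average_age = sum(ages) / len(ages)
print(average_age)  # 20.0

# statistics.mean gives the same result as the manual calculation.
assert mean(ages) == average_age

# Hypothetical annual salaries of the employees in one department.
salaries = [52_000, 61_500, 48_250]

# The department's annual wage budget is the sum of its salaries.
wage_budget = sum(salaries)
print(wage_budget)  # 161750
```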

We’re surrounded by many examples of numerical variables in our daily life as well, such as the number of phones in our house, the number of siblings we have, the number of snacks in our office, and the number of people in our class or office.

Numerical variables can be further divided into the following two categories:

  • Continuous variables
  • Discrete variables

Continuous variables

A continuous variable is a numerical variable that can take any value within a given range, so it has infinitely many possible values in that range. Its values can’t be counted; they are measured instead. The possible values form intervals on the real number line.

For example, we’ve used the value of pi (π) many times while solving mathematical problems. Often, we approximate it as π ≈ 3.14.

However, to be more precise, we sometimes use π ≈ 3.1415 or even π ≈ 3.14159265, and we could go further still, to π ≈ 3.14159265358979, and so on. Pi is an irrational number, so the digits after the decimal point never end: we can always write it more precisely. Continuous variables behave in a similar way. Between any two values, no matter how close, there are infinitely many other possible values, so a measurement can always be refined with more decimal places.

Therefore, if you’re ever unsure whether a particular variable is continuous, recall the never-ending digits of pi (π) and ask whether the variable’s values could, in principle, always be measured more precisely.
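The idea that a range of a continuous variable never runs out of values can be sketched with Python’s `fractions` module. Starting from the bounds 3.14 and 3.15 (arbitrary values chosen for illustration), repeatedly taking the midpoint always produces a new value strictly inside the range. Exact fractions are used to avoid floating-point rounding, and the loop stops after five steps only because we tell it to.

```python
from fractions import Fraction

# Two nearby values on the number line, stored exactly.
low, high = Fraction("3.14"), Fraction("3.15")

values = []
for _ in range(5):
    # The midpoint always lies strictly between the current bounds.
    mid = (low + high) / 2
    assert low < mid < high
    values.append(mid)
    # Narrow the range and repeat; a new value can always be found.
    high = mid

for v in values:
    print(float(v))
```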

Let’s say you and your classmates are tasked with calculating the area of your classroom. John finishes early and reports the area as 498.13 sq. ft. Later, Sarah reports the area as 498.14 sq. ft. Finally, you say the area is 498.145 sq. ft. Who do you think reported the correct area? Pause for a second and think!

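Everyone is right. Why is that?

Because area is a continuous variable (it’s measured rather than counted), each person can get a slightly different observation depending on how precisely they measured. All three reports are reasonable measurements of the same area, so everyone was correct.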
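The three readings from the classroom example can be compared with Python’s `decimal` module, which keeps the values exactly as written. This sketch reports how many decimal places each reading has and how far apart the readings are; the names are just labels taken from the example.

```python
from decimal import Decimal

# The three reported areas, in square feet, stored as exact decimals.
readings = {
    "John": Decimal("498.13"),
    "Sarah": Decimal("498.14"),
    "You": Decimal("498.145"),
}

for name, area in readings.items():
    # A Decimal exponent of -n means the value has n decimal places.
    decimals = -area.as_tuple().exponent
    print(f"{name}: {area} sq. ft. ({decimals} decimal places)")

# The largest difference between readings is tiny compared with the area itself.
spread = max(readings.values()) - min(readings.values())
print(f"Spread: {spread} sq. ft.")
```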

Discrete variables

Discrete variables take values that can be counted and listed, such as 0, 1, 2, 3, and so on. In other words, a discrete variable has a countable number of possible values. For example, the number of people who attended a concert could be 400, but it could never be 400.1.

Likewise, the number of cars in a parking lot is a discrete variable because the cars can be counted. Counting may take longer if the lot is large, but the value is still countable. Other examples include the number of places people visit during university tours, the number of apps on someone’s smartphone, the number of subjects one studies at university, and so on.
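A small Python sketch of why concert attendance is discrete. It assumes a hypothetical venue capacity of 500, so attendance can only be one of the whole numbers from 0 to 500, which we can list and count. The parking lot list is likewise made up; counting its cars is just taking its length.

```python
# Hypothetical venue capacity; attendance is a whole number from 0 to 500.
CAPACITY = 500
possible_attendance = range(CAPACITY + 1)

# The possible values can be listed and counted.
print(len(possible_attendance))      # 501
print(400 in possible_attendance)    # True: 400 people is a valid count
print(400.1 in possible_attendance)  # False: there can't be 400.1 people

# Hypothetical parking lot: each car adds exactly one to the count.
parking_lot = ["sedan", "SUV", "hatchback", "sedan"]
print(len(parking_lot))  # 4
```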

These four types of variables (categorical, numerical, continuous, and discrete) are the basis for data analysis in statistics because they help us understand how a particular variable may behave. Understanding their differences and characteristics also helps us discern patterns within data.
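As a rough recap, the distinctions above can be turned into a hypothetical helper, `guess_variable_type`, that guesses a variable’s type from sample values. This is only a heuristic under a loud assumption: it looks at how the values are stored (text, whole numbers, or decimals), not at what they mean. For example, a measurement stored as a whole number would be labeled discrete, so it can’t replace your own judgment about the variable.

```python
def guess_variable_type(values):
    """Hypothetical heuristic: guess a variable's type from how its values are stored.

    It ignores what the values mean, so treat the result as a first guess only.
    """
    if not values:
        return "unknown"
    # Text labels drawn from a set of options suggest a categorical variable.
    if all(isinstance(v, str) for v in values):
        return "categorical"
    # Whole numbers (excluding booleans) suggest counts, i.e. a discrete variable.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "numerical (discrete)"
    # Any non-integer numbers suggest measurements, i.e. a continuous variable.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical (continuous)"
    return "unknown"


print(guess_variable_type(["credit card", "cash", "cash"]))  # categorical
print(guess_variable_type([3, 0, 2, 5]))                     # numerical (discrete)
print(guess_variable_type([498.13, 498.14, 498.145]))        # numerical (continuous)
```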