Numerical variables
In this lesson, we will learn about numerical variables and their types. We will also cover some plots that help visualize this type of variable, along with a few simple lines of code to generate them.
Definition
Numerical variables are variables whose values have numerical meaning; for example, age, the number of movies watched, IQ, and salary are all represented by numbers.
Numerical variables can be classified as:
- Continuous variables
- Discrete variables
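To make the distinction concrete, here is a minimal Python sketch of a rough heuristic. The function name `classify_numeric` and the sample values are hypothetical and not part of any library. It labels a list of values as discrete when every value is a whole number, and as continuous otherwise.

```python
def classify_numeric(values):
    """Rough heuristic: 'discrete' if every value is a whole number, else 'continuous'."""
    # float(v).is_integer() is True for 4 and 4.0, False for 64.2
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"

children = [0, 1, 2, 2, 3]        # hypothetical: number of children per family
weights = [64.2, 43.8, 71.5]      # hypothetical: weights in kg

print(classify_numeric(children))  # discrete
print(classify_numeric(weights))   # continuous
```

Treat this only as a first check on the data: a continuous quantity recorded at a coarse resolution (for example, age rounded to whole years) would be labeled "discrete" by this heuristic even though the underlying variable is continuous.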
Here is an example of age as a numerical variable:
Continuous variables
A variable is continuous if it can take any of the infinitely many real values within a given interval.
For example, weight (64.2 kg, 43.8 kg, …) and distance (105.7 km, 25.5 km) take real-number values, which is why they are continuous. Temperature and length are other examples of continuous data.
There are various ways to visualize continuous variables; to name a few:
- Box plot
- Density plot
- Scatter plot
- Histogram
Box plot
The previous box plot was generated by the following code snippet:
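```python
import os
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Create a simple random continuous variable (100 draws from a standard normal)
x = np.random.normal(size=100)

# Plot the box plot in green (pass the data as a keyword argument)
sns.boxplot(x=x, color="g")

# Make sure the output folder exists, then save the figure
os.makedirs("output", exist_ok=True)
plt.savefig("output/box.png")
```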
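A box plot summarizes the data with a handful of numbers. The NumPy sketch below computes them by hand, assuming the default whisker length of 1.5 × IQR (interquartile range) used by matplotlib and seaborn; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed so the numbers are reproducible
x = rng.normal(size=100)          # same kind of data as in the box plot example

# The box spans Q1..Q3; the line inside it is the median (Q2)
q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1                     # interquartile range = length of the box

# Whiskers reach the most extreme points within 1.5 * IQR of the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
whisker_low = x[x >= lower_limit].min()
whisker_high = x[x <= upper_limit].max()

# Points beyond the whiskers are drawn individually as outliers
outliers = x[(x < lower_limit) | (x > upper_limit)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}, outliers: {len(outliers)}")
```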
Density plot
The previous density plot was generated by the following code snippet:
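```python
import os
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Create a simple random continuous variable
x = np.random.normal(size=100)

# Density (KDE) curve of x plus a rug of tick marks, one per observation.
# sns.distplot is deprecated in recent seaborn; kdeplot + rugplot replace it.
sns.kdeplot(x=x)
sns.rugplot(x=x)

os.makedirs("output", exist_ok=True)
plt.savefig("output/density.png")
```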
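A density plot is usually a kernel density estimate (KDE): a small bell curve is placed on every observation and the curves are averaged. Below is a minimal NumPy sketch of a Gaussian KDE using Scott's rule of thumb for the bandwidth. seaborn's `kdeplot` also defaults to Scott's rule, but its exact implementation may differ, so treat the hypothetical `gaussian_kde_1d` helper as an illustration only.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth (kernel width)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    u = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and divide by h so the curve integrates to 1
    return kernels.mean(axis=1) / h

rng = np.random.default_rng(0)
x = rng.normal(size=100)
grid = np.linspace(x.min() - 3, x.max() + 3, 500)
density = gaussian_kde_1d(x, grid)

# Sanity check: the area under the estimated curve should be close to 1
area = np.sum(density) * (grid[1] - grid[0])
print(round(float(area), 3))
```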
Scatter plot
The previous scatter plot was generated by the following code snippet:
```python
import os
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Create two simple random continuous variables
x = np.random.normal(size=100)
y = np.random.normal(size=100)

# Scatter plot of x against y (recent seaborn requires keyword arguments)
sns.scatterplot(x=x, y=y, color="g")

os.makedirs("output", exist_ok=True)
plt.savefig("output/scatter.png")
```
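A scatter plot shows whether two continuous variables move together. One common numerical summary of a linear relationship is Pearson's correlation coefficient, which NumPy computes with `np.corrcoef`. In this sketch, `y_related` is a hypothetical variable built to depend on `x`, added only for contrast with the independent pair used above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y_independent = rng.normal(size=100)                 # unrelated to x
y_related = 2 * x + rng.normal(scale=0.5, size=100)  # hypothetical: linear in x plus noise

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_independent = np.corrcoef(x, y_independent)[0, 1]
r_related = np.corrcoef(x, y_related)[0, 1]

print(f"independent: r = {r_independent:.2f}")  # expected near 0: a shapeless cloud
print(f"related:     r = {r_related:.2f}")      # expected near 1: points along a rising line
```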
Histograms
The previous histogram plot was generated by the following code snippet:
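```python
import os
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Create a simple random continuous variable
x = np.random.normal(size=100)

# Red histogram of x with a rug of tick marks; this replaces the deprecated
# sns.distplot(x, kde=False, rug=True, color="r")
sns.histplot(x=x, color="r")
sns.rugplot(x=x, color="r")

os.makedirs("output", exist_ok=True)
plt.savefig("output/histogram.png")
```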
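Under the hood, a histogram just splits the range of the data into bins and counts the values in each bin. The sketch below does this with `np.histogram` and prints a text version of the bars. The choice of 10 bins is an arbitrary assumption; seaborn picks its own default number of bins, so its bars will generally differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Split the range of x into 10 equal-width bins and count the values in each
counts, edges = np.histogram(x, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    # Each bar covers [left, right) (the last bin also includes its right edge);
    # its height is the number of values that fall in that bin
    print(f"[{left:6.2f}, {right:6.2f}): {'#' * int(count)}")
```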
Discrete variables
A discrete variable cannot take a fractional value between one possible value and the next closest one; it takes only whole-number (integer) values. Examples of discrete variables include the number of registered cars (2, 4, 7), the number of business locations (4, 10, 5), and the number of children in a family (0, 1, 2), all of which are counted in whole units (i.e., 1, 2, 3, …).
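The continuous snippets in this lesson generate data with `np.random.normal`. For experimenting with discrete plots, you can generate count data instead. This sketch assumes a Poisson distribution with a made-up mean of 1.8 as a stand-in for "number of children in a family"; the numbers are synthetic, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: number of children in 100 families, drawn from a Poisson distribution
children = rng.poisson(lam=1.8, size=100)

# Counts are whole numbers, so the array has an integer dtype
print(children.dtype, children[:10])

# Unlike normally distributed samples, the same values repeat many times
print(np.unique(children))
```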
Discrete variables can be visualized using:
- Count plot
- Pie chart
Count plot
You can use the following code snippet to generate count plots:
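```python
import os
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic example dataset (seaborn downloads it on first use)
titanic = sns.load_dataset("titanic")

# Count plot of "parch", a discrete variable: the number of parents/children
# each passenger had aboard. One bar per distinct value, height = frequency.
sns.countplot(x="parch", data=titanic)

os.makedirs("output", exist_ok=True)
plt.savefig("output/countplot.png")
```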
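A count plot draws one bar per distinct value, and each bar's height is the number of times that value occurs. The same counting can be sketched in plain Python with `collections.Counter`; the list of children per family below is hypothetical.

```python
from collections import Counter

# Hypothetical discrete data: number of children in ten families
children = [0, 2, 1, 2, 3, 0, 2, 1, 1, 2]

# Frequency of each distinct value = the bar heights of a count plot
counts = Counter(children)
for value in sorted(counts):
    print(value, "#" * counts[value])
# 0 ##
# 1 ###
# 2 ####
# 3 #
```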
Pie chart
You can generate a pie chart using the following code snippet:
```python
import os
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Titanic example dataset
titanic = sns.load_dataset("titanic")

# Count how often each value of "parch" (parents/children aboard) occurs
counts = titanic["parch"].value_counts()
values = counts.values  # the counts, one per distinct value
labels = counts.index   # the distinct values themselves

# Draw the pie chart; each wedge's size is proportional to its count
plt.pie(values, labels=labels, shadow=True, startangle=90)

os.makedirs("output", exist_ok=True)
plt.savefig("output/pie.png")
```
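Each wedge of a pie chart represents a value's share of the total: matplotlib's `plt.pie` divides the values by their sum by default. The arithmetic can be sketched by hand on a hypothetical list of children per family, converting each share into a wedge angle out of 360 degrees.

```python
from collections import Counter

# Hypothetical discrete data: number of children in ten families
children = [0, 2, 1, 2, 3, 0, 2, 1, 1, 2]

counts = Counter(children)
total = sum(counts.values())

for value in sorted(counts):
    share = counts[value] / total  # fraction of the whole pie
    angle = 360 * share            # wedge size in degrees
    print(f"{value} children: {share:.0%} of families, {angle:.0f} degrees")
# 0 children: 20% of families, 72 degrees
# 1 children: 30% of families, 108 degrees
# 2 children: 40% of families, 144 degrees
# 3 children: 10% of families, 36 degrees
```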