Trusted answers to developer questions

Related Tags

data science

What are visualizations in data science?

Hassaan Waqar

Data science works with many kinds of data. Some data is numerical, while other data exists as categories. Different types of data need to be represented differently.

Types of data

Data can be broadly classified into two main categories: quantitative data and qualitative data.

Quantitative data exists in the form of quantities or numbers. This includes the population of a country, the weight of a person, or the number of days in a week.

Qualitative data exists as categories. It is non-numerical in nature. This includes categories of gender, daily weather, or types of degrees in a university.

Both qualitative and quantitative data can be sub-divided into further categories.

Types of quantitative data

Quantitative data can be sub-divided into two main categories: discrete and continuous data.

Discrete data refers to data that is counted in whole numbers. It can take only certain fixed values. Examples include the number of days in a month, the age of a person in completed years, or the number of siblings.

Continuous data can take any value within a range, including decimal numbers. Examples include the GPA of a student or the speed of a car.

Types of qualitative data

Qualitative data can be sub-divided into two main categories: nominal and ordinal data.

Nominal data has no inherent order. Its categories cannot be ranked in any way. Examples include categories of gender or race.

Ordinal data has an inherent order. It can be ranked from high to low, good to bad, or vice versa. Examples include levels of education in school, survey responses on a Likert scale, or Yelp ratings.
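The difference between nominal and ordinal data can be made concrete with a short sketch. The Likert labels and the sample responses below are made up for illustration; the point is only that ordinal categories need an explicitly declared order before they can be ranked.

```python
# Hypothetical five-point Likert scale, listed from lowest to highest.
LIKERT_ORDER = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Map each label to its position so comparisons follow the scale, not the alphabet.
rank = {label: position for position, label in enumerate(LIKERT_ORDER)}

# Made-up survey responses (ordinal data).
responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]

# Ordinal data can be sorted meaningfully once its order is declared.
sorted_responses = sorted(responses, key=rank.__getitem__)

# A median only needs ranking, so it is defined for ordinal data.
median_response = sorted_responses[len(sorted_responses) // 2]

print(sorted_responses)
print(median_response)
```

Nominal data, such as race, has no equivalent of `LIKERT_ORDER`, so sorting it or taking its median carries no meaning.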

The illustration below summarizes types of data:

Classification of data

Data can be represented using visualizations. Visualizations provide an overview of the data along with its summary statistics. Different types of data are represented using different visualizations.

We will discuss some prominent visualizations below:

Bar chart

A bar chart can be used to represent the counts of qualitative data. It can also represent a quantitative value for each category, for example the average score of each class.

A bar chart has categories on the x-axis and counts or values on the y-axis. It is used to compare different values, items, and categories of data.

For example, a bar chart can show the number of students of different genders in a university. Genders will be on the x-axis as categories, and counts will be on the y-axis. A bar chart can also be used to show voting results for a particular questionnaire, as in the figure below.

Bar Chart
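As a minimal sketch of what a bar chart encodes, the snippet below counts each category and draws the counts as text bars. The poll answers and the `text_bar_chart` helper are hypothetical; a plotting library would use the same counts as bar heights.

```python
from collections import Counter

# Made-up answers to a poll question (qualitative data).
votes = ["yes", "no", "yes", "undecided", "yes", "no", "yes"]

# Each category becomes one bar; its count becomes the bar's length.
counts = Counter(votes)


def text_bar_chart(category_counts):
    """Return one line per category, with '#' repeated once per observation."""
    width = max(len(category) for category in category_counts)
    return [
        f"{category.ljust(width)} | {'#' * count} {count}"
        for category, count in category_counts.most_common()
    ]


for line in text_bar_chart(counts):
    print(line)
```

Here the categories run down the page instead of along the x-axis, but the comparison between bar lengths is the same.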

Pie chart

A pie chart is used to represent the proportions of different categories of qualitative data. A pie (circle) is divided into segments, where each segment represents a category. The size of each segment is proportional to that category's share of the data.

Pie charts show what percentage of the whole is made up of each category. They are used to show how a whole is divided among its categories.

Pie charts can be used to represent the percentage of male and female students in a class. They can also be used to show the proportion of responses in a survey questionnaire, as in the figure below.

Pie Chart
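The segment sizes of a pie chart follow directly from the category counts. The sketch below, using made-up survey responses and a hypothetical `pie_segments` helper, converts counts into percentages and into the angle each segment would occupy in a 360-degree circle.

```python
from collections import Counter

# Made-up survey responses (qualitative data).
responses = ["agree", "agree", "disagree", "neutral",
             "agree", "disagree", "agree", "neutral"]


def pie_segments(values):
    """Map each category to (percentage of the whole, segment angle in degrees)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (count / total * 100, count / total * 360)
        for category, count in counts.items()
    }


segments = pie_segments(responses)
for category, (percent, angle) in segments.items():
    print(f"{category}: {percent:.1f}% of the pie, {angle:.1f} degrees")
```

Because every observation falls into exactly one category, the percentages add up to 100 and the angles add up to 360.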

Histogram

A histogram is used to represent quantitative continuous data. It represents a distribution, which means the counts of all columns add up to the total number of values in the data. The figure below shows the distribution of heights of students. We can find the number of students by summing the counts of all columns.

Since histograms represent quantitative continuous data, the values are grouped into ranges (bins). Each column has a lower bound and an upper bound. For example, each column in the figure below covers a height range of 5 cm. The height of each column shows how many values fall within its range.

A histogram can be used to show the heights or weights of a group of students.

Histogram
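The binning behind a histogram can be sketched in a few lines. The heights below are made up, and the `histogram` helper is hypothetical; it assumes half-open bins of 5 cm, so a value exactly on a boundary goes into the bin that starts there.

```python
from collections import Counter

# Made-up student heights in centimetres (quantitative continuous data).
heights_cm = [151.2, 153.8, 156.0, 157.4, 158.9, 160.1,
              161.5, 162.3, 163.0, 166.7, 168.2, 171.9]

BIN_WIDTH = 5  # each column covers a 5 cm range


def histogram(values, bin_width):
    """Count values per half-open bin [lower, lower + bin_width)."""
    bins = Counter()
    for value in values:
        lower = int(value // bin_width) * bin_width
        bins[lower] += 1
    # Sort by lower bound so the columns appear from left to right.
    return dict(sorted(bins.items()))


bins = histogram(heights_cm, BIN_WIDTH)
for lower, count in bins.items():
    print(f"{lower}-{lower + BIN_WIDTH} cm | {'#' * count} {count}")

# The column counts add up to the number of students.
print(sum(bins.values()))
```

This sketch omits bins that contain no values; a drawn histogram would show them as empty columns.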

Scatter plot

A scatter plot is used to represent quantitative data. It shows the relationship, or trend, between two variables.

A scatter plot places the first variable on the x-axis and the second variable on the y-axis. It shows how the second variable changes as the first variable increases. Similarly, it can show a trend over time, in which case time is the first variable. Each point represents one observation.

A scatter plot can be used to show population growth over time or the relationship between units sold and revenue.

Scatter plot
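A scatter plot is drawn from (x, y) pairs, one per observation. The sketch below builds those pairs from made-up sales figures and, as one common way to put a number on the visible trend, computes the Pearson correlation coefficient. The data and the `pearson_r` helper are assumptions for illustration only.

```python
from math import sqrt

# Made-up observations: units sold and the revenue they produced.
units_sold = [10, 20, 30, 40, 50]
revenue = [120, 210, 330, 390, 500]

# Each (x, y) pair is one point on the scatter plot.
points = list(zip(units_sold, revenue))


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, -1 a perfect falling one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


r = pearson_r(units_sold, revenue)
print(points)
print(round(r, 3))
```

A value of `r` close to 1 matches what the plot would show: the points rise almost along a straight line.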

Box plot

A box plot is used to highlight summary statistics of quantitative data. A box plot shows the quartiles (the 25th, 50th, and 75th percentiles, where the 50th is the median) and the outliers in a data set.

Box plot

Outliers are values that lie far from the rest of the data. They can be caused by incorrect measurement or recording of data values, but they can also be genuine extreme values.

For example, box plots can be used to summarize baby weights, heights of trees, or heartbeat rates.
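The statistics a box plot draws can be computed with Python's standard `statistics` module. The baby weights below are made up, and the 1.5 × IQR rule used to flag outliers is a common convention that this sketch assumes; the article does not specify which rule the figure uses.

```python
import statistics

# Made-up baby weights in kilograms; the last value is deliberately extreme.
weights_kg = [2.9, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.8, 6.5]

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(weights_kg, n=4)

# Interquartile range: the height of the box.
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in weights_kg if w < lower_fence or w > upper_fence]

print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f}")
print(f"fences=({lower_fence:.3f}, {upper_fence:.3f})")
print(outliers)
```

Different tools compute quartiles with slightly different interpolation methods, so the exact box edges may vary a little between libraries.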

Copyright ©2022 Educative, Inc. All rights reserved