Unveiling the Numbers
Learn how to summarize, explore, and understand your data in Google Sheets using descriptive statistics.
Data doesn’t speak evenly; some details carry more weight than others. At this stage, our task is to recognize the elements that matter most within the dataset. By doing so, we are preparing ourselves for insights that are not only clear, but also actionable.
This brings us to the heart of exploratory data analysis (EDA). Having already cleaned, organized, and taken a first look at the data, we’re now ready to go deeper. EDA is where we pause to examine each variable more closely, uncover its unique characteristics, and notice patterns or unusual values that might shape our understanding.
We begin with curiosity, asking simple but powerful questions:
What does each variable reveal on its own?
Which values dominate, and which are rare?
Are there outliers or unexpected shapes in the data?
At this stage, we’re not rushing to conclusions; instead, we’re training ourselves to recognize what matters. This habit of close observation is what makes EDA such a critical step in the analytical process.
What is EDA?
Every dataset holds a story, but that story isn’t always immediately clear. Before we create charts, run summaries, or share insights, we need to understand the data’s structure, patterns, and oddities. That’s where exploratory data analysis (EDA) comes in. It’s how data analysts get familiar with the data, uncover what’s worth highlighting, and spot anything that might affect the integrity of the analysis.
Why does EDA matter?
Skipping EDA is like trying to build a house without looking at the blueprints. It’s the fastest way to get flawed results. Here’s why it’s a critical step:
It builds intuition: EDA is how we develop a “feel” for the dataset. We learn the ranges of numbers, the common categories, and the overall data quality.
It spots fatal flaws early: What if 90% of a key column is missing? What if a numerical column (like Price) is accidentally stored as text (e.g., “$1,000”)? EDA finds these “showstoppers” before we waste time modeling.
It prevents bad models (GIGO): This is the “Garbage In, Garbage Out” principle. If we feed a model data with hidden outliers, biases, or errors, the model’s predictions will be unreliable and wrong.
It guides feature engineering: By finding relationships (e.g., Age and Income seem related), EDA gives us ideas for creating new, more predictive variables for our model.
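The two flaws named above (a mostly empty column, and numbers stored as text) can be checked mechanically. The sketch below is written in Python rather than Sheets, purely as an illustration. The sample values, the prices list, and the helper names missing_share, parse_price, and summarize are all hypothetical and not part of the lesson's dataset. It reports the share of missing entries, converts text such as "$1,000" into a number, and then computes summaries analogous to COUNT, MIN, and MAX.

```python
from typing import Optional

# Hypothetical sample: a Price column as it might arrive in an export.
prices = ["$1,000", "250", "", "$75.50", None, "1200"]


def missing_share(values: list) -> float:
    """Return the fraction of entries that are None or blank."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    return missing / len(values)


def parse_price(value) -> Optional[float]:
    """Convert text like "$1,000" to 1000.0; return None if it cannot be parsed."""
    if value is None:
        return None
    # Assumption: only "$" and "," need stripping; other currencies are not handled.
    cleaned = str(value).strip().replace("$", "").replace(",", "")
    if cleaned == "":
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def summarize(values: list) -> dict:
    """Report the missing share plus count, min, and max of the parseable numbers."""
    parsed = [parse_price(v) for v in values]
    numbers = [p for p in parsed if p is not None]
    return {
        "missing_share": missing_share(values),
        "count": len(numbers),
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
    }


summary = summarize(prices)
print(summary)
```

With these sample values, two of the six entries are missing and the remaining four parse as numbers, so the text-formatted prices no longer hide from the summary.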
The key steps of exploratory data analysis
Let’s break down the key steps of exploratory data analysis: what we actually do when we explore data, from summarizing distributions to spotting relationships.
Get to know the data basics: This step involves identifying the dataset’s rows (records) and columns (variables), and reviewing basic summary statistics (MIN, MAX, COUNT) to understand its structure and contents. This is the first step in EDA, and we’ve already covered these skills in earlier lessons.
Explore variables one at a time (univariate analysis): Next, we examine each variable individually. We want to understand its distribution, common values, and any oddities like outliers or missing data.
Look at relationships between variables (bivariate and multivariate analysis): Then, we study how variables interact. Are some variables correlated? Are there patterns when we group data by categories? This step uncovers ...