
Shape, Scale, and Standardize Data

Learn to convert data types, standardize values, and apply essential transformation tools for robust analysis.

We'll cover the following:

Flipping the data
Creating new columns for context
Standardizing numeric data
- What is feature scaling?
- Scaling approaches in Google Sheets
  - Normalization (min-max scaling)
    - When do we use normalization?
  - Standardization (z-score scaling)
Converting and standardizing data types
- Text-numbers to actual numbers
  - Example
- Explicit conversions
Wrap up

Sometimes, the data we receive isn’t laid out in the most analysis-friendly way, or its underlying data types prevent us from performing accurate calculations. Moreover, when we compare numbers on vastly different scales (like product prices vs. customer ratings), we need a way to put them on equal footing so that larger numbers don’t dominate our analysis.

This lesson equips us with tools to restructure our information, ensure data integrity through proper type conversion, and standardize values for more insightful analysis. Let’s begin.

Flipping the data

Have we ever looked at a table where the headers were laid out in rows and the data flowed down columns, but we needed the opposite orientation for our analysis or a specific chart? The way a table is oriented doesn’t always match what our tools expect.

The TRANSPOSE(array_or_range) function is a fantastic solution that allows us to quickly and easily swap rows and columns. Every row in our selected range becomes a column, and every column becomes a row. This is invaluable for changing data orientation to suit different analytical tools, reporting needs, or simply to make the data easier to work with in a new layout.
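To make the row-to-column swap concrete, here is a minimal Python sketch of the same operation applied to the small sales table used in this section’s example. It is an illustration of what TRANSPOSE does, not how Google Sheets implements it; the function name transpose is our own. Python’s zip stops at the shortest row, so the sketch assumes a rectangular range, which a spreadsheet range always is.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular 2D list, like =TRANSPOSE(range)."""
    # zip(*rows) groups the i-th cell of every row, which is the i-th column
    return [list(column) for column in zip(*rows)]


# The sales table from the example: products as columns, dates as rows
sales = [
    ["Date", "Product A", "Product B", "Product C"],
    ["Jan 1", 100, 150, 200],
    ["Jan 2", 120, 160, 210],
]

for row in transpose(sales):
    print(row)
```

Running it prints one row per product (plus the Date header row), with the dates now spread across the columns. Transposing twice returns the original table, just as applying TRANSPOSE to a TRANSPOSE result would.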

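Suppose our sales data currently has product names as column headers and dates as rows, in the range A1:D3. Entering =TRANSPOSE(A1:D3) in cell A6 produces the flipped layout, with products as rows and dates as columns.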

See how effortlessly TRANSPOSE flips the entire table’s orientation? With a single formula, we can reshape our data to match our analysis needs. Now that the structure is in place, we’re ready to create new columns that add meaning, context, and analytical value.

Creating new columns for context

As data analysts, our work often goes beyond cleaning existing data. A simple but powerful strategy is to derive new information from existing columns by creating calculated columns. These columns add context, introduce new metrics, or simplify complex calculations, making our data more insightful.

Here are a few practical ways to apply this strategy:

Calculate profit: Create a new column with =Revenue - Cost to find the profit for each transaction (in practice, with cell references such as =D2 - E2 if Revenue is in column D and Cost is in column E).
Categorize age groups: Use conditional logic to group customers by age, as in =IF(B2 < 18, "Minor", IF(B2 < 65, "Adult", "Senior")), where B2 holds the customer’s age.
Extract year from dates: Pull out the year with =YEAR(C2), where C2 holds a date, to analyze time-based trends.
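The three calculated columns above can be sketched in Python for a few records. This is a minimal illustration, assuming each row is a dictionary; the field names Revenue, Cost, Age, and Order Date and the sample records are hypothetical stand-ins for spreadsheet columns, and each derived value mirrors one of the formulas.

```python
from datetime import date


def age_group(age):
    """Mirror =IF(B2 < 18, "Minor", IF(B2 < 65, "Adult", "Senior"))."""
    if age < 18:
        return "Minor"
    if age < 65:
        return "Adult"
    return "Senior"


def add_derived_columns(row):
    """Return a copy of one record with Profit, Age Group, and Year added."""
    enriched = dict(row)  # copy so the original record is left unchanged
    enriched["Profit"] = row["Revenue"] - row["Cost"]  # like =Revenue - Cost
    enriched["Age Group"] = age_group(row["Age"])  # like the nested IF
    enriched["Year"] = row["Order Date"].year  # like =YEAR(C2)
    return enriched


# Hypothetical records, for illustration only
orders = [
    {"Revenue": 250, "Cost": 180, "Age": 34, "Order Date": date(2023, 3, 14)},
    {"Revenue": 90, "Cost": 95, "Age": 16, "Order Date": date(2024, 7, 2)},
    {"Revenue": 400, "Cost": 310, "Age": 70, "Order Date": date(2022, 11, 30)},
]

for record in map(add_derived_columns, orders):
    print(record["Profit"], record["Age Group"], record["Year"])
```

As with a new spreadsheet column, the source values stay untouched; the derived values sit alongside them, and a negative profit (the second record) is kept rather than hidden.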

With new columns in place, our dataset is more informative, but we’re not done yet. Next, we need to make sure that differences in scale don’t distort our results. Let’s explore how to standardize our data.

Standardizing numeric data

Suppose that in a sample sales dataset, Salary values range from $20,000 to $200,000, while Customer Age ranges from 18 to 90. Without adjusting for these scale differences, even basic analyses, like plotting relationships or comparing averages across these metrics, can mislead us, because the much larger Salary numbers overwhelm the Age numbers. That’s where feature scaling comes in.

What is feature scaling?

Feature scaling is a data preparation technique that transforms numeric values onto a consistent scale. This helps avoid distortions during analysis, especially when features have very different ranges.
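As a minimal sketch of one scaling approach named in this lesson’s outline, min-max normalization, each value x becomes (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1. In a sheet, assuming the values sit in column B, the equivalent would be something like =(B2 - MIN(B:B)) / (MAX(B:B) - MIN(B:B)). The Python version below uses hypothetical salary and age values whose endpoints match the ranges mentioned earlier in this section.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values spanning the Salary and Age ranges from this section
salaries = [20_000, 56_000, 110_000, 200_000]
ages = [18, 36, 54, 90]

print(min_max_scale(salaries))  # [0.0, 0.2, 0.5, 1.0]
print(min_max_scale(ages))      # [0.0, 0.25, 0.5, 1.0]
```

After scaling, a $180,000 salary gap and a 72-year age gap both span the full 0 to 1 range, so neither metric dominates. Mapping a constant column to 0.0 is one design choice to avoid dividing by zero. Note that a single outlier sets the min or max and can squeeze all other values together.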

For data analysts, feature scaling offers several practical benefits:

Fair comparisons: It prevents one variable from overpowering summary statistics or visualizations due to its larger scale.
Cleaner visuals: Charts like scatterplots or heatmaps become easier to interpret when features are similar in scale.
Reliable outputs: Scaled data reduces the risk of skewed patterns in aggregations, filters, and dashboards.
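Standardization (z-score scaling), the other approach named in this lesson’s outline, rescales each value as z = (x - mean) / standard deviation, so the result has a mean of 0 and a standard deviation of 1. Google Sheets has a built-in STANDARDIZE(value, mean, standard_deviation) function for a single value, typically fed with AVERAGE and STDEV of the column. The Python sketch below is an illustration under those assumptions: it uses the sample standard deviation (statistics.stdev, matching Sheets’ STDEV), and the salary and age values are hypothetical.

```python
import statistics


def z_score_scale(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like Sheets' STDEV
    if sd == 0:
        # Every value equals the mean, so each is 0 standard deviations away
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical sample values within the Salary and Age ranges from this section
salaries = [20_000, 56_000, 110_000, 200_000]
ages = [18, 36, 54, 90]

print([round(z, 2) for z in z_score_scale(salaries)])
print([round(z, 2) for z in z_score_scale(ages)])
```

Each z-score reads as "how many standard deviations from the average", which makes salaries and ages directly comparable. Unlike min-max output, z-scores are not bounded to [0, 1], and switching to statistics.pstdev (like Sheets’ STDEVP) would use the population standard deviation instead.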

Scaling approaches in Google Sheets

When working with numerical data on varying scales, two common techniques help bring ...

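TRANSPOSE example (from "Flipping the data"): the original sales table in A1:D3, followed by the result of =TRANSPOSE(A1:D3) entered in cell A6.

	A	B	C	D
1	Date	Product A	Product B	Product C
2	Jan 1	100	150	200
3	Jan 2	120	160	210

	A	B	C
6	Date	Jan 1	Jan 2
7	Product A	100	120
8	Product B	150	160
9	Product C	200	210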
