Shape, Scale, and Standardize Data

Learn to convert data types, standardize values, and apply essential transformation tools for robust analysis.

Sometimes, the data we receive isn’t laid out in the most analytically friendly way, or its underlying data types prevent us from performing accurate calculations. Moreover, when we compare numbers that are on vastly different scales (like product prices vs. customer ratings), we need a way to make those comparisons fair and prevent larger numbers from unfairly dominating our analysis.

This lesson will equip us with tools to restructure our information, ensure data integrity through proper type conversion, and standardize values for meaningful analysis. Let’s begin and build our data transformation skills.

Flipping the data

Have we ever looked at a table whose rows and columns were arranged the opposite way from what our analysis or a specific chart needed? A mismatched data orientation like this is a common obstacle.

The TRANSPOSE(array_or_range) function is a fantastic solution that allows us to quickly and easily swap rows and columns. Every row in our selected range becomes a column, and every column becomes a row. This is invaluable for changing data orientation to suit different analytical tools, reporting needs, or simply to make the data easier to work with in a new layout.

Suppose our sales data currently has product names as column headers and dates as rows.


     A        B            C            D
1    Date     Product A    Product B    Product C
2    Jan 1    100          150          200
3    Jan 2    120          160          210

But our new dashboard needs dates as column headers and products as rows. We can achieve this by placing the TRANSPOSE() formula in a new, empty cell, for example, A6: =TRANSPOSE(A1:D3). The result spills out from A6 and fills A6:C9, so that range must be empty beforehand; it also updates automatically whenever the source data changes:


     A            B        C
6    Date         Jan 1    Jan 2
7    Product A    100      120
8    Product B    150      160
9    Product C    200      210

See how effortlessly it flips the entire table’s orientation? With just one formula, we can reshape our data to match our analysis needs. Now that the structure is in place, we’re ready to create new columns that add meaning, context, and analytical value.
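For those of us who also script parts of our analysis, the same flip can be sketched outside Sheets. The snippet below is only an illustration in plain Python, not how TRANSPOSE() is implemented; the list named table is an assumed stand-in for the range A1:D3 from the example above.

```python
# Rows as they appear in A1:D3 of the example sheet.
table = [
    ["Date", "Product A", "Product B", "Product C"],
    ["Jan 1", 100, 150, 200],
    ["Jan 2", 120, 160, 210],
]

# zip(*table) groups the i-th item of every row together, so each
# original column becomes one new row (3 rows x 4 columns turns
# into 4 rows x 3 columns).
transposed = [list(column) for column in zip(*table)]

for row in transposed:
    print(row)
```

Just as in the sheet, the first new row holds the dates and each following row holds one product's values.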

Creating new columns for context

As data analysts, our work often goes beyond cleaning existing data. A general and powerful strategy is to derive new information by adding calculated columns built from the columns we already have. This adds context, introduces new metrics, or simplifies complex calculations, ultimately making our data more insightful.

Here are practical ways we might apply this strategy.

  • Calculate profit: Create a new column with =Revenue - Cost to find the profit for each transaction. Here, Revenue and Cost stand for the cells (or named ranges) that hold those values.

  • Categorize age groups: Using conditional logic, group customers by age, for example with =IF(B2 < 18, "Minor", IF(B2 < 65, "Adult", "Senior")).

  • Extract year from dates: Pull out the year to analyze time-based trends, for example with =YEAR(C2).
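The three derived columns above can also be sketched in plain Python, which may help make the logic of each formula explicit. This is a minimal illustration under assumptions: the rows, field names (revenue, cost, age, order_date), and values are all hypothetical and do not come from the lesson's dataset.

```python
from datetime import date

# Hypothetical transactions; field names and values are made up.
rows = [
    {"revenue": 500.0, "cost": 320.0, "age": 16, "order_date": date(2023, 3, 14)},
    {"revenue": 1200.0, "cost": 900.0, "age": 42, "order_date": date(2024, 1, 5)},
    {"revenue": 80.0, "cost": 95.0, "age": 71, "order_date": date(2024, 11, 30)},
]


def age_group(age):
    """Apply the same thresholds as the nested IF formula."""
    if age < 18:
        return "Minor"
    if age < 65:
        return "Adult"
    return "Senior"


for row in rows:
    row["profit"] = row["revenue"] - row["cost"]  # mirrors =Revenue - Cost
    row["age_group"] = age_group(row["age"])      # mirrors the nested IF
    row["year"] = row["order_date"].year          # mirrors =YEAR(C2)

for row in rows:
    print(row["profit"], row["age_group"], row["year"])
```

Each new key plays the role of a calculated column: it is computed row by row from values that already exist, and the original fields stay untouched.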

With new columns in place, our dataset is more informative, but we’re not done yet. Next, we need to make sure that differences in scale don’t distort our results. Let’s explore how to standardize our data.

Standardizing numeric data

Suppose that, in a sample sales dataset, Salary values range from $20,000 to $200,000, while Customer Age falls between 18 and 90. Without adjusting for these scale differences, even basic analysis, like plotting relationships or comparing averages across these metrics, can lead us astray, because the much larger Salary numbers overwhelm the Age numbers. That’s where feature scaling comes in.

What is feature scaling?

Feature scaling is a data preparation technique where numeric values are transformed to exist on a consistent scale. This helps avoid distortions during analysis, especially when working with features that have very different ranges.

As data analysts, we gain several practical benefits from scaling.

  • Fair comparisons: It prevents one variable from overpowering summary statistics or visualizations due to its larger scale.

  • Cleaner visuals: Charts like scatterplots or heatmaps become easier to interpret when features are similar in scale.

  • Reliable outputs: Scaled data reduces the risk of skewed patterns in aggregations, filters, and dashboards.
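To make the idea of a consistent scale concrete, here is a small sketch in plain Python. It uses min-max rescaling, which is one common way to scale features and is chosen here only for illustration; the helper name min_max_scale and the sample salaries and ages are assumptions, picked to fit the ranges mentioned earlier.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values within the ranges from the text.
salaries = [20_000, 65_000, 110_000, 200_000]
ages = [18, 36, 54, 90]

scaled_salaries = min_max_scale(salaries)
scaled_ages = min_max_scale(ages)

print(scaled_salaries)
print(scaled_ages)
```

After rescaling, both columns run from 0 to 1, so neither dominates a chart or a summary simply because its raw numbers are larger.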

Scaling approaches in Google Sheets

When working with numerical data on varying scales, two common techniques help bring ...