Shape It Right

Learn to reshape and prepare structured data for clear, analysis-ready insights using pandas.

As data analysts, we rely on structure to make sense of information. Imagine trying to analyze survey responses where each answer is stored in a separate file, or trying to compare monthly sales when each month is its own column. That kind of clutter makes it nearly impossible to run clean comparisons or build effective visuals.

That’s where data reshaping comes in. Reshaping turns scattered, inconsistent structures into tidy, streamlined tables in which each row is an observation, each column is a variable, and every value has a single, predictable place.

In this lesson, we’ll unpack what tidy data really means, explore wide vs. long formats, and get hands-on with pandas tools like melt(), pivot(), pivot_table(), stack(), and unstack(), so we can reshape any DataFrame to suit our analysis.

What is tidy data?

“Tidy” sounds like a colloquial term, right? In technical terms, however, tidy data follows three simple rules:

  1. Each variable forms a column. Every distinct attribute or measurement is stored in its own column. For example, in a student dataset, Name, Subject, and Score should each be in separate columns, not combined into one.

    Name     Subject    Score
    Alice    Math       89
    Bob      Math       77

  2. Each observation forms a row. Each row represents one complete set of measurements or attributes for a single entity or event. For example, a single row for “Alice’s Math score” means that her Name, Subject, and Score are all in one row.

    Name     Subject    Score
    Alice    Math       89
    Alice    Science    90

  3. Each type of observational unit forms a table. Different entities or observational types should be stored in separate tables to avoid mixing unrelated data. For example, use one table for student scores:

    Name     Subject    Score
    Alice    Math       89

And a separate table for teacher information:

    Name        Subject    Room
    Mr. John    Math       101

This consistent and predictable structure is essential because many data manipulation and visualization tools expect data to be tidy. When data is tidy, we can easily apply filters, groupings, summaries, and charts without complicated reshaping.
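To make this concrete, here is a minimal sketch that builds the example tables above as pandas DataFrames and then filters and summarizes them. The variable names (`scores`, `teachers`, `math_scores`, `mean_by_subject`) are our own choices for illustration, and the rows are simply the ones shown in the example tables.

```python
import pandas as pd

# Tidy table of student scores: one variable per column,
# one observation (a student's score in a subject) per row.
scores = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Alice"],
        "Subject": ["Math", "Math", "Science"],
        "Score": [89, 77, 90],
    }
)

# A different observational unit (teachers) lives in its own table.
teachers = pd.DataFrame(
    {"Name": ["Mr. John"], "Subject": ["Math"], "Room": [101]}
)

# Filtering: keep only the Math observations.
math_scores = scores[scores["Subject"] == "Math"]

# Grouping and summarizing: average score per subject.
mean_by_subject = scores.groupby("Subject")["Score"].mean()

print(math_scores)
print(mean_by_subject)
```

Because every variable has its own column, the filter and the grouping each take a single line, with no reshaping beforehand.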

Wide format vs. long format

Understanding how our data is structured is key to effective analysis and visualization. Two common data shapes, wide and long, determine how we organize variables and observations. Knowing the difference between them helps us decide when and how to reshape the data for different tasks.

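Wide format: In wide format, similar measurements are spread across multiple columns. This can be convenient for quick human inspection, but it is harder to process programmatically because part of the information (such as the month) lives in the column names rather than in the data itself.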

Figure: Wide format of the data

For example, monthly sales might be stored in separate columns like Jan_Sales, Feb_Sales, ...
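As a rough illustration, a wide table of this kind might look like the sketch below. The `Store` column, the store labels, and all sales figures are hypothetical values made up for this example; only the `Jan_Sales` and `Feb_Sales` column names come from the text. The last step previews `melt()`, one of the tools named in the introduction, to show the same data in long form.

```python
import pandas as pd

# Hypothetical wide-format table: one column per month.
# Store labels and numbers are invented for illustration only.
wide = pd.DataFrame(
    {
        "Store": ["A", "B"],
        "Jan_Sales": [100, 80],
        "Feb_Sales": [120, 90],
    }
)

# The month is encoded in the column names, so any calculation
# across months has to list those columns explicitly.
month_cols = ["Jan_Sales", "Feb_Sales"]
total = wide[month_cols].sum(axis=1)

# Preview: melt() moves the month columns into rows (long format),
# giving one row per store-month observation.
long = wide.melt(
    id_vars="Store",
    value_vars=month_cols,
    var_name="Month",
    value_name="Sales",
)

print(wide)
print(total)
print(long)
```

Note that adding a new month to the wide table means adding a column and updating `month_cols`, whereas in the long table it only means adding rows.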