Tidy Up Text Data
Learn to transform messy text into perfectly organized data for analysis.
Text data is all around us, in customer names, product descriptions, cities, email addresses, and more. But unlike numbers, text often arrives messy: inconsistent capitalization, extra spaces, unexpected symbols, or mismatched formats. Left unchecked, this can break formulas, ruin lookups, and lead to frustrating analysis errors.
As analysts, clean, consistent text is our foundation for reliable insights. Whether we’re grouping products, extracting area codes, or standardizing formats, we need tools that can quickly shape raw text into usable data.
In this lesson, we’ll explore essential text functions in Google Sheets to clean up entries, format data consistently, and extract exactly what we need from each text string.
Let’s get started with the basics: removing extra spaces.
Removing unwanted spaces
Have you ever copied data from a website or an old report, and when you pasted it, you noticed strange gaps or extra spaces? Raw text often contains unwanted spaces, at the start, end, or even between words. These spaces might seem harmless, but they can break formulas, cause lookup failures (because “Apple” is not the same as “Apple ” to a computer), and make your data look unprofessional.
The TRIM(text)
function is our first line of defense. It’s specifically designed to clean text by removing all leading spaces (at the beginning), all trailing spaces (at the end), and reducing multiple spaces between words to just a single space.
Suppose we have a list of customer names, some of which were entered a bit sloppily:
Notice how TRIM()
handles all those tricky spaces? It’s an effective first step for cleaning names, addresses, product titles, or any manually entered data.
Analyst tip: TRIM()
is especially useful before running lookups (like VLOOKUP
or XLOOKUP
) or filtering data. Those hidden spaces can cause mismatches, making your formulas return errors or or simply miss correct data. Precision matters as those hidden spaces can quietly derail your analysis.
Changing letter case
Just like extra spaces, inconsistent capitalization is a common way to identify messy text data. “New York”, “NEW YORK”, and “new york” all refer to the same place, but to a spreadsheet, they are three distinct entries. This can make grouping, sorting, and analyzing data a headache.
These three functions, as mentioned below, help us standardize capitalization.
UPPER(text)
: Converts all letters in thetext
string toUPPERCASE
....