
Getting Familiar with Data and Performing Data Cleaning

Explore methods to familiarize yourself with data, including understanding dataset structure, examining features, and validating data integrity. Learn the importance of data dictionaries and perform critical data cleaning steps to prepare for building predictive models.
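A first look at a dataset's structure, features, and integrity might be sketched as follows. This is a minimal illustration using only the Python standard library; the toy data, the column names (`id`, `age`, `balance`), and the helper `summarize` are hypothetical and do not come from the lesson's dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical toy extract; the values and column names are illustrative only.
RAW = """id,age,balance
a1,34,20000
a2,,50000
a2,41,50000
a3,29,0
"""


def summarize(csv_text, id_column):
    """Return basic structure and integrity facts about a CSV extract."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    # Count empty cells per column as a simple missing-value check.
    missing = {col: sum(1 for r in rows if r[col] == "") for col in columns}
    # IDs that occur more than once suggest duplicated records.
    id_counts = Counter(r[id_column] for r in rows)
    duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)
    return {
        "n_rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicate_ids": duplicate_ids,
    }


summary = summarize(RAW, "id")
print(summary)
```

Checks like these only flag candidates for cleaning. Whether a repeated ID or an empty cell is actually an error depends on what the data dictionary says each column should contain.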

Getting familiar with data

As a data scientist, you may receive such a dataset in several possible scenarios, including the following:

  1. You created the SQL query that generated the data.

  2. A colleague wrote a SQL query for you, with your input.

  3. A colleague who knows about the data gave it to you, but without your input.

  4. You are given a dataset about which little is known.

In cases 1 and 2, you had input into generating or extracting the data. In these scenarios, you probably understood the business problem and then either found the data you needed with the help of a data engineer or did your own research and designed the SQL query that generated the data. Often, especially as you gain more experience in your data science role, the first step will be to meet with the business partner to understand and refine the mathematical definition of the business problem. Then, you would play a key role in defining what is in the ...