Import Dataset

This lesson teaches the steps to import a CSV file into Python using Pandas.

A quick review of exploratory data analysis (EDA)

As a key part of data inspection, exploratory data analysis (EDA) involves summarizing the salient characteristics of your dataset in preparation for further processing and analysis.

EDA includes understanding the shape and distribution of the data, scanning for missing values, learning which features are most relevant based on correlation, and familiarizing yourself with the overall contents of the dataset. Gathering this intel helps inform algorithm selection and highlight parts of the data that require cleaning to prepare for further processing.

You can leverage a range of simple techniques to summarize data using Pandas, with additional options to visualize the data using Seaborn and Matplotlib.

Let’s begin by importing Pandas and Seaborn and enabling Matplotlib’s inline mode, using the following code in Jupyter Notebook.

import pandas as pd
import seaborn as sns
%matplotlib inline

Note: Using the inline feature of Matplotlib, you can display plots directly below the applicable code cell within Jupyter Notebook or other frontends.
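As a rough sketch of the summary steps described above (shape, distribution, missing values, and correlation), the following example runs them on a tiny made-up table. The column names and values are hypothetical and only stand in for a real dataset.

```python
import pandas as pd

# Tiny made-up table; column names and values are hypothetical.
df = pd.DataFrame({
    "price": [60.0, 17.0, 90.0, 26.0],
    "number_of_reviews": [118.0, 6.0, None, 25.0],
    "minimum_nights": [4, 2, 62, 5],
})

# Shape: number of rows and columns.
print(df.shape)

# Distribution: count, mean, spread, and quartiles for each numeric column.
print(df.describe())

# Missing values: count of empty cells per column.
missing = df.isnull().sum()
print(missing)

# Correlation: pairwise correlation between the numeric columns.
corr = df.corr()
print(corr)
```

Each call returns an ordinary Pandas object, so you can inspect the results in a notebook cell or pass them on to a plotting function.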

How you can import the dataset

You can import datasets from various sources, including internal and external files as well as random self-generated datasets called blobs.
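For the self-generated case, one common option is the make_blobs function from scikit-learn. This is an assumption, since the lesson does not name a library for blobs. The sketch below generates a small random dataset and wraps it in a Pandas dataframe. The sample count, number of centers, and column names are chosen only for illustration.

```python
import pandas as pd
from sklearn.datasets import make_blobs

# Generate 100 random points grouped around 3 centers.
# All parameter values here are illustrative assumptions.
X, y = make_blobs(n_samples=100, n_features=2, centers=3, random_state=0)

# Wrap the generated arrays in a dataframe (hypothetical column names).
blobs = pd.DataFrame(X, columns=["feature_1", "feature_2"])
blobs["cluster"] = y

print(blobs.shape)
print(blobs.head())
```

Setting random_state makes the generated points reproducible between runs.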

The following sample dataset is an external dataset downloaded from Kaggle called the Berlin Airbnb dataset. This data was scraped from Airbnb and contains detailed accommodation listings available in Berlin including location, price, and reviews.
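Loading a CSV file such as this one into a dataframe typically comes down to a single call to pd.read_csv. The minimal sketch below uses a few made-up rows in place of the downloaded file so that it runs on its own. The column names, values, and the file name in the comment are hypothetical, not taken from the actual dataset.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: a few made-up rows with hypothetical
# column names, so the example runs without the real dataset.
sample_csv = io.StringIO(
    "id,neighbourhood,price,number_of_reviews\n"
    "1,Mitte,60,118\n"
    "2,Pankow,17,6\n"
    "3,Kreuzberg,90,\n"
)

# With the real file, you would pass its path instead, for example:
# df = pd.read_csv("listings.csv")  # hypothetical file name
df = pd.read_csv(sample_csv)

# Confirm the import worked by checking the size and the first rows.
print(df.shape)
print(df.head())
```

The empty field at the end of the last row is read as a missing value, which is the kind of gap the EDA steps are meant to surface.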
