DataFrame in Pandas

Explore how to use Pandas DataFrames to manage and analyze tabular data in Python. Learn to access columns and rows, handle missing values with NaN, and assign data efficiently. This lesson builds foundational skills for data manipulation in predictive analysis.

DataFrame

A DataFrame is a two-dimensional, labeled data structure, comparable to a 2-D array or matrix. It stores data much like an Excel spreadsheet, in rows and columns. Unlike a Series, which has only one index, each value in a DataFrame is associated with both a row index and a column index.
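
To make the difference between the two kinds of index concrete, here is a minimal sketch. The labels, column names, and values are invented for illustration and are not taken from the lesson's dataset.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df = pd.DataFrame(
    {'region': ['North', 'South', 'East'], 'cases': [120, 95, 143]},
    index=['a', 'b', 'c'],
)

# A Series value is addressed by a single label
print(s['b'])

# A DataFrame value is addressed by a row label and a column label
print(df.loc['b', 'cases'])

print(df.index)    # row labels
print(df.columns)  # column labels
```

Here `.loc` is used because it selects by label, which mirrors the row index and column index described above.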

In the following code snippet, a CSV file containing cancer registration statistics for 2017 is read into a DataFrame.

Python 3.5
import pandas as pd
df = pd.read_csv('cancer_stats.csv')  # Read the CSV file into a DataFrame
print(df.head())  # head() returns the first five rows; print() displays them

The output shows that every row already has an index assigned to it. When no index column is specified, read_csv() labels the rows with integers starting at 0.

On line 2, data from the cancer_stats.csv file is read using the pandas read_csv() function, which takes the file name as its argument.
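
Since cancer_stats.csv is not reproduced here, the sketch below reads a small, made-up CSV from an in-memory string instead. The column names and values are hypothetical. The point is only to show read_csv(), the default integer row index, and head() with and without an argument.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """site,registrations
A,100
B,200
C,300
D,400
E,500
F,600
"""

# read_csv() accepts a file path or a file-like object
sample = pd.read_csv(io.StringIO(csv_text))

first_five = sample.head()   # default: the first five rows
first_two = sample.head(2)   # the first n rows when n is given

print(first_five)
print(list(sample.index))    # default row labels start at 0
```

Reading from `io.StringIO` keeps the example self-contained. With a real file, the path string would be passed to read_csv() directly, as in the lesson's snippet.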

On line 3, the head() method, called without any parameters, returns the first five rows of the DataFrame object, which print() then displays. If a number n is passed to the method, as in head(n) ...