Introduction to pandas
Explore how pandas enables efficient data manipulation with its Series and DataFrame structures. Understand core functions for loading, viewing, and summarizing data, and prepare datasets for visualization using Seaborn.
Why pandas?
pandas is an open-source Python library that provides efficient data manipulation and analysis tools. It offers data structures and operations for working with labeled, tabular data, and it can read and write a variety of data formats. It is built on top of the NumPy library.
The following key features make pandas a widely used library:
- Fast, efficient performance, even on large datasets.
- Support for various data formats such as CSV files, JSON, XML, and SQL databases (to name a few).
- Data cleaning and support for handling missing values.
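As a rough illustration of the last two features, the sketch below reads CSV text from an in-memory string (standing in for a file on disk) and then fills a missing value. The column names and values are made up for this example, and filling with the column mean is just one of several possible strategies.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second data row has an empty score field.
csv_text = "name,score\nAda,90\nBen,\nCy,75\n"

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# The empty field is parsed as NaN, which isna() detects.
missing_count = int(df["score"].isna().sum())

# Replace the missing score with the mean of the known scores.
filled = df["score"].fillna(df["score"].mean())

print(missing_count)
print(filled.tolist())
```

The same pattern applies to other readers such as pd.read_json(), although each reader has its own options.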
pandas data structures
The pandas library provides the following two core data structures:
- pandas Series
- pandas DataFrame
The pandas Series
The pandas Series object is a one-dimensional labeled array. We can populate a Series object with any Python data type, such as integers, strings, floats, and so on. We can think of the Series object as a column in a spreadsheet. Every Series has an index, so each element carries a label. If we don't supply labels ourselves, pandas assigns integer labels starting at 0.
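To make the idea of an index concrete, here is a minimal sketch. The marks values and the student names used as custom labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: three marks with no labels supplied.
marks = pd.Series([88, 92, 79])

# Without explicit labels, the index holds the integers 0, 1, 2.
default_labels = list(marks.index)

# .loc looks up an element by its index label.
second_mark = marks.loc[1]

# The same values with custom labels passed through the index parameter.
named = pd.Series([88, 92, 79], index=["Ada", "Ben", "Cy"])
bens_mark = named.loc["Ben"]

print(default_labels)
print(second_mark, bens_mark)
```

Both lookups return the same element because the label 1 in the default index and the label "Ben" in the custom index refer to the second position.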
We can create a Series object from an array, a dictionary, a list, or a scalar. As illustrated in the figure below, we can make the scores Series object by passing the student_score list to the pd.Series() function. Each element is indexed, and we can ...