Pandas is one of the most popular tools for data analysis in Python. This open-source library is the backbone of many data projects and is used for data cleaning and data manipulation.
With Pandas, you gain greater control over complex data sets. It’s an essential tool in the data analysis tool belt. If you’re not using Pandas, you’re not making the most of your data.
In this post, we’ll explore a quick guide to the 35 most essential operations and commands that any Pandas user needs to know.
Let’s get right to the answers.
import pandas as pd
Pandas is now accessible under the alias pd. If Pandas is not installed yet, you can install it with pip, Python's built-in package installer, by running the following command.
$ pip install pandas
Create a one-dimensional array that can hold any data type. Call the pd.Series() constructor and pass a list of values. By default, Pandas numbers the index from 0; pass the index argument to supply your own labels.
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
Set the Series name.
series1.name = "Insert name"
Set the index name.
series1.index.name = "Index name"
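As a minimal sketch of the steps above, the following builds a labeled Series and then names both the Series and its index. The values and the names "scores" and "letter" are placeholders chosen for illustration.

```python
import pandas as pd

# Build a Series with explicit index labels instead of the default 0..n-1
series1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Name the Series and its index (both names are illustrative placeholders)
series1.name = "scores"
series1.index.name = "letter"

print(series1)
# Values can be looked up by their index label
print(series1["c"])
```

The name shows up when the Series is printed and becomes the column name if the Series is later placed in a DataFrame.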
Create a two-dimensional data structure with labeled columns. Create and print a df from a dictionary in which each key becomes a column.
df = pd.DataFrame(
    {"a": [1, 2, 3],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])
print(df)
You can also build a DataFrame from a list of rows and name the columns with the columns argument.
df = pd.DataFrame(
[[1, 2, 3],
[4, 6, 8],
[10, 11, 12]],
index=[1, 2, 3],
columns=['a', 'b', 'c'])
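To make the difference between the two constructions concrete, here is a small sketch that builds the same table both ways. The variable names by_columns and by_rows are illustrative, not part of Pandas.

```python
import pandas as pd

# Column-oriented: each dictionary key becomes a column
by_columns = pd.DataFrame(
    {"a": [1, 4, 10], "b": [2, 6, 11], "c": [3, 8, 12]},
    index=[1, 2, 3],
)

# Row-oriented: each inner list becomes a row, labeled via `columns`
by_rows = pd.DataFrame(
    [[1, 2, 3], [4, 6, 8], [10, 11, 12]],
    index=[1, 2, 3],
    columns=["a", "b", "c"],
)

# Both constructions describe the same table
print(by_columns.equals(by_rows))
print(by_rows.shape)
```

Which form is more convenient usually depends on whether your source data arrives as columns or as records.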
Open the CSV file, copy the data, paste it into a text editor such as Notepad, and save it in the same directory that houses your Python scripts. Then use the read_csv function built into Pandas to load it, and pass index_col to choose which column becomes the index.
import pandas as pd
data = pd.read_csv('file.csv')
data = pd.read_csv("data.csv", index_col=0)
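The effect of index_col is easier to see on a tiny example. This sketch reads from an in-memory string instead of a file on disk so that it runs anywhere; the CSV contents and the column names are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (contents are made up for illustration)
csv_text = "id,name,score\n1,name1,10\n2,name2,20\n3,name3,30\n"

# Default: Pandas adds a 0-based integer index
data = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column ("id") as the index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(data.columns.tolist())
print(indexed.columns.tolist())
print(indexed.loc[2, "name"])
```

With index_col=0, the "id" column is no longer a regular column, and rows can be looked up by their id with loc.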
Call the read_excel function to access an Excel file. Pass the name of the Excel file as an argument.
pd.read_excel('file.xlsx')
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet2')
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
pd.read_sql("SELECT * FROM my_table;", engine)
pd.read_sql_table('my_table', engine)
pd.read_sql_query("SELECT * FROM my_table;", engine)
(read_sql() is a convenience wrapper around read_sql_table() and read_sql_query().)
df.to_sql('myDf', engine)
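A complete round trip might look like the sketch below. It assumes a plain sqlite3 connection from the standard library in place of the SQLAlchemy engine shown above, which Pandas also accepts for SQLite; the table contents are illustrative. Note that read_sql_table() needs SQLAlchemy, so this sketch only uses read_sql_query().

```python
import sqlite3
import pandas as pd

# In-memory SQLite database; sqlite3 is used here instead of SQLAlchemy
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"name": ["name1", "name2"], "score": [1, 2]})

# Write the DataFrame to a table, without storing the index as a column
df.to_sql("my_table", conn, index=False)

# Read a filtered subset back with a SQL query
result = pd.read_sql_query("SELECT * FROM my_table WHERE score > 1;", conn)
print(result)

conn.close()
```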
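Since Pandas indexes from 0 by default, get the first element with ser[0].
import pandas as pd
# 'file.csv' is the example file name used earlier; substitute your own
df = pd.read_csv('file.csv')
ser = df['Name'].head(10)
# get the first element
ser[0]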
Use ser[:n] to get the first n elements of a Series.
import pandas as pd
df = pd.read_csv('file.csv')
ser = df['Name'].head(10)
# get the first five elements
ser[:5]
Use ser[-n:] to get the last n elements of a Series.
import pandas as pd
df = pd.read_csv('file.csv')
ser = df['Name'].head(10)
# get the last five elements
ser[-5:]
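The three selections above can be tried without a CSV file. This sketch uses a hand-made Series with illustrative values and the explicit positional accessor .iloc, which gives the same results as ser[0], ser[:n], and ser[-n:] when the Series has the default 0-based index.

```python
import pandas as pd

# A small Series with the default 0-based index (values are illustrative)
ser = pd.Series(["name1", "name2", "name3", "name4", "name5", "name6"])

first = ser.iloc[0]          # first element by position
first_three = ser.iloc[:3]   # first three elements
last_two = ser.iloc[-2:]     # last two elements

print(first)
print(first_three.tolist())
print(last_two.tolist())
```

Using .iloc makes the intent unambiguous when the index labels are not the default integers; head(n) and tail(n) are equivalent shortcuts for the two slices.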
# select by position (row 0, column 0)
df.iloc[[0], [0]]
df.iat[0, 0]
# select by label (row label 0, column 'Label')
df.loc[[0], ['Label']]
df.at[0, 'Label']
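Here is a short sketch contrasting the four accessors on a toy DataFrame. The "Label" and "Value" columns and their contents are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Label": ["x", "y"], "Value": [10, 20]})

# Position-based: list arguments return a DataFrame, scalars return a value
sub_by_position = df.iloc[[0], [0]]
value_by_position = df.iat[0, 0]

# Label-based equivalents
sub_by_label = df.loc[[0], ["Label"]]
value_by_label = df.at[0, "Label"]

print(type(sub_by_position).__name__)
print(value_by_position, value_by_label)
```

iat and at only accept a single row and a single column, which makes them a natural choice when you need exactly one cell.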
In boolean indexing, we filter data with a boolean vector.
import pandas as pd
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["name1", "name2", "name3", "name4"],
        'degree': ["degree1", "degree2", "degree3", "degree4"],
        'score': [1, 2, 3, 4]}
# label each row with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])
print(df)
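A common form of boolean indexing builds the boolean vector from a condition on a column and uses it to filter rows. The sketch below reuses the same sample data; the threshold of 2 is an arbitrary choice for illustration.

```python
import pandas as pd

data = {"name": ["name1", "name2", "name3", "name4"],
        "degree": ["degree1", "degree2", "degree3", "degree4"],
        "score": [1, 2, 3, 4]}
df = pd.DataFrame(data)

# Build a boolean vector: True where the score is greater than 2
mask = df["score"] > 2

# Keep only the rows where the mask is True
high_scores = df[mask]
print(high_scores)
```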
s.drop(['a', 'c'])
df.drop('Value', axis=1)
df['New Column'] = 0
df.columns = ['Column 1', 'Column 2', 'Column 3']
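The drop, add, and rename commands above can be combined into one runnable sketch. The Series and DataFrame contents here are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
df = pd.DataFrame({"Key": ["k1", "k2"], "Value": [1, 2]})

# drop() returns a new object; the originals are left unchanged
s_trimmed = s.drop(["a", "c"])
df_no_value = df.drop("Value", axis=1)

# Add a column filled with a constant
df["New Column"] = 0

# Rename all columns by assigning a list of the same length
df.columns = ["Column 1", "Column 2", "Column 3"]

print(s_trimmed.index.tolist())
print(df_no_value.columns.tolist())
print(df.columns.tolist())
```

Assigning to df.columns requires exactly one name per existing column; otherwise Pandas raises an error.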
Sort a Series by its index labels. If the inplace argument is False, sort_index returns a new sorted Series; otherwise it updates the original Series and returns None.
Series.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True)
# ascending order (default)
df.sort_values(by='Values')
# descending order
df.sort_values(by='Values', ascending=False)
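The following sketch illustrates both sorting styles, including the return-value difference that inplace makes. The data is made up for illustration.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["c", "a", "b"])

# sort_index() returns a new Series by default
sorted_copy = s.sort_index()
# With inplace=True the original is modified and None is returned
returned = s.sort_index(inplace=True)

df = pd.DataFrame({"Values": [20, 10, 30]})
# Sort rows by a column, largest value first
descending = df.sort_values(by="Values", ascending=False)

print(sorted_copy.index.tolist())
print(returned)
print(descending["Values"].tolist())
```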
Rank the values in each column; the method argument controls how ties are ranked.
df.rank()
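As a rough illustration of how ties are handled, this sketch ranks a single column in three ways and stores each result as a new column. The "score" data and the new column names are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"score": [50, 80, 80, 20]})

# Default method="average": tied values share the mean of their ranks
df["rank_average"] = df["score"].rank()
# method="min": tied values all get the lowest rank in the group
df["rank_min"] = df["score"].rank(method="min")
# ascending=False ranks the largest value first
df["rank_desc"] = df["score"].rank(ascending=False, method="min")

print(df)
```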
df.sum()
# cumulative sum
df.cumsum()
s.sub(2)
s.add(2)
s.mul(2)
s.div(2)
df.min()
df.max()
df.idxmin()
df.idxmax()
df.mean()
df.median()
df.describe()
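To see what these summary commands return, here is a small sketch on toy data; the numbers are chosen only so the results are easy to check by hand.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [7, 8, 9]})
s = pd.Series([2, 4, 6])

print(df.sum())       # column totals
print(df.cumsum())    # running totals down each column
print(s.sub(2).tolist(), s.mul(2).tolist())  # element-wise arithmetic
print(df.idxmax())    # index label of each column's maximum
print(df.describe())  # count, mean, std, min, quartiles, max per column
```

Most of these methods work column by column on a DataFrame and return a Series indexed by column name.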
Now that you're armed with the common Pandas operations and commands, you can put them into practice. After all, working with real datasets is the best way to master Python and become a data analyst! There's still a lot more to learn that we didn't cover today.
To keep going, check out Educative's course Predictive Data Analysis for Python. You'll get hands-on practice with industry-standard examples and become fluent in data analysis.