Trusted answers to developer questions

Related Tags

pandas
python

How to index using the [] operator in pandas

Hassaan Waqar

The pandas library in Python is used to work with dataframes, which structure data in rows and columns. It is widely used in data analysis and machine learning.

Indexing is used to obtain only a portion of a dataframe. The [] operator supports selecting either a range of rows or a set of columns: we can pass a range of row numbers or specify column names.

Indexing by rows

To index a dataframe by row numbers, we pass the starting and ending row numbers inside the [] operator.

The syntax is as follows:

dataframe[start:end]

The : denotes a slice. It defines a range from the start value up to the end value and selects all rows within that range.

The end value is exclusive. It is not included in the subset of the dataframe.

Indexing a dataframe from 0:5 would return rows from row number 0 to row number 4.

If the start value is omitted, indexing starts from row 0. All rows from the beginning up to, but not including, the specified end value are returned.

If the end value is omitted, all rows from the start value through the last row of the dataframe are returned.

Example

The code snippet below shows how rows can be indexed in Pandas:

import pandas as pd

# Creating a dataframe
df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
                'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'], 
                'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
                 "Yong", "Mark", "Phelps", "Khan"],
                 'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1] })

print("Original Dataframe")
print(df)
print('\n')
print("Indexing Dataframe")
print('\n')
print(df[2:4]) # Both start and end values: rows 2 and 3
print('\n')
print(df[:3]) # End value only: rows 0, 1, and 2
print('\n')
print(df[6:]) # Start value only: row 6 through the last row
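
The slicing rules described above can be checked with a short sketch. The dataframe and its column name here are hypothetical and used only for illustration. The comparison with iloc assumes the default integer index.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60]})

middle = df[2:4]  # rows 2 and 3; the end value 4 is excluded
head = df[:3]     # start omitted: rows 0, 1, and 2
tail = df[4:]     # end omitted: row 4 through the last row

print(middle["value"].tolist())  # [30, 40]
print(head["value"].tolist())    # [10, 20, 30]
print(tail["value"].tolist())    # [50, 60]

# Each slice matches the equivalent positional selection with iloc
print(middle.equals(df.iloc[2:4]))  # True
```

The number of rows returned by dataframe[start:end] is end minus start, as long as both values fall within the dataframe.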

Indexing using column names

We can also specify column names to get entire columns. To do so, we pass the column names inside the [] operator.

The syntax is as follows:

dataframe[['column1', 'column2', 'column3']]

If there is more than one column, we must enclose the names in an additional pair of [], which forms a list of column names.
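
As a minimal sketch of the point above, the inner [] is an ordinary Python list, so it can also be built separately and passed in. The column names a, b, and c and the variable wanted are hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# The list of column names can be stored in a variable first
wanted = ["c", "a"]
subset = df[wanted]

# The columns come back in the order given in the list
print(list(subset.columns))  # ['c', 'a']

# This is the same as writing the list inline
print(subset.equals(df[["c", "a"]]))  # True
```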

A single column can be indexed as follows:

dataframe["column1"]

However, this returns a series and not a dataframe. To get a dataframe instead, we can pass a list containing the single column name inside the [] operator. The syntax is as follows:

dataframe[["column1"]]

Example

The code snippet below shows how columns can be indexed in Pandas:

import pandas as pd

# Creating a dataframe
df = pd.DataFrame({'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
                'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'], 
                'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
                 "Yong", "Mark", "Phelps", "Khan"],
                 'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1] })

print("Original Dataframe")
print(df)
print('\n')
print("Indexing Dataframe")
print('\n')
print(df["Player"]) # Single column returning a series
print('\n')
print(df[["Player"]]) # Single column returning a dataframe
print('\n')
print(df[["Player", "Rank"]]) # Multiple columns
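
The difference between the two single-column forms can be confirmed by inspecting the type and shape of each result. This is a small sketch using a shortened, two-row version of the data above.

```python
import pandas as pd

# Shortened version of the example data, used only for illustration
df = pd.DataFrame({"Player": ["Messi", "Federer"], "Rank": [1, 1]})

as_series = df["Player"]    # single brackets: one-dimensional series
as_frame = df[["Player"]]   # double brackets: dataframe with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (2,)
print(as_frame.shape)            # (2, 1)
```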

Copyright ©2022 Educative, Inc. All rights reserved