pandas loc vs iloc
pandas is a Python library widely used in data science and machine learning. It helps us manipulate and prepare data before passing it to machine learning models. pandas provides the loc and iloc accessors to select rows and columns from a DataFrame.
In this Answer, we will look at how to use both accessors to select data from a DataFrame and highlight the key differences between them.
loc vs iloc
In this section, we compare the two accessors side by side.
Feature | loc[] | iloc[] |
Abbreviation | location (label-based) | integer location (position-based) |
Definition | Selects rows and columns by label (row index labels and column names). | Selects rows and columns by integer position, regardless of the labels. |
Single value | df.loc["A"], or df.loc[1] if the row index contains the label 1 | df.iloc[1] |
List | df.loc[["A", "B", "E"]] | df.iloc[[1, 2, 4]] |
Slicing | df.loc["A":"E", "W":"Z"] includes both the start and the stop label (int or str): df.loc[included:included] | df.iloc[1:5, 1:3] includes the start position but excludes the stop position: df.iloc[included:excluded] |
Condition | df.loc[condition], where condition is a boolean Series | Does not accept a boolean Series directly; pass a boolean array instead, e.g., df.iloc[condition.to_numpy()] |
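The Condition row is easy to trip over, so here is a minimal sketch, assuming the person_data DataFrame used later in this Answer with row labels "A" through "H". loc takes the boolean Series produced by a comparison directly. iloc refuses a boolean Series (the mask carries an index, so pandas raises an error), but it does accept the same mask once it is converted to a plain NumPy array with to_numpy().

```python
import pandas as pd

person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
}, index=["A", "B", "C", "D", "E", "F", "G", "H"])

# Boolean Series with one True/False value per row label
condition = person_data["Age"] > 40

# loc accepts the boolean Series directly
by_label = person_data.loc[condition]

# iloc needs a plain boolean array, so drop the index with to_numpy()
by_position = person_data.iloc[condition.to_numpy()]

print(by_label)
print(by_label.equals(by_position))

# Passing the Series itself to iloc raises an error
try:
    person_data.iloc[condition]
except (ValueError, NotImplementedError) as err:
    print("iloc rejected the Series:", err)
```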
Next, let's look at an illustration of selecting a DataFrame row with both accessors.
In the illustration above, we access the second row of the DataFrame using loc and iloc. With iloc, we pass the row's position, 1, inside the square brackets []; with loc, we pass the row's label, "B".
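Because the illustration itself is not reproduced in this text, here is a minimal sketch of the same selection. It assumes a DataFrame whose rows are labeled "A" through "H", like the one in the loc examples below, so the second row has position 1 and label "B".

```python
import pandas as pd

person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
}, index=["A", "B", "C", "D", "E", "F", "G", "H"])

# Second row selected by its integer position (positions start at 0)
row_by_position = person_data.iloc[1]

# The same row selected by its label
row_by_label = person_data.loc["B"]

print(row_by_position)
print(row_by_position.equals(row_by_label))  # True
```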
Example data
Throughout the Answer, the coding examples use the following data. The loc examples also give the rows the labels "A" through "H":
Name | Country | Age |
Albert | USA | 12 |
James | USA | 14 |
Alex | USA | 20 |
Bob | Canada | 25 |
Sara | Canada | 29 |
Bill | Germany | 45 |
Daniel | Germany | 32 |
John | Egypt | 60 |
Coding examples: iloc
iloc is useful when we want to access data by the integer position of rows and columns. Below are some of the use cases for the iloc accessor:
Accessing a specific row and column
The syntax to access columns and rows from a DataFrame is:
DataFrame.iloc[ row , column ]
In the syntax above, we specify the DataFrame from which we want to access the rows and columns.
row: The position of the row to access. Row positions start at 0.
column: The position of the column to access. Column positions also start at 0.
Below is a code example to access specific rows and columns from a DataFrame.
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
})
print(person_data)
print("Accessing the row at position 3 (the fourth row):")
print(person_data.iloc[3])
print("######################################")
print("Accessing the cell at row position 3 and column position 0:")
print(person_data.iloc[3, 0])
print("######################################")
print("Accessing the rows at positions 3, 5 and 6:")
print(person_data.iloc[[3, 5, 6]])
Code explanation
First, we import the pandas library so that we can create a DataFrame and apply iloc to it.
Next, we create a DataFrame, store it in the person_data variable, and print the complete DataFrame.
person_data.iloc[3] accesses the row at position 3. Because positions start at 0, this is the fourth row (Bob).
person_data.iloc[3, 0] accesses the single cell at row position 3 and column position 0, which holds "Bob".
person_data.iloc[[3, 5, 6]] accesses multiple rows by passing a list of positions, [3, 5, 6].
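The last statement selects whole rows only. To also restrict the columns, we can pass a column position, or a list of column positions, after the comma. The sketch below, assuming the same person_data DataFrame with its default integer positions, also shows how the type of the result depends on what we pass: two scalars give a single value, a scalar row gives a Series, and a list of rows gives a DataFrame.

```python
import pandas as pd

person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
})

# Rows at positions 3, 5 and 6, column at position 0 (Name): a Series
names = person_data.iloc[[3, 5, 6], 0]

# Same rows, columns at positions 0 and 2 (Name and Age): a DataFrame
names_and_ages = person_data.iloc[[3, 5, 6], [0, 2]]

# Two scalars give a single value; a scalar row gives a Series
cell = person_data.iloc[3, 0]
row = person_data.iloc[3]

# Wrapping the row position in a list keeps the result a DataFrame
one_row_frame = person_data.iloc[[3]]

print(names.tolist())                                     # ['Bob', 'Bill', 'Daniel']
print(names_and_ages)
print(cell)                                               # Bob
print(type(row).__name__)                                 # Series
print(type(one_row_frame).__name__, one_row_frame.shape)  # DataFrame (1, 3)
```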
Accessing a range of rows and columns
To access a range of rows and columns, we use slice notation, a colon : between a start and a stop position, for the rows, the columns, or both. The syntax is given below:
DataFrame.iloc[rowstart : rowend , colstart : colend]
rowstart: The position of the first row in the range.
rowend: The stop position of the row range. The row at rowend itself is not included; rows up to position rowend - 1 are returned.
colstart: The position of the first column in the range.
colend: The stop position of the column range. The column at colend itself is not included; columns up to position colend - 1 are returned.
Below is a coding example to access a range of rows and columns.
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
})
print("Accessing a range of rows from the DataFrame:")
print(person_data.iloc[3:5])
print("######################################")
print("Accessing a range of rows and columns from the DataFrame:")
print(person_data.iloc[3:5, 0:2])
Code explanation
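person_data.iloc[3:5] selects the rows from position 3 up to, but not including, position 5, so only positions 3 and 4 (rowend - 1) are returned.
person_data.iloc[3:5, 0:2] selects the same rows and the columns at positions 0 and 1 (Name and Country).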
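Here is a short sketch of the stop-exclusive rule, assuming the same person_data DataFrame with its default integer positions. It also shows that, as with ordinary Python slices, a missing start means "from position 0" and a bare colon means "all rows".

```python
import pandas as pd

person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
})

# Stop position 5 is excluded, so only positions 3 and 4 are returned
middle = person_data.iloc[3:5]

# A missing start means "from position 0": positions 0 and 1
first_two = person_data.iloc[:2]

# A bare colon selects every row; 1:3 keeps columns at positions 1 and 2
country_and_age = person_data.iloc[:, 1:3]

print(middle.index.tolist())             # [3, 4]
print(first_two["Name"].tolist())        # ['Albert', 'James']
print(country_and_age.columns.tolist())  # ['Country', 'Age']
```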
Coding examples: loc
The loc accessor is handy when accessing rows and columns by meaningful labels rather than integer positions. Below are some of the use cases for loc:
Accessing specific rows and columns
The syntax to access rows and columns is:
DataFrame.loc[rowlabel , collabel]
rowlabel: We define the row label we want to select from the DataFrame.
collabel: We define the column label we want to select from the DataFrame.
An example code for selecting specific rows and columns is given below:
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
}, index=["A" ,"B", "C", "D", "E", "F", "G", "H"])
print("Accessing row with label B:")
print(person_data.loc["B"])
print("######################################")
print("Accessing row with label B and column with label Country:")
print(person_data.loc["B" , "Country"])
print("######################################")
print("Accessing rows labeled A, B and D and columns labeled Name and Age:")
print(person_data.loc[["A", "B", "D"], ["Name", "Age"]])
Code explanation
We use the index parameter to give the DataFrame rows custom string labels, "A" through "H".
person_data.loc["B"] accesses the row with the label B.
person_data.loc["B", "Country"] accesses the single cell at the row labeled B and the column labeled Country.
person_data.loc[["A", "B", "D"], ["Name", "Age"]] accesses specific rows and columns by passing lists of label names.
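The comparison table lists df.loc[1] as a valid call; it works only when 1 is actually a row label. With the default integer index, labels and positions start out identical, which makes it easy to mix up loc and iloc. The minimal sketch below uses the same data without custom labels: after a filter, the surviving rows keep their original integer labels, so loc and iloc pick different rows.

```python
import pandas as pd

# Same data as above, but with the default integer index 0..7
person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
})

# Keep people aged 18 or older; the surviving rows keep labels 2 to 7
adults = person_data.loc[person_data["Age"] >= 18]

# loc looks up the label 3, which is still Bob's row
print(adults.loc[3, "Name"])  # Bob

# iloc counts positions in the filtered frame: position 3 is label 5
print(adults.iloc[3, 0])      # Bill

# Label 1 (James) was filtered out, so loc raises a KeyError
try:
    adults.loc[1]
except KeyError:
    print("No row is labeled 1 anymore")
```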
Accessing a range of rows and columns
To select a range of rows and columns with loc, we slice the rows and columns with the colon : operator, as with iloc. Unlike iloc, however, a loc slice includes the stop label.
The coding example is given below:
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
}, index=["A" ,"B", "C", "D", "E", "F", "G", "H"])
print("Accessing a range of rows from the DataFrame:")
print(person_data.loc["B":"E"])
print("######################################")
print("Accessing a range of rows and columns from the DataFrame:")
print(person_data.loc["B":"E", "Name":"Country"])
Code explanation
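person_data.loc["B":"E"] selects the rows from the label B through the label E; the stop label E is included.
person_data.loc["B":"E", "Name":"Country"] selects the same rows and the columns from Name through Country, so both Name and Country are included.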
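To see the difference from iloc side by side, the sketch below (assuming the same labeled person_data DataFrame) compares a label slice with position slices. Label "B" sits at position 1 and label "E" at position 4, so the matching iloc slice has to stop at 5.

```python
import pandas as pd

person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
}, index=["A", "B", "C", "D", "E", "F", "G", "H"])

# loc includes the stop label: B, C, D and E
by_label = person_data.loc["B":"E"]

# iloc excludes the stop position: positions 1, 2 and 3 are B, C and D
by_position = person_data.iloc[1:4]

# Stopping one position past E reproduces the loc result
by_position_matching = person_data.iloc[1:5]

# Column slices behave the same way: "Name":"Country" keeps both ends
name_to_country = person_data.loc[:, "Name":"Country"]

print(by_label.index.tolist())                # ['B', 'C', 'D', 'E']
print(by_position.index.tolist())             # ['B', 'C', 'D']
print(by_label.equals(by_position_matching))  # True
print(name_to_country.columns.tolist())       # ['Name', 'Country']
```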
Applying a filter on a DataFrame
To filter a DataFrame, we can pass a condition inside the square brackets [] of loc. The following coding example shows how:
import pandas as pd
person_data = pd.DataFrame({
"Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
"Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
"Age": [12, 14, 20, 25, 29, 45, 32, 60]
}, index=["A" ,"B", "C", "D", "E", "F", "G", "H"])
print(person_data)
print("Applying a filter: age greater than 40")
print(person_data.loc[person_data["Age"] > 40])
Code explanation
person_data["Age"] > 40 compares every value in the Age column with 40 and produces a boolean Series with one True or False value per row. Passing this Series inside the brackets [] of loc keeps only the rows where the value is True, here Bill (45) and John (60).
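Because loc takes a row selector and a column selector, the filter can be paired with column labels, and several conditions can be joined with & (and) or | (or). Each condition needs its own parentheses, since & and | bind more tightly than comparison operators in Python. A minimal sketch, assuming the same labeled person_data DataFrame:

```python
import pandas as pd

person_data = pd.DataFrame({
    "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
    "Age": [12, 14, 20, 25, 29, 45, 32, 60],
}, index=["A", "B", "C", "D", "E", "F", "G", "H"])

# Rows where Age > 40, keeping only the Name column
older_names = person_data.loc[person_data["Age"] > 40, "Name"]

# Adults who live in the USA, keeping the Name and Age columns
adults_in_usa = person_data.loc[
    (person_data["Age"] >= 18) & (person_data["Country"] == "USA"),
    ["Name", "Age"],
]

print(older_names.tolist())  # ['Bill', 'John']
print(adults_in_usa)
```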
Conclusion
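loc and iloc are the two main accessors pandas provides for selecting data from a DataFrame. We use loc to select by label, including boolean conditions, and iloc to select by integer position. Keep in mind that loc slices include the stop label, whereas iloc slices exclude the stop position.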