How to change display options in pandas
When performing data analysis tasks in pandas, we often find that not all of the data is displayed. There may be too many rows to show, the content of a column may be cut off by the limit on the column's maximum width, or float values may not be displayed with the precision that we want.
There are various options in pandas that you can use to override these settings to suit your needs. In this shot, we will look at three of the most common settings that we can modify.
Changing the maximum number of rows displayed
Often, we work with huge datasets that contain many rows of data. While viewing this data in pandas, you might not be able to see all the rows (for example, you may see only the first few and last few rows with a ... in between). But what if we want to view all the data?
To do this, we need to change the default setting of the maximum number of rows to be displayed. Have a look at the code snippets below to understand this better.
    import pandas as pd

    drinks = pd.read_csv('http://bit.ly/drinksbycountry')
    print(drinks[['country','beer_servings']])

    max_rows = pd.get_option('display.max_rows')
    print("Maximum rows that can be displayed: ", max_rows)
Explanation:
- In the DisplayMaxRowsDefault tab:
  - In line 1, we import the required package.
  - In line 3, we read the data.
  - In line 4, we print the data from two columns of the dataframe.
  - We can see that ... has been printed in between the data, so the output is incomplete.
  - In line 6, we use the get_option() function and pass display.max_rows as the parameter to see how many rows can be displayed by default. We see that the number is 60 (the pandas default), and so we will want to change this setting.
- In the DisplayMaxRowsCustom tab:
  - The code is almost the same. The only difference is in line 5, where we set display.max_rows to None to display all the rows in the data.
  - Now, you can see that, in the output, all the rows are printed.
  - In line 8, we use reset_option() to reset the setting back to its default value.
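The steps described for the DisplayMaxRowsCustom tab can be sketched as follows. This is a minimal illustration rather than the original tab's code: it assumes a small synthetic DataFrame (hypothetical data standing in for the drinks dataset) so that it runs without a network connection, and it uses repr() to capture the same text that print() would show.

```python
import pandas as pd

# Hypothetical stand-in for the drinks data: 100 rows, more than the default limit.
df = pd.DataFrame({
    "country": [f"Country {i}" for i in range(100)],
    "beer_servings": range(100),
})

default_max_rows = pd.get_option("display.max_rows")
truncated_text = repr(df)  # rendered under the default row limit

# None removes the row limit, so every row is rendered.
pd.set_option("display.max_rows", None)
full_text = repr(df)
print(full_text)

# Put the option back to its default so later output is truncated again.
pd.reset_option("display.max_rows")
restored_max_rows = pd.get_option("display.max_rows")
```

Resetting the option afterwards keeps a one-off inspection from affecting the rest of the session.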
Changing the maximum column width displayed
While viewing the data, you might have observed that many columns do not print all the content in a certain cell. This is due to the maximum column width property. Let’s see how this setting can be changed.
    import pandas as pd

    train = pd.read_csv('http://bit.ly/kaggletrain')
    print(train[['Name','Sex']])

    max_colwidth = pd.get_option('display.max_colwidth')
    print("Maximum column width is: ", max_colwidth)
Explanation:
- In the DisplayMaxColWidthDefault tab:
  - The code is almost the same as above. The only differences are the dataset that we load and the property name (display.max_colwidth).
  - When you run the code, you can see that, in the second row, the content of the Name column is not displayed completely.
  - We can also see that the maximum column width is 50 (the pandas default), which we will want to change.
- In the DisplayMaxColWidthCustom tab:
  - The code is almost the same. The only difference is in line 5, where we set display.max_colwidth to 1000.
  - After running this code, you can see that the problem is solved and the full data is displayed.
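A minimal sketch of the column-width change described above is shown here. It is not the original tab's code: the DataFrame and the long name are made up for illustration (assumptions standing in for the Titanic training data), chosen so that one value is longer than the default width.

```python
import pandas as pd

# Hypothetical value that is longer than the default column width.
long_name = "A deliberately long passenger name that exceeds fifty characters"

df = pd.DataFrame({
    "Name": ["Short name", long_name],
    "Sex": ["male", "female"],
})

truncated_text = repr(df)  # the long value is cut off and ends with "..."

# Raise the limit so that long cell contents are printed in full.
pd.set_option("display.max_colwidth", 1000)
full_text = repr(df)
print(full_text)

# Restore the default column width.
pd.reset_option("display.max_colwidth")
```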
Changing the precision of float values displayed
Often, our data contains float values that we want to display with only two or three digits after the decimal point. Take a look at the code snippet below to see how this can be done.
    import pandas as pd

    train = pd.read_csv('http://bit.ly/kaggletrain')
    print(train[['Name','Sex', 'Fare']])

    max_precision = pd.get_option('display.precision')
    print("Maximum precision is: ", max_precision)
Explanation:
- In the DisplayMaxPrecisionDefault tab:
  - The code is almost the same. The only difference is that we take one more column, Fare, that contains the float values.
  - When we print the data in line 4, we can see that the precision is high (there are many digits after the decimal point), so we will want to change the default precision.
  - In line 6, we print the maximum precision, which comes out to be 6, meaning that it will display six digits after the decimal point.
- In the DisplayMaxPrecisionCustom tab:
  - The code is almost the same. The only difference is in line 5, where we set display.precision to 2, meaning it will only print 2 digits after the decimal point.
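The precision change can be sketched in the same way. The fare values below are made-up sample data (an assumption, not read from the Titanic file), and the last line illustrates that the option only affects how values are displayed, not the stored numbers.

```python
import pandas as pd

# Hypothetical fare values with up to four decimal places.
df = pd.DataFrame({"Fare": [7.25, 71.2833, 8.05]})

default_text = repr(df)  # rendered with the default precision

# Show only two digits after the decimal point.
pd.set_option("display.precision", 2)
rounded_text = repr(df)
print(rounded_text)

# Restore the default precision.
pd.reset_option("display.precision")

# The underlying data is unchanged; only the display was rounded.
stored_value = df["Fare"].iloc[1]
```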
In this way, you can change the default display settings in pandas as per your needs.
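If a setting should apply only to a single block of code, pandas also provides option_context(), which restores the previous values automatically when the block ends. The sketch below uses a small made-up DataFrame (an assumption for illustration) and is one possible way to combine the options covered above.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"value": [1.23456789, 2.3456789]})

precision_before = pd.get_option("display.precision")

# The options are changed only inside the with block.
with pd.option_context("display.precision", 3, "display.max_rows", None):
    inside_text = repr(df)
    print(inside_text)

# After the block, the earlier setting is back in effect.
precision_after = pd.get_option("display.precision")
```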