To work efficiently, we need error-free, uncorrupted data. The pandas library provides the tools we need for data cleaning. To start using pandas, we first import it:
import pandas as pd
The next step is to import the .csv file:
data = pd.read_csv('./filename.csv')
# importing module
import pandas as pd
# importing the dataset by reading the csv file
data = pd.read_csv('./data.csv')
# displaying the first five rows of dataset
data.head()
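To try the steps above without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer. The column names and values are made up for illustration; with a real file, we would pass its path to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for './data.csv'.
# Empty fields are read by pandas as NaN.
csv_text = """name,age,city
Alice,30,Paris
Bob,,London
Carol,25,
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# Display the first five rows (here, all three).
print(data.head())
```

Bob's age and Carol's city are empty in the text, so they show up as missing values in the resulting DataFrame.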
We can verify the above code by running it in a Jupyter Notebook, here the helloworld.ipynb file.
There are five functions that are helpful for locating missing data, if any is present in the dataset:
data.isnull()
data.isna()
data.isna().any()
data.isna().sum()
data.isna().any().sum()
data.isnull() function: It returns a boolean value for every cell of the dataset: True where the value is null, and False otherwise.
data.isna() function: It is the same as the isnull() function.
data.isna().any() function: It also tells us whether null values are present, but it gives one boolean per column rather than a full table.
data.isna().sum() function: It gives the number of null values in each column.
data.isna().any().sum() function: It gives a single value: the number of columns that contain at least one null value.
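The five calls described above can be compared side by side on a small example. This is a minimal sketch; the DataFrame and its column names are hypothetical, and the last line (isna().sum().sum(), the total number of nulls) is an extra that is not part of the list above.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with three missing values spread over two columns.
data = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

mask = data.isna()                        # DataFrame of booleans, same shape as data
same = data.isnull()                      # identical result; isnull is an alias of isna
has_null = data.isna().any()              # Series: one boolean per column
null_counts = data.isna().sum()           # Series: number of nulls per column
cols_with_null = data.isna().any().sum()  # number of columns containing a null
total_nulls = data.isna().sum().sum()     # total number of nulls in the dataset

print(has_null)
print(null_counts)
print(cols_with_null, total_nulls)
```

Here columns a and b each contain nulls while c does not, so any().sum() counts two columns even though there are three missing cells in total.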
fillna() function
After we locate the Null or NaN values in our dataset, the next step is to fill those places with some other values. For this purpose, we can use the fillna() function of DataFrame:
DataFrame_name.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
This function fills NA/NaN values with a value we specify (for example, 0) or by using a specified fill method.
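As a minimal sketch of the simplest case, the following fills every missing value with 0. The DataFrame is hypothetical and only serves to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values.
data = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 5.0, 6.0]})

# By default, fillna returns a new DataFrame; `data` itself is unchanged.
filled = data.fillna(0)

print(filled)
```

Because the result is a new object, the original data still contains its NaN values unless we reassign the result or fill in place.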
Let’s discuss the arguments that are passed to the fillna() function:
value: Value to use to fill holes (places with null or NaN), for example, 0. It can also be a dict, Series, or DataFrame specifying which value to use for each column. This value cannot be a list.
method: {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’}. Method to use to fill holes in a reindexed Series:
pad / ffill: Propagate the last valid observation forward to the next valid one.
backfill / bfill: Use the next valid observation to fill the gap.
axis: {0 or ‘index’, 1 or ‘columns’}. Axis along which to fill missing values.
inplace: If True, this fills our DataFrame in place: no copy is made, and our old DataFrame is overwritten.
limit: int, default None. This is the maximum number of consecutive NaN values to forward or backward fill if the method is specified.
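downcast: A dict of item -> dtype describing what to downcast if possible, or the string infer, which tries to downcast to an appropriate equal type (for example, float64 to int64 if possible).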
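The arguments above can be sketched on a small example. The DataFrame is hypothetical. Recent pandas releases deprecate the method argument in favor of the dedicated ffill() and bfill() methods, so this sketch uses those instead of method='ffill' and method='bfill'; the downcast argument is left out for the same reason.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has a run of three consecutive NaNs.
data = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan, 5.0],
    "b": [np.nan, 2.0, np.nan, 4.0, np.nan],
})

# value: a scalar applies everywhere; a dict sets one fill value per column.
per_column = data.fillna(value={"a": 0.0, "b": -1.0})

# ffill()/bfill() behave as described for method='ffill'/'bfill'.
forward = data.ffill()    # leading NaN in "b" stays: nothing precedes it
backward = data.bfill()   # trailing NaN in "b" stays: nothing follows it

# axis=1 fills along each row (left to right) instead of down each column.
across = data.ffill(axis=1)

# limit: fill at most one NaN in each consecutive run.
limited = data.ffill(limit=1)

# inplace=True modifies the object itself instead of returning a filled copy.
working = data.copy()
working.fillna(0.0, inplace=True)

print(per_column, forward, backward, across, limited, working, sep="\n\n")
```

Working on a copy before using inplace=True keeps the original data available for comparison, since an in-place fill cannot be undone.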