Visualizing Outliers
Explore techniques to identify outliers in datasets through histograms, box plots, and scatter plots. Understand how to analyze their trends and relationships within data, and learn strategies for deciding whether to remove, transform, or keep outliers to create accurate and insightful data narratives.
Outliers are data points that differ notably from the main body of samples in a dataset, and they appear in many real-world datasets. The plot below shows an example: the outliers are the data points in the 700–1000 range, which lie far from the other data points in the 0–300 range.
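One concrete way to flag points that are "notably different" is the 1.5 × IQR rule (Tukey's fences), which is also the default whisker length in matplotlib and seaborn box plots. The sketch below is a minimal illustration under that assumption. The rule itself, the function name `iqr_outlier_mask`, the multiplier `k`, and the sample values are our own choices and are not taken from this lesson. The sample is hypothetical and only loosely mimics the plot described above; it is not the Tips data.

```python
from statistics import quantiles


def iqr_outlier_mask(values, k=1.5):
    """Return one boolean per value: True if it falls outside Tukey's fences.

    k=1.5 is the common box-plot convention; it is an adjustable assumption.
    """
    # n=4 yields the three quartile cut points [Q1, median, Q3];
    # method="inclusive" uses the same linear interpolation as numpy/pandas.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical values: most in the 0-300 range, two in the 700-1000 range.
sample = [120, 45, 210, 300, 87, 150, 260, 10, 190, 75, 720, 980]
flagged = [v for v, is_out in zip(sample, iqr_outlier_mask(sample)) if is_out]
print(flagged)  # [720, 980]
```

The same rule works on a real column, for example the values of `total_bill` once the dataset is loaded. A box plot of that column draws its whiskers at these fences, so the flagged points correspond to the dots plotted beyond the whiskers.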
Understanding the context around outliers can add interesting insights to a narrative and help data scientists decide how to handle them.
Let's explore three steps for handling outliers when telling stories with data:
Identifying and visualizing outliers
Identifying trends and relationships between outliers and other data points
Resolving or keeping outliers
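For the third step, the introduction names three options: remove, transform, or keep. Below is a minimal sketch of those options. It assumes outliers have already been flagged as a list of booleans. The function name `handle_outliers`, the strategy labels, and the use of a log transform (`math.log1p`) are our own illustrative assumptions, not a method prescribed by this lesson.

```python
import math


def handle_outliers(values, is_outlier, strategy="keep"):
    """Apply one of three illustrative strategies to flagged data points.

    values     -- list of numbers (assumed non-negative for "transform")
    is_outlier -- list of booleans, one per value
    strategy   -- "remove", "transform", or "keep"
    """
    if len(values) != len(is_outlier):
        raise ValueError("values and is_outlier must have the same length")
    if strategy == "remove":
        # Drop flagged points; usually justified only for errors or out-of-scope data.
        return [v for v, flagged in zip(values, is_outlier) if not flagged]
    if strategy == "transform":
        # log1p compresses large values toward the bulk; it is applied to every
        # point so that all values stay on the same scale.
        return [math.log1p(v) for v in values]
    if strategy == "keep":
        # Leave the data unchanged; the outliers may be part of the story.
        return list(values)
    raise ValueError(f"unknown strategy: {strategy!r}")


bills = [12.5, 18.0, 21.3, 48.2]      # hypothetical bill amounts
flags = [False, False, False, True]   # hypothetical outlier flags
print(handle_outliers(bills, flags, "remove"))  # [12.5, 18.0, 21.3]
```

The transform option changes every value rather than only the flagged ones. Rescaling only the outliers would distort their relationship to the rest of the data, which is exactly what the second step tries to examine.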
Context of the data
We will look at the Tips dataset, which contains information one waiter recorded about the tips they received while working in a restaurant over a few months.
The variables in the dataset include:
total_bill: The total bill in dollars
tip: The total tip ...