Exploratory Data Analysis

Explore how to perform exploratory data analysis on airline fare datasets, including data cleaning, visualization, and descriptive statistics. Understand numerical and categorical features, analyze relationships between variables like flight duration and fare, and prepare data for regression modeling using H2O.

An EDA exercise includes data cleaning, visualization, descriptive statistics, and hypothesis testing. With EDA, we analyze and summarize the main characteristics of a dataset. The goal is to gain insight into the underlying relationships between variables, which is a crucial step before building predictive models.

In this lesson, we’ll perform EDA on the airline fares dataset, which records flights operating between various Indian cities along with their fares. A flight from Kolkata to Bangalore may sometimes serve Hyderabad as well. Even on the same route, the fare can fluctuate with the booking date, so the dataset contains multiple records for such flights, as we can see below for flight 6E-148:

Python 3.8
# `data` is assumed to be the airline fares dataset, already loaded as a pandas DataFrame
flight_number = "6E-148"
data[data["flight"] == flight_number]

From the above output, we can see that:

  • Fares for the same flight vary greatly (for example, 2,482 versus 7,420) depending only on the number of days left before departure.

  • The dataset consists of a mix of numerical features (duration, days_left) and categorical features (airline, flight, source_city, departure_time, stops, arrival_time, destination_city, class). ...
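The two observations above can be checked programmatically. The following is a minimal sketch, not the lesson's own code: it builds a tiny hypothetical stand-in for the dataset, where only the two fares for flight 6E-148 (2,482 and 7,420) come from the lesson. The `price` column name, the `UK-000` flight, and every other value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the dataset. Only the two fares for 6E-148
# (2482 and 7420) come from the lesson; the `price` column name and all
# other values are assumptions for illustration.
data = pd.DataFrame({
    "airline": ["Indigo", "Indigo", "Vistara"],
    "flight": ["6E-148", "6E-148", "UK-000"],
    "source_city": ["Kolkata", "Kolkata", "Delhi"],
    "destination_city": ["Bangalore", "Bangalore", "Mumbai"],
    "duration": [2.5, 2.5, 2.0],
    "days_left": [30, 2, 10],
    "price": [2482, 7420, 5000],
})

# Fare spread per flight: gap between the cheapest and most expensive record.
fare_spread = data.groupby("flight")["price"].agg(["min", "max"])
fare_spread["range"] = fare_spread["max"] - fare_spread["min"]

# Split the feature columns by dtype, leaving the target (`price`) out.
features = data.drop(columns="price")
numerical = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()

print(fare_spread)
print("Numerical:", numerical)
print("Categorical:", categorical)
```

On the real dataset, the same `groupby` and `select_dtypes` calls should apply unchanged once `price` is replaced by the actual fare column name.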

Insights on