Introduction to Datasets
Understand the characteristics of common machine learning datasets used in practice. Learn about diverse dataset types such as advertising behavior, real estate prices, Airbnb listings, and crowdfunding projects. This lesson helps you recognize typical features and data quality issues like missing values, so that you are prepared to use these datasets when coding machine learning models.
Overview of advertising data
This dataset contains fabricated data about users responding to online advertisements, including their gender, age, location, daily time spent online, and whether they clicked on the target advertisement.
Note: It has ten features with no missing values. It’s available on Kaggle.
Overview of Melbourne housing market data
This dataset contains prices of houses, units, and townhouses in Melbourne, Australia. It was scraped from publicly available real estate listings posted weekly on Domain.com.au. The full dataset contains twenty-one variables, including address, suburb, land size, number of rooms, price, longitude, latitude, and postcode.
Note: This dataset has twenty-one features with some missing values. It’s available on Kaggle.
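Before modeling with a dataset that has missing values, it helps to count the features and see which columns have gaps. The following is a minimal sketch using only Python's standard library. The function name summarize_missing and the inline sample rows are hypothetical and made up for illustration; they are not taken from the real Kaggle file, and the column names are assumed to resemble those in the dataset.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only. They are NOT
# real records from the Melbourne housing file. Empty fields stand for
# missing values.
SAMPLE_CSV = """Suburb,Rooms,Price,Landsize
Abbotsford,2,,202
Abbotsford,3,1035000,
Richmond,2,850000,150
"""


def summarize_missing(csv_text):
    """Return (number of features, {column name: count of empty cells})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    for row in reader:
        for name in columns:
            value = row.get(name)
            # Treat absent fields and blank strings as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return len(columns), missing


n_features, missing_counts = summarize_missing(SAMPLE_CSV)
print(n_features)      # number of columns in the header
print(missing_counts)  # missing-value count per column
```

To apply the same check to a downloaded file, read its contents into a string and pass that string to summarize_missing. Note that this sketch only counts empty cells as missing; a real file may also mark gaps with placeholders such as "NA", which would need to be handled separately.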
Overview of Berlin Airbnb data
Airbnb has exploded in popularity since its humble beginnings in 2008. Today, Berlin is one of the biggest markets for alternative accommodation in Europe, with 22,552 Airbnb listings recorded as of November 2018. The dataset contains detailed data including location, price, and reviews.
Note: It has sixteen features with some missing values. It’s available on Kaggle.
Overview of Kickstarter data
Kickstarter is the world’s largest crowdfunding platform for creative projects. This dataset was created using data extracted from the Kickstarter website.
Note: It has thirty-five features with some missing values. It’s available on Kaggle.