Overview of Dataset
Become familiar with the dataset used in this course.
Structure of product dataset
First, let’s take a look at the metadata of the Amazon “Toys and Games” dataset:
Structure of product dataset in JSON format
Here is the explanation of the metadata of the “Toys and Games” dataset.
| Column Name | Column Detail |
|---|---|
| asin | ID of the product, for example, 0000031852 |
| title | Name of the product |
| feature | Features of the product in bullet point format |
| description | Description of the product |
| price | Price in US dollars (at the time of crawl) |
| imageURL | URL of the product image |
| imageURLHighRes | URL of the high-resolution product image |
| related | Related products (also bought, also viewed, bought together, buy after viewing) |
| salesRank | Sales rank information |
| brand | Brand name |
| categories | List of categories the product belongs to |
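To make the field list concrete, here is a minimal Python sketch of reading product records and accessing a few of the fields above. It assumes the metadata is stored with one JSON object per line, which this lesson does not state. The helper `iter_records`, the file name `meta_sample.json`, and every value in the sample record except the example `asin` are made up for illustration.

```python
import gzip
import json
import tempfile
from pathlib import Path


def iter_records(path):
    """Yield one dict per non-empty line of a JSON Lines file (plain or .gz)."""
    # Assumption: one JSON object per line; adjust if the file is laid out differently.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Made-up sample record, used only so the sketch runs without the real file.
sample = {
    "asin": "0000031852",
    "title": "Example toy",
    "feature": ["Example feature"],
    "brand": "ExampleBrand",
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "meta_sample.json"  # hypothetical file name
    demo_path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    products = list(iter_records(demo_path))

for product in products:
    # .get() returns None instead of raising when a record lacks a field.
    print(product["asin"], product.get("title"), product.get("price"))
```

Using `.get()` for fields such as `price` is a defensive choice: it keeps the loop running if some records omit a field.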
Structure of review dataset
We’ll use the Amazon review dataset on “Toys and Games”. Its details can be found under the “Amazon Review Data (2018)” lesson in the Appendix.
Structure of review dataset in JSON format
Below is the explanation of the “Toys and Games” review dataset:
| Column Name | Column Detail |
|---|---|
| image | Images that users post after they have received the product |
| overall | Rating of the product |
| vote | Helpful votes of the review |
| reviewTime | Time of the review (raw) |
| reviewerID | ID of the reviewer, e.g., AUI6WTTT0QZYS |
| asin | ID of the product, e.g., 5120053084 |
| style | A dictionary of the product metadata, e.g., “Size” is “Large” |
| reviewerName | Name of the reviewer |
| reviewText | Text of the review |
| summary | Summary of the review |
| unixReviewTime | Time of the review (unix time) |
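As a rough illustration of working with these fields, the Python sketch below counts ratings, totals helpful votes, and converts `unixReviewTime` to a calendar date. The sample reviews are made up; only the field names and the two example IDs come from the table. It also assumes that `vote` may arrive as a string with thousands separators and may be missing from a record, which should be checked against the actual data. The helpers `parse_votes` and `review_datetime` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timezone


def parse_votes(raw):
    """Convert a vote value such as "1,234" to an int; treat a missing value as 0."""
    # Assumption: votes may be strings with commas, and absent when there are none.
    if raw is None:
        return 0
    return int(str(raw).replace(",", ""))


def review_datetime(review):
    """Convert unixReviewTime (seconds since the epoch) to a UTC datetime."""
    return datetime.fromtimestamp(review["unixReviewTime"], tz=timezone.utc)


# Made-up sample reviews for illustration only.
reviews = [
    {"reviewerID": "AUI6WTTT0QZYS", "asin": "5120053084", "overall": 5.0,
     "vote": "1,234", "unixReviewTime": 1514764800, "style": {"Size": "Large"}},
    {"reviewerID": "AUI6WTTT0QZYS", "asin": "5120053084", "overall": 3.0,
     "unixReviewTime": 1514851200},
]

# How many reviews gave each rating.
rating_counts = Counter(review["overall"] for review in reviews)
# Total helpful votes across all reviews.
total_votes = sum(parse_votes(review.get("vote")) for review in reviews)
# ISO date of the first review, in UTC.
first_date = review_datetime(reviews[0]).date().isoformat()

print(rating_counts, total_votes, first_date)
```

Converting with an explicit UTC time zone keeps the result independent of the machine's local time zone.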
Extract the dataset
The dataset is downloaded in zipped format, so we need to unzip it using the following command:
Try the command in the terminal
- Copy the command above into the terminal and run it.
- Run the `ls` command afterward to check the extracted file.
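The extraction and the follow-up check can also be done from Python. The sketch below is a minimal version that assumes the download is gzip-compressed; the lesson only says "zipped", so a `.zip` archive would need the standard `zipfile` module instead. The helper `extract_gzip` and the file names `sample.json.gz` and `sample.json` are hypothetical, and the demo builds a tiny archive in a temporary directory so that it runs without the real download.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def extract_gzip(archive_path, output_path):
    """Decompress a .gz archive to output_path, streaming so large files fit in memory."""
    with gzip.open(archive_path, "rb") as source, open(output_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return Path(output_path)


with tempfile.TemporaryDirectory() as tmp:
    # Build a small stand-in archive (hypothetical name and content).
    archive = Path(tmp) / "sample.json.gz"
    with gzip.open(archive, "wb") as handle:
        handle.write(b'{"asin": "0000031852"}\n')

    extracted = extract_gzip(archive, Path(tmp) / "sample.json")
    extracted_text = extracted.read_text(encoding="utf-8")

    # Listing the directory plays the same role as running ls in the terminal.
    extracted_names = sorted(p.name for p in Path(tmp).iterdir())

print(extracted_names)
```

Streaming with `shutil.copyfileobj` avoids loading the whole decompressed file into memory, which matters for large review files.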