Overview of Dataset
Become familiar with the dataset used in this course.
Structure of product dataset
First, let’s take a look at the metadata of the Amazon “Toys and Games” dataset:
{"asin": "0000031852","title": "Girls Ballet Tutu Zebra Hot Pink","feature": ["Botiquecutie Trademark exclusive Brand","Hot Pink Layered Zebra Print Tutu"],"description": "This tutu is great ...","price": 3.17,"imageURL": "http://...","imageURLHighRes": "http://...","also_buy": ["B00JHONN1S", "B002BZX8Z6", "..."],"salesRank": {"Toys and Games": 211836},"brand": "Coxlures","categories": [["Sports & Outdoors","Other Sports","Dance"]]}
Structure of product dataset in JSON format
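The sample record is a single JSON object on one line, so it can be parsed with Python's standard `json` module. The sketch below reuses the sample, keeping its elided values as the literal `"..."` placeholders. It shows the nested types: `feature` is a list of strings, `salesRank` is a dictionary, and `categories` is a list of category paths. The helper name `category_path` is hypothetical.

```python
import json

# The sample metadata record from above; "..." values are the sample's own placeholders.
raw = (
    '{"asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", '
    '"feature": ["Botiquecutie Trademark exclusive Brand", "Hot Pink Layered Zebra Print Tutu"], '
    '"description": "This tutu is great ...", "price": 3.17, '
    '"imageURL": "http://...", "imageURLHighRes": "http://...", '
    '"also_buy": ["B00JHONN1S", "B002BZX8Z6", "..."], '
    '"salesRank": {"Toys and Games": 211836}, "brand": "Coxlures", '
    '"categories": [["Sports & Outdoors", "Other Sports", "Dance"]]}'
)

product = json.loads(raw)


def category_path(record: dict) -> str:
    """Hypothetical helper: join the first category path into one string."""
    paths = record.get("categories") or [[]]
    return " > ".join(paths[0])


print(product["title"])                        # Girls Ballet Tutu Zebra Hot Pink
print(type(product["feature"]).__name__)       # list
print(product["salesRank"]["Toys and Games"])  # 211836
print(category_path(product))                  # Sports & Outdoors > Other Sports > Dance
```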
The fields of the “Toys and Games” metadata are described below:
| Column Name | Column Detail |
|---|---|
| asin | ID of the product, e.g., 0000031852 |
| title | Name of the product |
| feature | Features of the product in bullet-point format |
| description | Description of the product |
| price | Price in US dollars (at the time of crawl) |
| imageURL | URL of the product image |
| imageURLHighRes | URL of the high-resolution product image |
| related | Related products (also bought, also viewed, bought together, buy after viewing); in the sample record above, the "also bought" list appears under the key `also_buy` |
| salesRank | Sales rank information |
| brand | Brand name |
| categories | List of categories the product belongs to |
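The sample record stores the "also bought" related products as a list of ASINs under `also_buy`. Assuming records carry such a list, a small index keyed by `asin` can resolve related ASINs to product titles and skip any that are not in the loaded data. The records below are illustrative (not real dataset rows), and `related_titles` is a hypothetical helper.

```python
from typing import Dict, List

# Illustrative records; only the key names mirror the sample metadata.
products: List[dict] = [
    {"asin": "A1", "title": "Ballet Tutu", "also_buy": ["A2", "A9"]},
    {"asin": "A2", "title": "Hair Bow", "also_buy": ["A1"]},
    {"asin": "A3", "title": "Tap Shoes"},  # no also_buy key at all
]

# Index the products by ASIN for constant-time lookups.
by_asin: Dict[str, dict] = {p["asin"]: p for p in products}


def related_titles(asin: str, index: Dict[str, dict]) -> List[str]:
    """Hypothetical helper: titles of "also bought" products present in the index."""
    record = index.get(asin, {})
    return [
        index[other]["title"]
        for other in record.get("also_buy", [])
        if other in index  # skip ASINs that were not loaded
    ]


print(related_titles("A1", by_asin))  # ['Hair Bow']
print(related_titles("A3", by_asin))  # []
```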
Structure of review dataset
We’ll use the Amazon review dataset on “Toys and Games”. Its details can be found under the “Amazon Review Data (2018)” lesson in the Appendix.
{"image": ["https://..."],"overall": 5.0,"vote": "2","verified": true,"reviewTime": "01 1, 2018","reviewerID": "AUI6WTTT0QZYS","asin": "5120053084","style": {"Size:": "Large","Color:": "Charcoal"},"reviewerName": "Abbey","reviewText": "I now have 4 ... ","summary": "Comfy, flattering, ...!","unixReviewTime": 1514764800}
Structure of review dataset in JSON format
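The sample stores the review date twice: `reviewTime` as a raw "month day, year" string and `unixReviewTime` as seconds since the Unix epoch. A minimal sketch, assuming both look like the sample, converts them to dates and turns the string `vote` into an integer. Standard JSON spells booleans in lowercase (`true`), so the record below uses that form. The helper `parse_review` and the added keys `reviewDate` and `unixDate` are hypothetical.

```python
import json
from datetime import datetime, timezone

# The sample review, trimmed to the fields used here.
raw = (
    '{"overall": 5.0, "vote": "2", "verified": true, '
    '"reviewTime": "01 1, 2018", "reviewerID": "AUI6WTTT0QZYS", '
    '"asin": "5120053084", "style": {"Size:": "Large", "Color:": "Charcoal"}, '
    '"unixReviewTime": 1514764800}'
)


def parse_review(line: str) -> dict:
    """Hypothetical helper: decode one review and normalize its types."""
    review = json.loads(line)
    # "01 1, 2018" is month, day, year; %d also accepts a single-digit day.
    review["reviewDate"] = datetime.strptime(review["reviewTime"], "%m %d, %Y").date()
    # Seconds since the epoch, interpreted in UTC.
    review["unixDate"] = datetime.fromtimestamp(
        review["unixReviewTime"], tz=timezone.utc
    ).date()
    # Assumption: a missing vote means zero; strip thousands separators if present.
    review["vote"] = int(review.get("vote", "0").replace(",", ""))
    return review


review = parse_review(raw)
print(review["reviewDate"], review["unixDate"], review["vote"])
# 2018-01-01 2018-01-01 2
```

Both date fields agree for this record, which is a quick way to confirm the raw format was interpreted correctly.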
The fields of the “Toys and Games” review dataset are described below:
| Column Name | Column Detail |
|---|---|
| image | Images that users post after they have received the product |
| overall | Rating of the product |
| vote | Number of helpful votes the review received (stored as a string in the sample, e.g., "2") |
| reviewTime | Time of the review (raw), e.g., "01 1, 2018" |
| reviewerID | ID of the reviewer, e.g., AUI6WTTT0QZYS |
| asin | ID of the product, e.g., 5120053084 |
| style | A dictionary of the product metadata, e.g., “Size” is “Large” |
| reviewerName | Name of the reviewer |
| reviewText | Text of the review |
| summary | Summary of the review |
| unixReviewTime | Time of the review (Unix time, in seconds) |
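Both datasets share the `asin` field, so reviews can be attached to product metadata by that ID. The sketch below groups reviews by ASIN and computes the mean `overall` rating per product, then looks each product up by its ASIN. The inline records are made up for illustration, and `average_rating_by_asin` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Made-up records; only the field names follow the two tables above.
products = [
    {"asin": "P1", "title": "Puzzle Cube"},
    {"asin": "P2", "title": "Toy Train"},
]
reviews = [
    {"asin": "P1", "overall": 5.0},
    {"asin": "P1", "overall": 4.0},
    {"asin": "P2", "overall": 3.0},
    {"asin": "P9", "overall": 1.0},  # no matching product metadata
]


def average_rating_by_asin(rows: Iterable[dict]) -> Dict[str, float]:
    """Hypothetical helper: mean `overall` rating for each ASIN."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["asin"]] += row["overall"]
        counts[row["asin"]] += 1
    return {asin: totals[asin] / counts[asin] for asin in totals}


averages = average_rating_by_asin(reviews)
for product in products:
    # .get() yields None for a product with no reviews instead of raising KeyError.
    print(product["title"], averages.get(product["asin"]))
# Puzzle Cube 4.5
# Toy Train 3.0
```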
Extract the dataset
The dataset is downloaded as a gzip-compressed file (`Toys_and_Games_5.json.gz`). Decompress it with the following command:
gzip -d Toys_and_Games_5.json.gz
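By default, `gzip -d` replaces `Toys_and_Games_5.json.gz` with the decompressed `Toys_and_Games_5.json`. If you would rather keep the compressed file, Python's standard `gzip` module can stream records straight from it. This is a sketch that assumes one JSON object per line, as in the samples above; `iter_records` is a hypothetical helper.

```python
import gzip
import json
from typing import Iterator


def iter_records(path) -> Iterator[dict]:
    """Hypothetical helper: yield one decoded JSON object per non-empty line.

    `path` may be a filename or an already-open binary file object.
    """
    # "rt" opens the gzip stream in text mode and decodes it as UTF-8.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

For example, `next(iter_records("Toys_and_Games_5.json.gz"))` would return the first review without decompressing the whole file to disk.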
Try the command in the terminal:
- Copy the above command into the terminal and run it.
- Then run the `ls` command to check that the extracted file, `Toys_and_Games_5.json`, is present.
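Beyond listing the directory with `ls`, a quick sanity check is to confirm that the extracted file exists and that every line decodes as JSON. The sketch below counts the records in a file such as `Toys_and_Games_5.json`; `count_records` is a hypothetical helper, and a malformed line makes it raise `json.JSONDecodeError`.

```python
import json
from pathlib import Path


def count_records(path: str) -> int:
    """Hypothetical helper: count non-empty lines, checking that each is valid JSON."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"{path} not found; did the gzip -d step run?")
    count = 0
    with file.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises json.JSONDecodeError on a malformed line
                count += 1
    return count
```

After running `gzip -d`, `count_records("Toys_and_Games_5.json")` would return the number of reviews in the extracted file.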