Finding Correlation Between the Movie Ratings
Explore how to calculate correlations between different movie ratings using Python. Learn to process rating data stored as dictionaries, handle missing reviews, apply safe evaluation with literal_eval, and use NumPy to compute correlation coefficients. This lesson shows how to generate correlation dictionaries for movies that support recommendation engine development.
We’ve generated some random data for a few movie ratings. Let’s have a look at it.
The movie data is stored as a dictionary of dictionaries: each movie title maps to a sub-dictionary of reviewer names and the ratings they gave. Let’s look at the first movie:
'Terminator': {'Tom': 4.0,
               'Jack': 1.5,
               'Lisa': 3.0,
               'Sally': 2.0},
The movie Terminator has been rated by four people: Tom has given it a score of 4.0, while Jack has given it 1.5, and so on. These numbers are random.
Notice that not everyone has rated every movie. We’ll need to take this into account when we calculate the correlation.
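One way to account for missing ratings is to compare two movies using only the reviewers who rated both. The sketch below illustrates that idea under stated assumptions: the second movie, `'Example Movie'`, and the helper `shared_ratings` are hypothetical and are not part of the lesson's data or script.

```python
# Hypothetical sample data in the same shape as the lesson's dictionary.
movies = {
    'Terminator': {'Tom': 4.0, 'Jack': 1.5, 'Lisa': 3.0, 'Sally': 2.0},
    'Example Movie': {'Tom': 5.0, 'Jack': 1.0, 'Lisa': 3.5},  # Sally has not rated it
}


def shared_ratings(ratings_a, ratings_b):
    """Return the reviewers who rated both movies, plus their aligned ratings."""
    # Set intersection of the dictionary keys keeps only the common reviewers.
    common = sorted(set(ratings_a) & set(ratings_b))
    a = [ratings_a[name] for name in common]
    b = [ratings_b[name] for name in common]
    return common, a, b


common, a, b = shared_ratings(movies['Terminator'], movies['Example Movie'])
print(common)  # reviewers present in both sub-dictionaries
print(a, b)    # ratings in the same reviewer order, ready to be compared
```

The two lists have equal length and the same reviewer order, which is the form a correlation routine generally expects.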
Let’s see how to calculate the correlation. The script starts by checking its arguments:
import sys

# Require the path to the data file as the first command-line argument.
if len(sys.argv) < 2:
    print("Usage: python calc_correlation.py <data file.py>")
    sys.exit(1)
We want to give the script a data file to calculate the correlation on. If the file is not provided, we’ll print the usage and exit.
import ast

with open(sys.argv[1], 'r') as f:
    temp = f.read()
    # Safely parse the file's text into a Python dictionary.
    movies_list = ast.literal_eval(temp)
    print(movies_list)
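Normally, when we open a file, we have to remember to close it, even if an error occurs while we are working with it. The with statement takes care of that for us: it opens the file and guarantees that the file is closed when the block ends, whether the block finishes normally or raises an exception. Note that it does not suppress the exception itself.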
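The following is a minimal, self-contained sketch of these two ideas: the file is closed once the with block ends, and `ast.literal_eval` only accepts plain Python literals. The sample text and the temporary file are assumptions made for illustration; the lesson's script reads the path from `sys.argv[1]` instead.

```python
import ast
import os
import tempfile

# Hypothetical data file content, in the same shape as the lesson's dictionary.
sample_text = "{'Terminator': {'Tom': 4.0, 'Jack': 1.5}}"

# Write the sample to a temporary file so the example can run on its own.
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write(sample_text)
    path = tmp.name

try:
    with open(path, 'r') as f:
        temp = f.read()
    # The with block has ended, so the file object is already closed.
    file_closed = f.closed
    # literal_eval turns the text into a real dictionary without running code.
    movies_list = ast.literal_eval(temp)
finally:
    os.remove(path)  # clean up the temporary file

# literal_eval rejects anything that is not a literal, such as a function call.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(file_closed, movies_list, rejected)
```

This is why `literal_eval` is a safer choice than `eval` for reading a data file: a file containing executable code raises an error instead of being run.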
Let’s go through the code line by line.
with open(sys.argv[1], 'r') as f:
We open the file passed as the first command-line argument in read-only mode.
temp = ...