Reading the Data and Getting Top-Rated Movies

Explore how to use pandas to read the MovieLens dataset files, merge multiple data sources into one DataFrame, and analyze the data to find the top-rated movies by number of ratings.

The MovieLens data files are plain text with fields separated by |. Unlike Excel files, they have no column headers, so we have to provide our own. pandas has a function for reading delimited plain-text files, similar to the NumPy file-reading functions we covered in a previous lesson.

Let’s look at the code.

user_columns = ['user_id', 'age', 'sex']

We declare user_columns to name the three fields we want to read from u.user.

Note: Each row in u.user has more than three fields, but usecols=range(3) in the call below keeps only the first three and ignores the rest.

users = pd.read_csv('u.user', sep='|', names=user_columns, usecols=range(3))

We’ll use the read_csv() function to read the file, even though it isn’t technically a CSV ...
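To try the read_csv() call without the dataset files at hand, here is a minimal sketch that feeds it a few in-memory rows. The sample rows are made up for illustration and are not taken from the dataset; they only mimic the layout described above (pipe-separated, no header line, more than three fields per row). The variable name sample is also an assumption introduced here.

```python
import io

import pandas as pd

# Hypothetical rows in the layout described above: pipe-separated,
# no header line, and more than three fields per row.
sample = io.StringIO(
    "1|30|F|engineer|10001\n"
    "2|45|M|teacher|20002\n"
    "3|22|F|student|30003\n"
)

# Our own column names, since the file provides none.
user_columns = ['user_id', 'age', 'sex']

# sep='|' sets the delimiter, names= supplies the headers, and
# usecols=range(3) keeps only the first three fields of each row.
users = pd.read_csv(sample, sep='|', names=user_columns, usecols=range(3))

print(users)
```

Replacing sample with the path 'u.user' gives the same call as in the lesson. Passing names= also tells pandas that the first line is data rather than a header row.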