Generating Embeddings for Audio Files and Metadata of Songs
Explore how to generate embeddings for both audio files and song metadata using BERT and OpenL3 deep learning models. This lesson guides you through converting song attributes to text, applying special tokens, and extracting audio embeddings, preparing you to build a vector database for music recommendations.
Metadata embeddings with the BERT embedding model
To use BERT for generating metadata embeddings, we first convert the tabular metadata (attributes and values) of a song into a single string, which we call the textual_description of the song. For example:
"The song Infinity Edge has a danceability of 0.528, energy of 0.847, loudness of -4.741, speechiness of 0.0307, acousticness of 0.00674, instrumentalness of 0.814, liveness of 0.12, valence of 0.389, tempo of 143.997."
To construct these descriptions at scale, we read the CSV file containing song metadata, create a descriptive sentence for each song from its attributes and values, add the text as a new column in the DataFrame, and then save the updated DataFrame back to the CSV file.
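A minimal sketch of this step is shown below, assuming a pandas DataFrame; the file name and column names (e.g., name, danceability) are assumptions based on the example above and should be adjusted to the actual metadata_song.csv schema:

```python
# Minimal sketch: building the textual_description column.
# Column names are assumed from the example sentence above.
import pandas as pd

df = pd.read_csv("metadata_song.csv")

def describe(row):
    # Turn one row of tabular attributes into a descriptive sentence.
    return (
        f"The song {row['name']} has a danceability of {row['danceability']}, "
        f"energy of {row['energy']}, loudness of {row['loudness']}, "
        f"speechiness of {row['speechiness']}, acousticness of {row['acousticness']}, "
        f"instrumentalness of {row['instrumentalness']}, liveness of {row['liveness']}, "
        f"valence of {row['valence']}, tempo of {row['tempo']}."
    )

# Add the new column and write the updated DataFrame back to the CSV file.
df["textual_description"] = df.apply(describe, axis=1)
df.to_csv("metadata_song.csv", index=False)
```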
The textual_description column added to the metadata_song.csv file is shown in the illustration below: