Metadata Embeddings Using the BERT Embedding Model

To use BERT to generate metadata embeddings, we need to convert a song's tabular metadata (its attributes and values) into a string, which we call the song's textual_description. For example:

"The song Infinity Edge has a danceability of 0.528, energy of 0.847, loudness of -4.741, speechiness of 0.0307, acousticness of 0.00674, instrumentalness of 0.814, liveness of 0.12, valence of 0.389, tempo of 143.997."

To do this, we read the CSV file that contains the song metadata and build a textual description for each song from its attributes and values. We add this text to the DataFrame as a new column, then save the updated DataFrame back to the same CSV file.


import pandas as pd
metadata_file_path="/content/drive/MyDrive/vector-databases-course/music-recommendation-system/dataset/reduced_80_fer2013_music_dataset_with_youtube_URLS.csv"
# Loading metadata into DataFrame
metadata_df = pd.read_csv(metadata_file_path)
# Extracting relevant numeric attribute and creating textual description using them
metadata_df['textual_description'] = metadata_df.apply(
    lambda row: f"The song {row['song_name']} has a danceability of {row['danceability']}, "
                f"energy of {row['energy']}, "
                f"loudness of {row['loudness']}, "
                f"speechiness of {row['speechiness']}, "
                f"acousticness of {row['acousticness']}, "
                f"instrumentalness of {row['instrumentalness']}, "
                f"liveness of {row['liveness']}, "
                f"valence of {row['valence']}, "
                f"tempo of {row['tempo']}",
    axis=1
)
# Saving the updated DataFrame with the textual descriptions for each song back to the CSV file
metadata_df.to_csv(metadata_file_path, index=False)
print("The 'textual_description' column has been added to the CSV file.")

Next, we define a custom special token for each numeric attribute, such as [DANCEABILITY] for danceability, and replace every attribute name in the textual description with its token to create the input for BERT.


# Define special tokens for numeric attributes
SPECIAL_TOKENS = {
    '[DANCEABILITY]': 'danceability',
    '[SPEECHINESS]': 'speechiness',
    '[ENERGY]': 'energy',
    '[LOUDNESS]': 'loudness',
    '[ACOUSTICNESS]': 'acousticness',
    '[INSTRUMENTALNESS]': 'instrumentalness',
    '[LIVENESS]': 'liveness',
    '[VALENCE]': 'valence',
    '[TEMPO]': 'tempo'
}
# Tokenize using custom special tokens
def tokenize_with_special_tokens(text):
    for token, attribute in SPECIAL_TOKENS.items():
        text = text.replace(attribute, token)
    return text
# Apply tokenization to create input for BERT
metadata_df['textual_description_with_Special_Tokens'] = metadata_df['textual_description'].apply(tokenize_with_special_tokens)
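
To see what the replacement step produces, the following is a minimal, standalone sketch applied to the sample description from earlier in this lesson. It is a variation on the lesson's function, not the same code: it assumes whole-word matching with a regular expression, so that an attribute name appearing inside a longer word (for example, in a song title) is left alone. The names `ATTRIBUTES`, `ATTRIBUTE_PATTERN`, and `mark_attributes` are hypothetical and do not appear in the lesson.

```python
import re

# Attribute names taken from the lesson's SPECIAL_TOKENS mapping.
ATTRIBUTES = [
    "danceability", "speechiness", "energy", "loudness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]

# Hypothetical pattern (an assumption, not from the lesson):
# matches any attribute name only when it stands as a whole word.
ATTRIBUTE_PATTERN = re.compile(r"\b(" + "|".join(ATTRIBUTES) + r")\b")


def mark_attributes(text: str) -> str:
    """Replace each whole-word attribute name with its bracketed, upper-case token."""
    return ATTRIBUTE_PATTERN.sub(lambda m: f"[{m.group(1).upper()}]", text)


# Sample description in the format built by the lesson's first code block.
sample = (
    "The song Infinity Edge has a danceability of 0.528, energy of 0.847, "
    "loudness of -4.741, speechiness of 0.0307, acousticness of 0.00674, "
    "instrumentalness of 0.814, liveness of 0.12, valence of 0.389, "
    "tempo of 143.997"
)

marked = mark_attributes(sample)
print(marked)
```

Replacing the text alone does not make BERT treat these markers as single tokens. If a Hugging Face tokenizer is used (an assumption, since the lesson does not name the library at this point), the markers would typically also be registered with `tokenizer.add_special_tokens({"additional_special_tokens": [...]})`, followed by `model.resize_token_embeddings(len(tokenizer))`. Otherwise, the tokenizer would likely split each marker into several word pieces.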
