Commonly detected emotions include happiness, sadness, anger, fear, surprise, and disgust.
Key takeaways:
Speech emotion recognition (SER) uses AI to identify emotional states from voice characteristics. It has applications in therapy, education, and customer service, and can be implemented with techniques such as artificial neural networks (ANNs).
The process of emotion recognition involves several key steps: importing the necessary libraries, collecting a labeled speech dataset (such as the RAVDESS dataset), preprocessing the audio files to extract relevant features, and mapping the integer emotion codes in the file names to emotion labels. Data visualization also plays a role: a count plot shows how the emotions are distributed across the dataset, which helps with understanding and analyzing the data.
Speech emotion recognition (SER) is a field of artificial intelligence (AI) that focuses on identifying a speaker’s emotional state based on their voice characteristics. It can analyze emotional states in therapy sessions, identify struggling students in education, and gauge customer satisfaction in calls.
Using artificial neural networks (ANNs) for speech emotion recognition means training a neural network to identify emotions from features extracted from speech signals. This is a general overview of how we could go about this task.
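To make the idea concrete, here is a minimal NumPy sketch of what such a network computes at prediction time: a feature vector per clip goes through one hidden dense layer with ReLU and an output layer with softmax, giving one probability per emotion. The sizes (40 input features, 64 hidden units) and the random, untrained weights are assumptions for illustration only; training would adjust the weights. In Keras, which this tutorial imports, a network of the same shape would typically be declared with Sequential and Dense layers, for example Dense(64, activation='relu') followed by Dense(8, activation='softmax').

```python
import numpy as np

EMOTIONS = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fear', 'disgust', 'surprise']

rng = np.random.default_rng(seed=0)

# Hypothetical sizes: 40 input features per clip (e.g. averaged MFCCs),
# 64 hidden units, and one output unit per emotion class.
n_features, n_hidden, n_classes = 40, 64, len(EMOTIONS)

# Untrained, randomly initialized weights; training would learn these.
W1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def forward(x):
    """Map a batch of feature vectors (batch, n_features) to class probabilities."""
    hidden = np.maximum(0.0, x @ W1 + b1)        # dense layer + ReLU
    logits = hidden @ W2 + b2                    # dense output layer
    logits -= logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # softmax over the 8 emotions

features = rng.normal(size=(2, n_features))      # stand-in features for two clips
probs = forward(features)
predicted = [EMOTIONS[i] for i in probs.argmax(axis=1)]
```

Each row of probs sums to 1, and the predicted emotion is the class with the highest probability; with untrained weights the predictions are arbitrary.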
Here’s the step-by-step process of recognizing speech emotions with an ANN:
Let’s start this task by importing the required libraries or modules.
Libraries
We’ll use the matplotlib, librosa, pandas, numpy, and keras libraries, along with seaborn and scikit-learn, to recognize speech emotions with an ANN.
import pandas as pd
import numpy as np
import os
import sys

# librosa is a Python library for analyzing audio and music. It can be used to extract the data from the audio files; we will see it later.
import librosa
import librosa.display
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

# to play the audio files
from IPython.display import Audio

import keras
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint
from keras.models import Sequential
from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization
Gather a dataset of speech recordings labeled with the corresponding emotions. Several publicly available datasets exist for speech emotion recognition; we’ll use the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset for this task.
Dataset
The dataset that we will be using in this task can be downloaded from
Ravdess = "Audioss"
file_emotion = []
file_path = []
Line 1: Ravdess = "Audioss" specifies the directory path where the audio files are located.
Lines 2–3: file_emotion = [] and file_path = [] initialize lists to store the emotion codes and file paths.
Preprocess the speech recordings to extract features relevant to emotion recognition. Start by walking the dataset directory and collecting the .wav files.
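ravdess_directory_list = os.listdir(Ravdess)
for actor_dir in ravdess_directory_list:
    actor_path = os.path.join(Ravdess, actor_dir)
    if os.path.isdir(actor_path):
        for file_name in os.listdir(actor_path):
            file_path_full = os.path.join(actor_path, file_name)
            if not file_name.endswith(".wav") or file_name.startswith("._"):
                continue
Line 1: List the entries of the Ravdess directory (one subdirectory per actor) and store them in ravdess_directory_list.
Line 2: Iterate through each item (actor_dir) in ravdess_directory_list.
Lines 3–4: Build the subdirectory path (actor_path) and continue only if it is a directory.
Line 5: List all files in that subdirectory using os.listdir(actor_path).
Line 6: Construct the full file path.
Lines 7–8: Skip files that are not .wav files, as well as hidden/system files whose names start with "._".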
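The loop above only collects file paths; the features themselves still have to be computed from each clip's samples. As a minimal sketch of that idea, the NumPy code below frames a signal and computes two classic frame-level features, zero-crossing rate and root-mean-square (RMS) energy, then summarizes them per clip. It is a simplified stand-in, not the tutorial's method: librosa provides roughly analogous functions (librosa.feature.zero_crossing_rate, librosa.feature.rms, and richer ones such as librosa.feature.mfcc), whose framing and padding details differ. The frame sizes and the synthetic 220 Hz tone are assumptions for illustration.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_length)."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack([y[i * hop_length : i * hop_length + frame_length]
                     for i in range(n_frames)])

def simple_features(y, frame_length=2048, hop_length=512):
    """Return [mean ZCR, std ZCR, mean RMS, std RMS] for one clip."""
    frames = frame_signal(y, frame_length, hop_length)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Root-mean-square energy of each frame.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.array([zcr.mean(), zcr.std(), rms.mean(), rms.std()])

sr = 22050                                  # assumed sample rate
t = np.arange(sr) / sr                      # one second of audio
y = 0.5 * np.sin(2 * np.pi * 220 * t)       # synthetic 220 Hz tone standing in for speech
features = simple_features(y)
```

For real clips, y would come from loading each collected path (for example with librosa.load), and the resulting per-clip vectors would form the ANN's input matrix.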
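Extract the emotion code from each file name by splitting the name into its hyphen-separated parts. This code runs inside the inner loop above, after the .wav check.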
parts = file_name.split('.')[0].split('-')
if len(parts) >= 3:
    emotion = int(parts[2])
    file_emotion.append(emotion)
    file_path.append(file_path_full)
else:
    print("Error: Unexpected filename format for file:", file_name)
Line 1: For each file, split the file name on periods (.) and keep the first part to remove the file extension, then split the resulting string on hyphens (-).
Lines 2–5: Check that the split produced at least three parts. If so, convert the third part (index 2) to an integer emotion code (emotion), append it to the file_emotion list, and append the full path to the file_path list.
Lines 6–7: If the file name does not follow the expected format, print an error message.
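The emotion code is only one field of a RAVDESS file name. Each name has seven hyphen-separated two-digit fields: modality, vocal channel, emotion, emotional intensity, statement, repetition, and actor (odd actor IDs are male, even are female). The sketch below parses all seven fields with a stricter check than len(parts) >= 3; the helper name parse_ravdess_name and the field names are hypothetical, and the emotion labels reuse this tutorial's mapping.

```python
from pathlib import Path

# Field order of a RAVDESS file name, e.g. "03-01-06-01-02-01-12.wav".
FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor")

EMOTION_MAP = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad',
               5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}

def parse_ravdess_name(file_name):
    """Hypothetical helper: return the seven integer fields as a dict, or None."""
    parts = Path(file_name).stem.split("-")   # drop the extension, split on hyphens
    if len(parts) != len(FIELDS) or not all(p.isdigit() for p in parts):
        return None                           # not a RAVDESS-style name
    info = dict(zip(FIELDS, map(int, parts)))
    info["emotion_name"] = EMOTION_MAP.get(info["emotion"])
    return info

info = parse_ravdess_name("03-01-06-01-02-01-12.wav")
```

Here info["emotion"] is 6 ("fear") and info["actor"] is 12. Because every part must be numeric, names such as "._03-01-06-01-02-01-12.wav" are rejected as well.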
Create a DataFrame from the emotion and file path lists.
emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
path_df = pd.DataFrame(file_path, columns=['Path'])
Ravdess_df = pd.concat([emotion_df, path_df], axis=1)
Line 1: Create a one-column DataFrame (emotion_df) from file_emotion, naming the column 'Emotions'.
Line 2: Create a one-column DataFrame (path_df) from file_path, naming the column 'Path'.
Line 3: After processing all files, create Ravdess_df by concatenating emotion_df and path_df along the columns axis using pd.concat([emotion_df, path_df], axis=1).
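The same DataFrame can also be built in a single call by passing a dictionary of column names to lists, which avoids the intermediate DataFrames. The sketch below checks that both approaches give the same result; the two sample codes and paths (including the Actor_XX folder names) are hypothetical stand-ins for the lists the loop fills.

```python
import pandas as pd

# Hypothetical sample values standing in for the lists built by the loop.
file_emotion = [6, 3]
file_path = ["Audioss/Actor_12/03-01-06-01-02-01-12.wav",
             "Audioss/Actor_01/03-01-03-01-01-01-01.wav"]

# One call: each dictionary key becomes a column.
Ravdess_df = pd.DataFrame({"Emotions": file_emotion, "Path": file_path})

# The two-step approach: concatenate two one-column DataFrames side by side.
via_concat = pd.concat([pd.DataFrame(file_emotion, columns=["Emotions"]),
                        pd.DataFrame(file_path, columns=["Path"])], axis=1)
```

Both versions hold the same columns, values, and default integer index, so either can be used for the following steps.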
Now, map the integer emotion codes to emotion names.
emotion_map = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad', 5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}
Ravdess_df['Emotions'] = Ravdess_df['Emotions'].map(emotion_map)
Lines 1–2: Map the integer emotion codes to emotion names using a predefined mapping dictionary (emotion_map) and Ravdess_df['Emotions'].map(emotion_map).
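One detail of Series.map is worth checking after this step: when it is given a dictionary, any value that is not a key in the dictionary becomes NaN rather than raising an error. The small example below uses a made-up code 9 to show how such rows can be found.

```python
import pandas as pd

emotion_map = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad',
               5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}

codes = pd.Series([1, 5, 8, 9])      # 9 is not a valid RAVDESS emotion code
labels = codes.map(emotion_map)      # codes missing from the dict become NaN

unmapped = labels.isna()
print(codes[unmapped].tolist())      # → [9]
```

On the real data, Ravdess_df['Emotions'].isna().sum() should be 0 if every parsed code was between 1 and 8.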
Next, print the first few rows of the DataFrame.
print(Ravdess_df.head())
Line 1: Print the first five rows of the Ravdess_df DataFrame.
Generate a count plot using seaborn to visualize the distribution of emotions in the Ravdess_df DataFrame.
plt.title('Count of Emotions', size=16)
sns.countplot(x='Emotions', data=Ravdess_df)
plt.ylabel('Count', size=12)
plt.xlabel('Emotions', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()
Line 1: Set the title of the plot.
Line 2: Draw a count plot of the Emotions column of the Ravdess_df DataFrame, with one bar per emotion.
Line 3: Set the y-axis label to "Count" with a font size of 12.
Line 4: Set the x-axis label to "Emotions" with a font size of 12.
Line 5: Remove the top and right spines (the border lines around the plot area) and keep the left and bottom ones. In sns.despine, True removes a spine (top=True, right=True) and False keeps it (left=False, bottom=False), which gives a cleaner visual presentation.
Line 6: Show the plot.
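The bar heights in the count plot are the same numbers that value_counts returns, which is handy for checking class balance in text form before training. The sketch below uses a handful of hypothetical labels in place of the real Ravdess_df. If the folder holds the full RAVDESS speech set, expect the neutral class to be smaller than the others, because neutral has no strong-intensity recordings.

```python
import pandas as pd

# Hypothetical labels standing in for Ravdess_df['Emotions'].
Ravdess_df = pd.DataFrame(
    {"Emotions": ["happy", "sad", "happy", "neutral", "happy", "sad"]})

# Number of clips per emotion, sorted from most to least frequent.
counts = Ravdess_df["Emotions"].value_counts()
print(counts)
```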