Commonly detected emotions include happiness, sadness, anger, fear, surprise, and disgust.
Speech emotion recognition using ANN
Key takeaways:
Speech emotion recognition (SER) utilizes AI to identify emotional states from voice characteristics, with applications in therapy, education, and customer service, leveraging techniques like Artificial Neural Networks (ANN).
The process of emotion recognition involves several key steps: importing necessary libraries, collecting a labeled speech dataset (such as the RAVDESS dataset), preprocessing audio files to extract relevant features, and mapping the integer emotion codes to emotion names.
Data visualization plays a crucial role: the final step generates a count plot that shows the distribution of emotions in the dataset, which makes the labeled data easier to understand and analyze.
Speech emotion recognition (SER) is a field of artificial intelligence (AI) that focuses on identifying a speaker’s emotional state based on their voice characteristics. It can analyze emotional states in therapy sessions, identify struggling students in education, and gauge customer satisfaction in calls.
The process of employing Artificial Neural Networks (ANN) for speech emotion recognition entails teaching a neural network to identify different emotions based on characteristics taken from speech signals. This is a general overview of how we could go about completing this task.
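To make the idea of a network mapping speech features to emotions concrete, here is a minimal NumPy sketch of a single forward pass. It is not a trained model and not the Keras model this guide builds towards: the layer sizes, the random feature vector, and the random weights are all assumptions chosen for illustration. Training would adjust the weights so that the highest probability lands on the correct emotion.

```python
import numpy as np

EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fear", "disgust", "surprise"]

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def forward(features, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax output layer."""
    hidden = np.maximum(0.0, features @ w1 + b1)
    return softmax(hidden @ w2 + b2)

rng = np.random.default_rng(0)
n_features, n_hidden = 40, 16           # assumed sizes, for illustration only
features = rng.normal(size=n_features)  # stand-in for features from one audio clip

# Untrained random weights; training would fit these to labeled data.
w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, len(EMOTIONS)))
b2 = np.zeros(len(EMOTIONS))

probs = forward(features, w1, b1, w2, b2)
predicted = EMOTIONS[int(np.argmax(probs))]
print(predicted, probs.round(3))
```

The output layer has one unit per emotion, so the network returns a probability for each of the eight classes and the prediction is the class with the largest one.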
Step-by-step guide
Here’s the step-by-step process for preparing the data to recognize speech emotions with an ANN:
Importing libraries
Let’s start this task by importing the required libraries or modules.
Libraries
We’ll use the matplotlib, seaborn, librosa, pandas, numpy, scikit-learn, and keras libraries to recognize speech emotions with an ANN.
    import os
    import sys

    import numpy as np
    import pandas as pd

    # librosa is a Python library for analyzing audio and music.
    # It can be used to extract data from the audio files; we will see it later.
    import librosa
    import librosa.display
    import seaborn as sns
    import matplotlib.pyplot as plt

    from sklearn.preprocessing import StandardScaler, OneHotEncoder
    from sklearn.metrics import confusion_matrix, classification_report
    from sklearn.model_selection import train_test_split

    # To play the audio files
    from IPython.display import Audio

    import keras
    from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint
    from keras.models import Sequential
    from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization
Dataset collection
Gather a dataset of speech recordings labeled with the corresponding emotions. There are several publicly available datasets for speech emotion recognition. We’ll use the RAVDESS dataset for this task.
Dataset
The dataset that we will be using in this task can be downloaded from
    Ravdess = "Audioss"
    file_emotion = []
    file_path = []
Line 1: Ravdess = "Audioss" specifies the directory path where the audio files are located.
Lines 2–3: file_emotion = [] and file_path = [] initialize lists to store emotions and file paths.
Preprocess files in the directory
Preprocess the speech recordings to extract features that are relevant for emotion recognition.
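    ravdess_directory_list = os.listdir(Ravdess)  # assumed setup: list the actor folders
    for actor_dir in ravdess_directory_list:
        actor_path = os.path.join(Ravdess, actor_dir)  # assumed: path to one actor folder
        if os.path.isdir(actor_path):
            for file_name in os.listdir(actor_path):
                file_path_full = os.path.join(actor_path, file_name)
                if not file_name.endswith(".wav") or file_name.startswith("._"):
                    continue
Lines 1–3: List the entries of the Ravdess directory, iterate through each item (actor_dir) in ravdess_directory_list, and build its path (actor_path). The definitions of ravdess_directory_list and actor_path are assumed setup.
Line 4: Continue only if actor_path is a directory.
Line 5: For each actor_dir, list all files in the corresponding subdirectory using os.listdir(actor_path).
Line 6: Construct the full file path.
Lines 7–8: Skip files that are not .wav files or that are hidden/system files (names starting with "._").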
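The directory walk described above can also be wrapped in a small function and tried on a throwaway folder before pointing it at the real dataset. The sketch below is one way to do that under stated assumptions: collect_wav_paths is a hypothetical helper name, and the folder and file names in the temporary tree are made up for the test.

```python
import os
import tempfile

def collect_wav_paths(root_dir):
    """Return full paths of .wav files found one level below root_dir."""
    wav_paths = []
    for actor_dir in sorted(os.listdir(root_dir)):
        actor_path = os.path.join(root_dir, actor_dir)
        if not os.path.isdir(actor_path):
            continue  # ignore stray files next to the actor folders
        for file_name in sorted(os.listdir(actor_path)):
            # Skip non-audio files and hidden "._" files.
            if not file_name.endswith(".wav") or file_name.startswith("._"):
                continue
            wav_paths.append(os.path.join(actor_path, file_name))
    return wav_paths

# Build a tiny temporary tree to exercise the function (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    actor = os.path.join(root, "Actor_01")
    os.makedirs(actor)
    for name in ["03-01-05-01-01-01-01.wav", "._03-01-05-01-01-01-01.wav", "notes.txt"]:
        open(os.path.join(actor, name), "w").close()
    found = [os.path.basename(p) for p in collect_wav_paths(root)]

print(found)  # ['03-01-05-01-01-01-01.wav']
```

Sorting the directory listings is a design choice made here so that the result is reproducible; os.listdir itself returns entries in arbitrary order.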
Extract emotions from file names
Extract the emotions from file names by splitting the file names.
    parts = file_name.split('.')[0].split('-')
    if len(parts) >= 3:
        emotion = int(parts[2])
        file_emotion.append(emotion)
        file_path.append(file_path_full)
    else:
        print("Error: Unexpected filename format for file:", file_name)
Line 1: For each file, split the file name by periods (.) to remove the file extension and then split the resulting string by hyphens (-).
Lines 2–5: Check if there are at least three parts in the split result. If so, extract the emotion label (emotion) from the third part (index 2), append it to the file_emotion list, and append the full path to the file_path list.
Lines 6–7: If the file name does not follow the expected format, print an error message.
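RAVDESS file names consist of hyphen-separated numeric fields, with the emotion code in the third field. The parsing step can be isolated into a helper that returns None instead of printing when the name does not fit. This is a minimal sketch; parse_emotion_code is a hypothetical name, and the check that the third field is numeric is an extra safeguard added here.

```python
import os

def parse_emotion_code(file_name):
    """Return the integer emotion code from a RAVDESS-style name, or None."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    parts = stem.split("-")
    if len(parts) < 3 or not parts[2].isdigit():
        return None  # unexpected format: let the caller decide what to do
    return int(parts[2])

print(parse_emotion_code("03-01-06-01-02-01-12.wav"))  # 6
print(parse_emotion_code("readme.wav"))                # None
```

Returning None rather than raising keeps the loop over files running, at the cost of requiring the caller to check the result before appending it.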
Create DataFrame
Create a DataFrame from the emotions list and file path.
    emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
    path_df = pd.DataFrame(file_path, columns=['Path'])
    Ravdess_df = pd.concat([emotion_df, path_df], axis=1)
Line 1: Create a DataFrame from the file_emotion list with a single column named 'Emotions' using emotion_df = pd.DataFrame(file_emotion, columns=['Emotions']).
Line 2: Create a DataFrame from the file_path list with a single column named 'Path' using path_df = pd.DataFrame(file_path, columns=['Path']).
Line 3: After processing all files, create a DataFrame (Ravdess_df) by concatenating emotion_df and path_df along the columns axis using pd.concat([emotion_df, path_df], axis=1).
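An equivalent way to get the same two-column table is to pass both lists to pandas in a single dictionary. The sketch below uses short made-up lists in place of the ones built from the dataset, and adds a length check, since the two lists only make sense together when they line up one-to-one.

```python
import pandas as pd

# Hypothetical sample values standing in for the lists built earlier.
file_emotion = [5, 3, 4]
file_path = ["Audioss/Actor_01/a.wav", "Audioss/Actor_01/b.wav", "Audioss/Actor_02/c.wav"]

# The lists must have equal length, otherwise labels and paths drift apart.
assert len(file_emotion) == len(file_path)

Ravdess_df = pd.DataFrame({"Emotions": file_emotion, "Path": file_path})
print(Ravdess_df.shape)  # (3, 2)
```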
Map emotions
Now, map integer emotions to actual emotions.
    emotion_map = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad', 5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}
    Ravdess_df['Emotions'] = Ravdess_df['Emotions'].map(emotion_map)
Lines 1–2: Map integer emotion labels to actual emotions using a predefined mapping dictionary (emotion_map) and Ravdess_df['Emotions'].map(emotion_map).
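One detail of mapping with a dictionary is worth checking: Series.map produces NaN for any value that is not a key in the dictionary, so an unexpected code would silently become a missing label. The sketch below shows how such codes could be detected. The sample codes, including the deliberately invalid 9, are made up for illustration.

```python
import pandas as pd

emotion_map = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
               5: "angry", 6: "fear", 7: "disgust", 8: "surprise"}

codes = pd.Series([3, 5, 9], name="Emotions")  # 9 is a deliberately invalid code
labels = codes.map(emotion_map)

# Series.map leaves NaN where the key is missing from the dictionary.
unmapped = codes[labels.isna()].tolist()
print(labels.tolist())
print(unmapped)  # [9]
```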
Display DataFrame
Finally, print the first few rows of the DataFrame.
print(Ravdess_df.head())
Line 1: Print the first 5 rows of the Ravdess_df DataFrame.
Visualize and plot data
Generate a count plot using seaborn to visualize the distribution of different emotions present in the DataFrame Ravdess_df.
    plt.title('Count of Emotions', size=16)
    sns.countplot(x='Emotions', data=Ravdess_df)  # name the column explicitly so it is used as the x variable
    plt.ylabel('Count', size=12)
    plt.xlabel('Emotions', size=12)
    sns.despine(top=True, right=True, left=False, bottom=False)
    plt.show()
Line 1: Set the title of the plot.
Line 2: Count plot for visualizing the data in the Emotions column of the DataFrame Ravdess_df.
Line 3: Set the label "Count" for the y-axis of the plot and adjust its font size to 12.
Line 4: Set the label "Emotions" for the x-axis of the plot and adjust its font size to 12.
Line 5: Customize the appearance of the plot’s lines on the axes. The arguments specify which lines on the axis to remove (top=True, right=True) and which to keep (left=False, bottom=False). In this case, it removes the top and right lines of the axis, potentially creating a cleaner visual presentation.
Line 6: Show the plot.
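The distribution shown by the count plot can also be checked numerically, which is handy when no display is available. The sketch below counts labels with the standard library's collections.Counter. The list of labels is made up and stands in for the Emotions column.

```python
from collections import Counter

# Hypothetical labels standing in for the Emotions column.
emotions = ["happy", "sad", "happy", "angry", "sad", "happy"]

counts = Counter(emotions)
for emotion, count in counts.most_common():
    print(f"{emotion:<8} {count}")
```

A strongly uneven result here would indicate class imbalance, which is the same thing the count plot is meant to reveal visually.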
Frequently asked questions
What are the common emotions detected in speech emotion recognition?
What types of ANN architectures are used for speech emotion recognition?