Speech emotion recognition using ANN

Key takeaways:

  • Speech emotion recognition (SER) uses AI to identify emotional states from voice characteristics, with applications in therapy, education, and customer service, leveraging techniques such as Artificial Neural Networks (ANNs).

  • The process of emotion recognition involves several key steps: importing the necessary libraries, collecting a labeled speech dataset (such as the RAVDESS dataset), preprocessing the audio files, and mapping the integer emotion codes extracted from the file names to emotion labels.

  • Data visualization plays a crucial role, as the final step includes generating count plots to visually represent the distribution of different emotions in the dataset, enhancing understanding and analysis of the recognized emotions.

Speech emotion recognition (SER) is a field of artificial intelligence (AI) that focuses on identifying a speaker’s emotional state based on their voice characteristics. It can analyze emotional states in therapy sessions, identify struggling students in education, and gauge customer satisfaction in calls.

Using an Artificial Neural Network (ANN) for speech emotion recognition means training a network to distinguish different emotions based on features extracted from speech signals. Below is a general overview of how we could go about completing this task.

Step-by-step guide

Here’s the step-by-step process of recognizing speech emotions with an ANN:

Importing libraries

Let’s start this task by importing the required libraries or modules.

Libraries

We’ll use the pandas, numpy, librosa, seaborn, matplotlib, scikit-learn, and keras libraries to recognize speech emotions with an ANN.

import pandas as pd
import numpy as np
import os
import sys
# librosa is a Python library for analyzing audio and music. It can be used to extract the data from the audio files; we will see it later.
import librosa
import librosa.display
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
# to play the audio files
from IPython.display import Audio
import keras
from keras.callbacks import ReduceLROnPlateau
from keras.models import Sequential
from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization
from keras.callbacks import ModelCheckpoint

Dataset collection

Gather a dataset of speech recordings labeled with the corresponding emotions. There are several publicly available datasets for speech emotion recognition; we’ll use the RAVDESS dataset for this task.

Dataset

The dataset that we will be using in this task can be downloaded from https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio.

Ravdess = "Audioss"
file_emotion = []
file_path = []
  • Line 1: Ravdess = "Audioss" specifies the directory path where the downloaded RAVDESS audio files are located; adjust it to wherever the dataset is extracted.

  • Lines 2–3: file_emotion = [ ], file_path = [ ] initialize lists to store emotions and file paths.

Preprocess files in the directory

Preprocess the speech recordings by walking through the dataset directory and collecting each audio file along with its label; the acoustic features relevant for emotion recognition are extracted from these files afterward (see the sketch after this step’s walkthrough).

ravdess_directory_list = os.listdir(Ravdess)
for actor_dir in ravdess_directory_list:
    actor_path = os.path.join(Ravdess, actor_dir)
    if os.path.isdir(actor_path):
        for file_name in os.listdir(actor_path):
            file_path_full = os.path.join(actor_path, file_name)
            if not file_name.endswith(".wav") or file_name.startswith("._"):
                continue
  • Line 1: List the contents of the Ravdess directory (one subdirectory per actor) and store them in ravdess_directory_list.

  • Line 2: Iterate through each item (actor_dir) in ravdess_directory_list.

  • Lines 3–4: Construct the full path of the actor subdirectory (actor_path) and process it only if it is a directory.

  • Line 5: Loop over all files in the subdirectory using os.listdir(actor_path).

  • Line 6: Construct the full file path.

  • Lines 7–8: Skip files that are not .wav files or that are hidden/system files.
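
The loop above only gathers file paths and their labels; the acoustic features the network will learn from still need to be extracted from each recording. Below is a minimal, illustrative sketch of one common approach using librosa: computing MFCCs and averaging them over time to get a fixed-length vector per file. The helper name extract_features and the choice of 40 coefficients are assumptions for illustration, not part of the original walkthrough.

def extract_features(path, n_mfcc=40):
    # Load the audio clip; librosa resamples to 22,050 Hz by default.
    signal, sample_rate = librosa.load(path, duration=3, offset=0.5)
    # Compute MFCCs and average across time to obtain one fixed-length vector.
    mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return np.mean(mfccs.T, axis=0)

# Example usage (after the loop has filled file_path):
# X = np.array([extract_features(p) for p in file_path])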

Extract emotions from file names

Extract the emotions from file names by splitting the file names.

parts = file_name.split('.')[0].split('-')
if len(parts) >= 3:
    emotion = int(parts[2])
    file_emotion.append(emotion)
    file_path.append(file_path_full)
else:
    print("Error: Unexpected filename format for file:", file_name)
  • Line 1: For each file, split the file name by periods (.) to remove the file extension and then split the resulting string by hyphens (-).

  • Lines 2–5: Check if there are at least three parts in the split result. If so, extract the emotion label (emotion) from the third part (index 2) and append the emotion and the full file path to the file_emotion and file_path lists, respectively.

  • Lines 6-7: If the file name does not follow the expected format, print an error message.
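
For context, RAVDESS file names encode their attributes as hyphen-separated two-digit fields, and the third field is the emotion code that the split above pulls out. A quick illustrative check (the sample file name below is just an example of the naming convention) could look like this:

sample_name = "03-01-06-01-02-01-12.wav"  # audio-only speech, emotion code 06, actor 12
sample_parts = sample_name.split('.')[0].split('-')
print(int(sample_parts[2]))  # prints 6, which the next steps map to 'fear'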

Create DataFrame

Create a DataFrame from the emotions list and file path.

emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
path_df = pd.DataFrame(file_path, columns=['Path'])
Ravdess_df = pd.concat([emotion_df, path_df], axis=1)
  • Line 1: Create a DataFrame (emotion_df) from the file_emotion list, with the column named 'Emotions', using emotion_df = pd.DataFrame(file_emotion, columns=['Emotions']).

  • Line 2: Create a DataFrame (path_df) from the file_path list, with the column named 'Path', using path_df = pd.DataFrame(file_path, columns=['Path']).

  • Line 3: After processing all files, create a DataFrame (Ravdess_df) by concatenating emotion_df and path_df along the columns axis using pd.concat([emotion_df, path_df], axis=1).

Map emotions

Now, map the integer emotion codes to actual emotion names.

emotion_map = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad', 5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}
Ravdess_df['Emotions'] = Ravdess_df['Emotions'].map(emotion_map)
  • Lines 1–2: Map integer emotion labels to actual emotions using a predefined mapping dictionary (emotion_map) and Ravdess_df['Emotions'].map(emotion_map).

Display DataFrame

Finally, print the first few rows of the DataFrame.

print(Ravdess_df.head())
  • Line 1: Print the first five rows of the Ravdess_df DataFrame.

Visualize and plot data

Generate a count plot using seaborn to visualize the distribution of different emotions present in the DataFrame Ravdess_df.

plt.title('Count of Emotions', size=16)
sns.countplot(x='Emotions', data=Ravdess_df)
plt.ylabel('Count', size=12)
plt.xlabel('Emotions', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()
  • Line 1: Set the title of the plot.

  • Line 2: Draw a count plot of the Emotions column of the Ravdess_df DataFrame.

  • Line 3: Set the label "Count" for the y-axis of the plot and set its font size to 12.

  • Line 4: Set the label "Emotions" for the x-axis of the plot and set its font size to 12.

  • Line 5: Customize the plot’s spines. The arguments specify which spines to remove (top=True, right=True) and which to keep (left=False, bottom=False). In this case, it removes the top and right spines, creating a cleaner visual presentation.

  • Line 6: Show the plot.
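
The walkthrough above stops at exploring the labels, but the Keras imports at the start already point to the model-building step. Below is a minimal, illustrative sketch of a simple feedforward ANN using those imported layers. The feature matrix X (for example, the MFCC vectors from the earlier sketch), the one-hot encoded labels y, and the layer sizes are all assumptions rather than tuned values from the original walkthrough.

# Assumed inputs: X is a (num_samples, 40) array of extracted features and
# y holds the one-hot encoded emotion labels (e.g. produced with OneHotEncoder).
# x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Sequential([
    Dense(256, activation='relu', input_shape=(40,)),  # 40 matches the MFCC sketch above
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(8, activation='softmax')  # RAVDESS has 8 emotion classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=50, batch_size=32)

Callbacks such as ReduceLROnPlateau and ModelCheckpoint (also imported above) can be passed to model.fit to adjust the learning rate and save the best weights during training.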


Frequently asked questions



What are the common emotions detected in speech emotion recognition?

Commonly detected emotions include happiness, sadness, anger, fear, surprise, and disgust.


What types of ANN architectures are used for speech emotion recognition?

Common architectures include feedforward neural networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs).
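
For example, the Conv1D, MaxPooling1D, and Flatten layers imported at the start can be arranged into a small 1D CNN. The sketch below is illustrative only and assumes the same 40-dimensional feature vectors, reshaped to add a channel axis; it is not part of the original walkthrough.

cnn_model = Sequential([
    Conv1D(64, kernel_size=5, activation='relu', input_shape=(40, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(128, kernel_size=5, activation='relu'),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(8, activation='softmax')
])
cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# The inputs would need a trailing channel axis, e.g. x_train = np.expand_dims(x_train, axis=-1)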

