Speech emotion recognition using ANN

Key takeaways:

  • Speech emotion recognition (SER) uses AI to identify emotional states from voice characteristics, with applications in therapy, education, and customer service, leveraging techniques such as Artificial Neural Networks (ANNs).

  • The process of emotion recognition involves several key steps: importing the necessary libraries, collecting a labeled speech dataset (such as the RAVDESS dataset), preprocessing the audio files, and mapping the integer emotion codes extracted from the file names to emotion labels.

  • Data visualization plays a crucial role, as the final step includes generating count plots to visually represent the distribution of different emotions in the dataset, enhancing understanding and analysis of the recognized emotions.

Speech emotion recognition (SER) is a field of artificial intelligence (AI) that focuses on identifying a speaker’s emotional state based on their voice characteristics. It can analyze emotional states in therapy sessions, identify struggling students in education, and gauge customer satisfaction in calls.

Using an Artificial Neural Network (ANN) for speech emotion recognition means training a network to distinguish different emotions based on features extracted from speech signals. Below is a general overview of how we could go about completing this task.

Step-by-step guide

Here’s the step-by-step process of recognizing speech emotions with an ANN:

Importing libraries

Let’s start this task by importing the required libraries or modules.

Libraries

We’ll use the pandas, numpy, librosa, seaborn, matplotlib, scikit-learn, and keras libraries to recognize speech emotions with an ANN.

import pandas as pd
import numpy as np
import os
import sys
# librosa is a Python library for analyzing audio and music. It can be used to extract the data from the audio files; we will see it later.
import librosa
import librosa.display
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
# to play the audio files
from IPython.display import Audio
import keras
from keras.callbacks import ReduceLROnPlateau
from keras.models import Sequential
from keras.layers import Dense, Conv1D, MaxPooling1D, Flatten, Dropout, BatchNormalization
from keras.callbacks import ModelCheckpoint

Dataset collection

Gather a dataset of speech recordings labeled with the corresponding emotions. There are several publicly available datasets for speech emotion recognition; we’ll use the RAVDESS dataset for this task.

Dataset

The dataset that we will be using in this task can be downloaded from https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio.

Ravdess = "Audioss"
file_emotion = []
file_path = []
  • Line 1: Ravdess = "Audioss" specifies the directory path where the downloaded RAVDESS audio files are located; adjust it to wherever the dataset is extracted.

  • Lines 2–3: file_emotion = [ ], file_path = [ ] initialize lists to store emotions and file paths.

Preprocess files in the directory

Preprocess the speech recordings by walking through the dataset directory and collecting each audio file along with its label; the acoustic features relevant for emotion recognition are extracted from these files afterward (see the sketch after this step’s walkthrough).

ravdess_directory_list = os.listdir(Ravdess)
for actor_dir in ravdess_directory_list:
    actor_path = os.path.join(Ravdess, actor_dir)
    if os.path.isdir(actor_path):
        for file_name in os.listdir(actor_path):
            file_path_full = os.path.join(actor_path, file_name)
            if not file_name.endswith(".wav") or file_name.startswith("._"):
                continue
  • Line 1: List the contents of the Ravdess directory (one subdirectory per actor) and store them in ravdess_directory_list.

  • Line 2: Iterate through each item (actor_dir) in ravdess_directory_list.

  • Lines 3–4: Construct the full path of the actor subdirectory (actor_path) and process it only if it is a directory.

  • Line 5: Loop over all files in the subdirectory using os.listdir(actor_path).

  • Line 6: Construct the full file path.

  • Lines 7–8: Skip files that are not .wav files or that are hidden/system files.
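
The loop above only gathers file paths and their labels; the acoustic features the network will learn from still need to be extracted from each recording. Below is a minimal, illustrative sketch of one common approach using librosa: computing MFCCs and averaging them over time to get a fixed-length vector per file. The helper name extract_features and the choice of 40 coefficients are assumptions for illustration, not part of the original walkthrough.

def extract_features(path, n_mfcc=40):
    # Load the audio clip; librosa resamples to 22,050 Hz by default.
    signal, sample_rate = librosa.load(path, duration=3, offset=0.5)
    # Compute MFCCs and average across time to obtain one fixed-length vector.
    mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    return np.mean(mfccs.T, axis=0)

# Example usage (after the loop has filled file_path):
# X = np.array([extract_features(p) for p in file_path])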

Extract emotions from file names

Extract the emotions from file names by splitting the file names.

parts = file_name.split('.')[0].split('-')
if len(parts) >= 3:
    emotion = int(parts[2])
    file_emotion.append(emotion)
    file_path.append(file_path_full)
else:
    print("Error: Unexpected filename format for file:", file_name)
  • Line 1: For each file, split the file name by periods (.) to remove the file extension and then split the resulting string by hyphens (-).

  • Lines 2–5: Check if there are at least three parts in the split result. If so, extract the emotion label (emotion) from the third part (index 2) and append the emotion and the full file path to the file_emotion and file_path lists, respectively.

  • Lines 6-7: If the file name does not follow the expected format, print an error message.
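
For context, RAVDESS file names encode their attributes as hyphen-separated two-digit fields, and the third field is the emotion code that the split above pulls out. A quick illustrative check (the sample file name below is just an example of the naming convention) could look like this:

sample_name = "03-01-06-01-02-01-12.wav"  # audio-only speech, emotion code 06, actor 12
sample_parts = sample_name.split('.')[0].split('-')
print(int(sample_parts[2]))  # prints 6, which the next steps map to 'fear'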

Create DataFrame

Create a DataFrame from the emotions list and file path.

emotion_df = pd.DataFrame(file_emotion, columns=['Emotions'])
path_df = pd.DataFrame(file_path, columns=['Path'])
Ravdess_df = pd.concat([emotion_df, path_df], axis=1)
  • Line 1: Create a DataFrame (emotion_df) from the file_emotion list, with the column named 'Emotions', using emotion_df = pd.DataFrame(file_emotion, columns=['Emotions']).

  • Line 2: Create a DataFrame (path_df) from the file_path list, with the column named 'Path', using path_df = pd.DataFrame(file_path, columns=['Path']).

  • Line 3: After processing all files, create a DataFrame (Ravdess_df) by concatenating emotion_df and path_df along the columns axis using pd.concat([emotion_df, path_df], axis=1).

Map emotions

Now, map the integer emotion codes to actual emotion names.

emotion_map = {1: 'neutral', 2: 'calm', 3: 'happy', 4: 'sad', 5: 'angry', 6: 'fear', 7: 'disgust', 8: 'surprise'}
Ravdess_df['Emotions'] = Ravdess_df['Emotions'].map(emotion_map)
  • Lines 1–2: Map integer emotion labels to actual emotions using a predefined mapping dictionary (emotion_map) and Ravdess_df['Emotions'].map(emotion_map).

Display DataFrame

Finally, print the first few rows of the DataFrame.

print(Ravdess_df.head())
  • Line 1: Print the first five rows of the Ravdess_df DataFrame.

Visualize and plot data

Generate a count plot using seaborn to visualize the distribution of different emotions present in the DataFrame Ravdess_df.

plt.title('Count of Emotions', size=16)
sns.countplot(x='Emotions', data=Ravdess_df)
plt.ylabel('Count', size=12)
plt.xlabel('Emotions', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()
  • Line 1: Set the title of the plot.

  • Line 2: Draw a count plot of the Emotions column of the Ravdess_df DataFrame.

  • Line 3: Set the label "Count" for the y-axis of the plot and set its font size to 12.

  • Line 4: Set the label "Emotions" for the x-axis of the plot and set its font size to 12.

  • Line 5: Customize the plot’s spines. The arguments specify which spines to remove (top=True, right=True) and which to keep (left=False, bottom=False). In this case, it removes the top and right spines, creating a cleaner visual presentation.

  • Line 6: Show the plot.
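
The walkthrough above stops at exploring the labels, but the Keras imports at the start already point to the model-building step. Below is a minimal, illustrative sketch of a simple feedforward ANN using those imported layers. The feature matrix X (for example, the MFCC vectors from the earlier sketch), the one-hot encoded labels y, and the layer sizes are all assumptions rather than tuned values from the original walkthrough.

# Assumed inputs: X is a (num_samples, 40) array of extracted features and
# y holds the one-hot encoded emotion labels (e.g. produced with OneHotEncoder).
# x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Sequential([
    Dense(256, activation='relu', input_shape=(40,)),  # 40 matches the MFCC sketch above
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(8, activation='softmax')  # RAVDESS has 8 emotion classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=50, batch_size=32)

Callbacks such as ReduceLROnPlateau and ModelCheckpoint (also imported above) can be passed to model.fit to adjust the learning rate and save the best weights during training.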


Frequently asked questions



What are the common emotions detected in speech emotion recognition?

Commonly detected emotions include happiness, sadness, anger, fear, surprise, and disgust.


What types of ANN architectures are used for speech emotion recognition?

Common architectures include feedforward neural networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs).
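
For example, the Conv1D, MaxPooling1D, and Flatten layers imported at the start can be arranged into a small 1D CNN. The sketch below is illustrative only and assumes the same 40-dimensional feature vectors, reshaped to add a channel axis; it is not part of the original walkthrough.

cnn_model = Sequential([
    Conv1D(64, kernel_size=5, activation='relu', input_shape=(40, 1)),
    MaxPooling1D(pool_size=2),
    Conv1D(128, kernel_size=5, activation='relu'),
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(8, activation='softmax')
])
cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# The inputs would need a trailing channel axis, e.g. x_train = np.expand_dims(x_train, axis=-1)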

