Experimental Setup: SEGAN

Discover the process of implementing data loaders, training functions, and loggers tailored for speech enhancement datasets with noise augmentation.

In this lesson, we implement the data loader, the training function, and the loggers for SEGAN.


We are going to use the speech enhancement dataset described in the paper “Noisy Speech Database for Training Speech Enhancement Algorithms and TTS models.” As described in the SEGAN paper, this dataset uses clean speech data from the Voice Bank corpus, using 28 speakers for the training set and two speakers for the test set. The noise dataset comes from the DEMAND database. The noisy training set is built by adding 10 types of noise (two artificial and eight from the DEMAND database) at four signal-to-noise ratios (15, 10, 5, and 0 dB) to the clean speech.
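To make the signal-to-noise ratios above concrete, here is a minimal sketch of how a noise clip could be scaled and added to a clean clip at a target SNR. It is an illustration only, not the procedure the dataset authors used: the function name `mix_at_snr` is hypothetical, and plain Python lists stand in for the arrays or tensors a real pipeline would use.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the clean-to-noise power ratio is `snr_db` dB.

    Hypothetical helper for illustration; both inputs are sequences of float samples.
    """
    # Repeat or truncate the noise so it covers the whole clean signal.
    noise = [noise[i % len(noise)] for i in range(len(clean))]

    # Average power of each signal.
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)

    # SNR_dB = 10 * log10(clean_power / (scale^2 * noise_power)), solved for scale.
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))

    return [c + scale * n for c, n in zip(clean, noise)]
```

Calling the function with `snr_db` set to 15, 10, 5, and 0 on the same clean clip yields four progressively noisier versions; at 0 dB the scaled noise has the same average power as the speech.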

The iteration over the minibatches can be divided into three steps:

  1. We create a list with all wave files in the clean and noisy data paths by using glob:
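The file-listing step can be sketched as follows. This is a minimal sketch rather than the lesson's own code: the helper name `list_wav_files` is hypothetical, and it assumes that both directories contain `.wav` files and that each noisy file shares its filename with the corresponding clean file.

```python
import glob
import os


def list_wav_files(clean_dir, noisy_dir):
    """Return sorted lists of the .wav files found in the clean and noisy folders.

    Hypothetical helper; assumes matching filenames in both directories.
    """
    # glob returns paths in arbitrary order, so sort to keep the two lists aligned.
    clean_files = sorted(glob.glob(os.path.join(clean_dir, "*.wav")))
    noisy_files = sorted(glob.glob(os.path.join(noisy_dir, "*.wav")))

    # Fail early if the two folders do not contain the same set of filenames.
    clean_names = [os.path.basename(p) for p in clean_files]
    noisy_names = [os.path.basename(p) for p in noisy_files]
    if clean_names != noisy_names:
        raise ValueError("clean and noisy directories do not contain matching files")

    return clean_files, noisy_files
```

Sorting both lists means that index `i` in one list refers to the same utterance as index `i` in the other, which is what a paired data loader needs.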
