Experimental Setup: SEGAN

Discover the process of implementing data loaders, training functions, and loggers tailored for speech enhancement datasets with noise augmentation.

We'll cover the following

Data
Training
Initializations
Training loop
Logging audio files and losses

In this lesson, we are going to implement our data loader, training function, and loggers.

Data

We are going to use the speech enhancement dataset described in the paper “Noisy Speech Database for Training Speech Enhancement Algorithms and TTS models.” As in the SEGAN paper, the clean speech comes from the Voice Bank corpus, with 28 speakers in the training set and two speakers in the test set. The noisy training set is built by adding 10 types of noise to the clean speech, two artificial and eight from the DEMAND database, at four signal-to-noise ratios (SNRs): 15, 10, 5, and 0 dB.
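The dataset already ships with the noisy mixtures, so we never have to create them ourselves. Still, it helps to see what "adding noise at a given SNR" means. Below is a minimal sketch that assumes a simple mean-power definition, SNR_dB = 10 · log10(P_clean / P_noise). The function name `mix_at_snr` is hypothetical, and the dataset authors may have measured speech level differently, so treat this as an illustration and not as their exact procedure.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean`, scaled so that the mixture has `snr_db` SNR.

    Illustrative only: SNR is measured with plain mean power over the whole
    signal, which is an assumption, not necessarily the dataset's method.
    """
    # Repeat the noise if it is shorter than the speech, then trim to length.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # From SNR_dB = 10*log10(P_clean / (g^2 * P_noise)), solve for the gain g.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


# Usage: mix a 1 s, 440 Hz tone (16 kHz) with white noise at the four SNRs.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)
for snr in (15, 10, 5, 0):
    noisy = mix_at_snr(clean, noise, snr)
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(f"target {snr} dB, measured {measured:.2f} dB")
```

Because the noise power is computed after trimming, the measured SNR matches the target up to floating-point error; lower SNRs simply mean a larger gain on the noise.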

Iterating over the minibatches can be divided into three steps. First, we use glob to create a list of all the wave files in the clean and noisy data paths.
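A minimal sketch of this first step, assuming each noisy file has the same file name as its clean counterpart in a separate directory. The function name `list_wav_pairs` and the directory arguments are hypothetical; point them at wherever you extracted the dataset.

```python
import glob
import os

def list_wav_pairs(clean_dir: str, noisy_dir: str) -> list[tuple[str, str]]:
    """Return sorted (clean_path, noisy_path) pairs matched by file name."""
    # Collect every .wav file in each directory; sorting keeps the order stable.
    clean_files = sorted(glob.glob(os.path.join(clean_dir, "*.wav")))
    noisy_files = sorted(glob.glob(os.path.join(noisy_dir, "*.wav")))

    # Index noisy files by base name so each clean file can find its partner.
    noisy_by_name = {os.path.basename(p): p for p in noisy_files}

    pairs = []
    for clean_path in clean_files:
        name = os.path.basename(clean_path)
        # Clean files without a noisy partner are skipped in this sketch.
        if name in noisy_by_name:
            pairs.append((clean_path, noisy_by_name[name]))
    return pairs
```

For example, `list_wav_pairs("path/to/clean_wav", "path/to/noisy_wav")` returns a list we can shuffle and slice into minibatches. Pairing by name, instead of zipping two sorted lists, keeps a single missing file from silently misaligning every pair after it.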