Article details

Research area
Speech enhancement



Jonas Sautter, Friedrich Faubel, Markus Buck, Gerhard Schmidt

Discriminative Training of Deep Regression Networks for Artificial Bandwidth Extension


Artificial bandwidth extension reconstructs a 16 kHz wideband signal from a given 8 kHz narrowband signal. State-of-the-art approaches use regression deep neural networks (DNNs) to extend the spectral envelope. As the cost function during training, they use the mean squared error (MSE) between the true and estimated wideband spectral envelopes. With pure MSE training, the extension for fricatives and vowels is not distinctive enough compared to the true wideband data. In this work, we propose adding a discriminative term to the cost function that forces the DNN to extend the energy more distinctively for different phoneme classes. The proposed cost function improves the separation of fricatives and vowels in the DNN. It also yields higher speech quality, as shown in subjective listening tests.
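The idea of combining an MSE cost with a discriminative term can be illustrated with a minimal sketch. The abstract does not give the form of the discriminative term, so everything below is an assumption made for illustration: the function names, the two-class labeling (0 = vowel, 1 = fricative), the energy-gap penalty, and the weight are hypothetical and are not the formulation used in the paper.

```python
import numpy as np


def mse_loss(est, true):
    """Mean squared error between estimated and true wideband envelopes."""
    return float(np.mean((est - true) ** 2))


def discriminative_term(est, true, labels):
    """Hypothetical penalty (an assumption, not the paper's term).

    Compares the gap in mean envelope energy between two phoneme classes
    (labels: 0 = vowel, 1 = fricative) in the estimate against the same
    gap in the true data, and returns the squared mismatch.
    """
    def class_gap(env):
        energy = env.mean(axis=1)  # mean envelope value per frame
        return energy[labels == 1].mean() - energy[labels == 0].mean()

    return float((class_gap(est) - class_gap(true)) ** 2)


def combined_loss(est, true, labels, weight=0.5):
    """MSE plus a weighted discriminative term; the weight is illustrative."""
    return mse_loss(est, true) + weight * discriminative_term(est, true, labels)


# Toy data: two frames with two envelope coefficients each.
true_env = np.array([[0.0, 0.0], [2.0, 2.0]])  # vowel frame, fricative frame
labels = np.array([0, 1])
smoothed = np.ones((2, 2))   # both classes collapsed to the same energy
shifted = true_env + 1.0     # class gap preserved, constant offset

loss_smoothed = combined_loss(smoothed, true_env, labels)
loss_shifted = combined_loss(shifted, true_env, labels)
```

In this toy case both estimates have the same MSE, but only the over-smoothed one is penalized by the extra term, which mirrors the motivation stated above: pure MSE cannot tell an estimate that blurs fricatives and vowels together from one that keeps them apart.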
