Article details

Research area
Speech enhancement

IEEE Transactions on Audio, Speech and Language Processing, Vol. 24, April 2016


Pablo Peso Parada, Dushyant Sharma, Jose Lainez, Daniel Barreda, Toon van Waterschoot, Patrick Naylor

A single-channel non-intrusive C50 estimator correlated with speech recognition performance


Abstract—Several intrusive measures of reverberation can be
computed from measured and simulated room impulse responses,
over the full frequency band or for each individual mel-frequency
subband. It is initially shown that full-band clarity index C50 is
the most correlated measure on average with reverberant speech
recognition performance. This corroborates previous findings but
now for the dataset to be used in this study. We extend the previous
findings to show that C50 also exhibits the highest mutual
information on average. Motivated by these extended findings,
a non-intrusive room acoustic (NIRA) estimation method is
proposed to estimate C50 from only the reverberant speech
signal. The NIRA method is a data-driven approach based on
computing a number of features from the speech signal and it
employs these features to train a model used to perform the
estimation. The choice of features and learning techniques is
explored in this work using an evaluation set which comprises
approximately 100,000 different reverberant signals (around 93
hours of speech) including reverberation from measured and
simulated room impulse responses. The importance of
each feature with respect to the estimation of the target C50 is
analysed following two different approaches. In both cases the
newly chosen set of features shows high importance for the target.
The best C50 estimator achieves a root mean square deviation
of around 3 dB on average across all reverberant test environments.

Index Terms—Room acoustic parameter estimation, reverberant
speech recognition, reverberation
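
For readers unfamiliar with the clarity index, the following is a minimal sketch of the conventional intrusive computation of C50 from a room impulse response: the ratio, in dB, of the energy in the first 50 ms after the direct path to the energy that arrives later. It is not the NIRA estimator described in the abstract, which works from the reverberant speech signal alone. The function name `clarity_index_c50` and the choice to locate the direct path at the largest-magnitude sample are assumptions made for illustration.

```python
import numpy as np


def clarity_index_c50(rir, fs, early_ms=50.0):
    """Return the early-to-late energy ratio of a room impulse response in dB.

    rir      : 1-D array holding the impulse response samples
    fs       : sampling rate in Hz
    early_ms : boundary between early and late energy (50 ms gives C50)
    """
    rir = np.asarray(rir, dtype=float)

    # Assumption: the direct path is the sample with the largest magnitude.
    onset = int(np.argmax(np.abs(rir)))

    # Index of the first sample that counts as "late" energy.
    split = onset + int(round(early_ms * 1e-3 * fs))

    early_energy = np.sum(rir[onset:split] ** 2)
    late_energy = np.sum(rir[split:] ** 2)

    if late_energy == 0.0:
        # No energy after the boundary: the ratio is unbounded.
        return float("inf")

    return 10.0 * np.log10(early_energy / late_energy)
```

Calling `clarity_index_c50(rir, fs)` on a measured or simulated impulse response yields a single full-band value. Higher values indicate that more of the energy arrives early, which corresponds to less perceived reverberation.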
