Research area
Speech enhancement

Proceedings of Speech Communication; 11. ITG Symposium


Simon Graf, Tobias Herbig, Markus Buck, Gerhard Schmidt

Improved performance measures for voice activity detection


Voice activity detection is an essential part of many speech processing algorithms, and the requirements of the target application determine its design. Some applications need low-latency results, whereas for others the accuracy of speech detection is more important. Performance is generally evaluated by Receiver Operating Characteristic (ROC) curves, which provide a static analysis averaged over speech and nonspeech segments, respectively. We adopt ROC curves but evaluate them for specific speech classes, e.g., voiced or unvoiced speech, to describe the overall accuracy of speech detection. In addition, we present a new measure for the dynamic behavior that considers the delay and latency of speech onset and offset detection. Finally, we present a unified measure that covers both aspects. This measure may be used to find appropriate voice activity detection features for a given application. An automotive noise scenario is employed to demonstrate the measures, as it contains both stationary and non-stationary noise.
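To make the two kinds of evaluation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes frame-wise feature scores, frame-wise labels with the hypothetical values "voiced", "unvoiced", and "nonspeech", and a simple threshold detector. The function names (roc_points, class_roc, onset_delays) and the delay definition (frames from a true speech onset to the first detection inside that speech segment) are illustrative assumptions; the abstract does not specify the paper's exact definitions.

```python
import numpy as np


def roc_points(scores, is_speech, thresholds):
    """Return (false_alarm_rate, detection_rate), one entry per threshold."""
    scores = np.asarray(scores, dtype=float)
    is_speech = np.asarray(is_speech, dtype=bool)
    false_alarm, detection = [], []
    for t in thresholds:
        decision = scores >= t  # simple threshold detector (assumption)
        detection.append(decision[is_speech].mean() if is_speech.any() else np.nan)
        false_alarm.append(decision[~is_speech].mean() if (~is_speech).any() else np.nan)
    return np.array(false_alarm), np.array(detection)


def class_roc(scores, labels, speech_class, thresholds, nonspeech_label="nonspeech"):
    """ROC restricted to one speech class versus nonspeech frames."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Frames of other speech classes are excluded from this evaluation.
    keep = (labels == speech_class) | (labels == nonspeech_label)
    return roc_points(scores[keep], labels[keep] == speech_class, thresholds)


def onset_delays(decision, is_speech):
    """Delay in frames from each true onset to the first detection in that segment.

    A segment without any detection yields None (missed onset).
    """
    decision = np.asarray(decision, dtype=bool)
    is_speech = np.asarray(is_speech, dtype=bool)
    delays = []
    n = len(is_speech)
    i = 0
    while i < n:
        if is_speech[i] and (i == 0 or not is_speech[i - 1]):
            end = i
            while end < n and is_speech[end]:
                end += 1  # advance to the end of this speech segment
            hits = np.flatnonzero(decision[i:end])
            delays.append(int(hits[0]) if hits.size else None)
            i = end
        else:
            i += 1
    return delays
```

Calling class_roc once with "voiced" and once with "unvoiced" gives one ROC curve per speech class, while onset_delays captures a dynamic aspect that frame-averaged rates hide. An offset measure could be sketched analogously by scanning from the end of each speech segment.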

