
Research area
Speech recognition

2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)


Raymond Brueckner, Björn Schuller

Hierarchical neural networks and enhanced class posteriors for social signal classification


With the impressive advances of deep learning in recent years, interest in neural networks has resurged in the fields of automatic speech recognition and emotion recognition. In this paper, we apply neural networks to the speaker-independent detection and classification of laughter and filler vocalizations in speech. We first explore modeling class posteriors with standard neural networks and deep stacked autoencoders. We then adopt a hierarchical neural architecture to compute enhanced class posteriors and demonstrate that this approach yields significant and consistent improvements on the Social Signals Sub-Challenge of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE). On this task, we achieve an unweighted average area under the curve (UAAUC), the official competition measure, of 92.4% on the test set. This constitutes an improvement of 9.1% over the baseline and is the best result obtained so far on this task.
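The official measure averages per-class ROC AUC values with equal weight per class. The sketch below shows one way to compute it from frame-level class posteriors. It assumes a one-vs-rest AUC per target class (computed via the Mann-Whitney statistic, with ties counted as half) and that only the laughter and filler classes are averaged. The class indices (0 = garbage, 1 = laughter, 2 = filler), the toy data, and the function names `binary_auc` and `unweighted_average_auc` are hypothetical illustrations, not the challenge's official scoring script.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; ties count as half a win.

    O(n_pos * n_neg), which is fine for illustration but slow for real data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def unweighted_average_auc(
    posteriors: Sequence[Sequence[float]],
    labels: Sequence[int],
    classes: Sequence[int],
) -> float:
    """Mean of one-vs-rest AUCs over `classes`, each class weighted equally."""
    aucs = []
    for c in classes:
        scores = [frame[c] for frame in posteriors]  # posterior of class c per frame
        binary = [1 if y == c else 0 for y in labels]  # class c vs. everything else
        aucs.append(binary_auc(scores, binary))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: one posterior vector per frame, over (garbage, laughter, filler).
posteriors = [
    [0.80, 0.10, 0.10],  # garbage frame
    [0.20, 0.70, 0.10],  # laughter frame
    [0.30, 0.20, 0.50],  # filler frame
    [0.60, 0.30, 0.10],  # garbage frame
    [0.50, 0.25, 0.25],  # laughter frame
]
labels = [0, 1, 2, 0, 1]

uaauc = unweighted_average_auc(posteriors, labels, classes=[1, 2])
print(round(uaauc, 4))  # 0.9167
```

Here the laughter AUC is 5/6 (one laughter frame scores below a garbage frame) and the filler AUC is 1.0, so their unweighted mean is 11/12. Averaging per-class AUCs without weighting by class frequency keeps the rare filler and laughter frames from being swamped by the much larger garbage class.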
