Evolving Systems, September 2011, Volume 2, Issue 3


Tobias Herbig, Franz Gerl, Wolfgang Minker, Reinhold Haeb-Umbach

Adaptive systems for unsupervised speaker tracking and speech recognition


Speech recognition offers an intuitive and convenient interface for controlling technical devices. Improvements achieved through ongoing research enable users to handle increasingly complex tasks via speech. For special applications such as dictation, highly sophisticated techniques have been developed to yield high recognition accuracy. Many use cases, however, are characterized by changing conditions such as different speakers or time-variant environments. Many approaches have been published that handle changes in the acoustic environment or in speaker-specific voice characteristics by adapting the statistical models of a speech recognizer and by tracking speakers. Combining speaker adaptation and speaker tracking may be advantageous because it allows a system to adapt to more than one user at the same time. In this way, the performance of speech-controlled systems may be continuously improved over time. In this article we review some techniques and systems for unsupervised speaker tracking that may be combined with speech recognition. We discuss a unified view of speaker identification and speech recognition embedded in a self-learning system. This system adapts individually to its main users without requiring additional user intervention such as enrollment. Robustness is continuously improved by progressive speaker adaptation. We analyze our evaluation results for a realistic in-car application to validate the evolution of the system in terms of speech recognition accuracy and identification rate.
