Article details

Research area
Speech recognition

Location
WOCCI 2014 proceedings

Date
2014

Author(s)
Sharmistha Gray, Daniel Willett, Jianhua Lu, Joel Pinto, Paul Maergner, Nathan Bodenstab

Child automatic speech recognition for US English: child interaction with living-room-electronic-devices

Synopsis:

Adult-targeted automatic speech recognition (ASR) has advanced significantly in recent years and can produce speech-to-text output with a very low word error rate (WER) for multiple languages and in various noisy environments, e.g. car noise, living-room noise, and outdoor noise. For child speech, however, little is available at the performance level of adult-targeted ASR, and building an ASR system for naturally spoken, spontaneous, continuous child speech requires a considerable amount of data. In this study, we show that, using a minimal amount of data, we can adapt multiple components of a state-of-the-art adult-centric large vocabulary continuous speech recognition (LVCSR) system to form a child-specific LVCSR system. The resulting ASR system improves accuracy for children speaking US English to living room electronic devices (LRED), e.g. a voice-operated TV or computer. The techniques we explore in this paper include vocal tract length normalization, acoustic model adaptation, language model adaptation with child-specific content lists and grammars, and a neural network based approach to automatically classify child data. The combined effort towards a child-specific ASR system for the LRED domain results in a relative WER improvement of 27.2% compared to adult-targeted models.
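For readers unfamiliar with the metric, the sketch below shows how a word error rate and a relative WER improvement are commonly computed. It is a minimal illustration and not the paper's evaluation code: the function names `word_error_rate` and `relative_wer_improvement`, the sample utterances, and the 0.20 and 0.15 WER values are all hypothetical and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the new WER is lower than the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterance: one word is dropped out of four.
    print(word_error_rate("turn on the tv", "turn on tv"))
    # Hypothetical WER values, chosen only to illustrate the arithmetic.
    print(relative_wer_improvement(0.20, 0.15))
```

A relative improvement is measured against the baseline error rate, so a 27.2% relative improvement means the adapted system's WER is 27.2% lower than the adult-targeted system's WER, not 27.2 percentage points lower.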
