Article details

Research area
Speech recognition

Location
INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA

Date
2012

Author(s)
Stefan Hahn, Paul Vozila, Maximilian Bisani

Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks

Synopsis:

Grapheme-to-phoneme (G2P) conversion is used in virtually every state-of-the-art ASR system to generalize beyond a fixed set of words. Although its performance is typically already quite good (< 10% phoneme error rate) and the pronunciations of important words are checked by a linguist, further improvements are still desirable, especially for end-user customization. In this work, we present and compare five methods/tools for the G2P task. Although most of these methods have already been published and/or are available as open-source software, the experiments reported here are performed on large state-of-the-art tasks using the software released with the original publications. Besides an experimental comparison on text data for a range of languages (i.e., measuring G2P accuracy only), our focus in this paper is on measuring the effect of improved G2P modeling on LVCSR performance for a challenging ASR task. Additionally, the effect of using n-best pronunciation variants instead of only the single best pronunciation is investigated briefly.
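The synopsis quotes G2P accuracy as a phoneme error rate (PER). As a minimal sketch, assuming the common definition (total Levenshtein edits between predicted and reference phoneme sequences, divided by the total number of reference phonemes), PER could be computed as below. The function names `edit_distance` and `phoneme_error_rate` and the ARPAbet-style example pronunciations are hypothetical illustrations, not taken from the paper, and the paper's exact scoring protocol (e.g. how multiple reference pronunciations per word are handled) may differ.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    # prev[j] holds the distance between the first i-1 reference phonemes
    # and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference phoneme
                cur[j - 1] + 1,          # insertion of a hypothesis phoneme
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def phoneme_error_rate(pairs: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Hypothetical PER: summed edits over summed reference lengths."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    return errors / total


if __name__ == "__main__":
    # Illustrative (reference, predicted) pronunciations; not data from the paper.
    pairs = [
        ("K AE T".split(), "K AA T".split()),  # one substitution
        ("D AO G".split(), "D AO G".split()),  # exact match
    ]
    per = phoneme_error_rate(pairs)
    print(f"PER = {per:.1%}")  # 1 error / 6 reference phonemes -> PER = 16.7%
```

Pooling edits over the whole test set, rather than averaging per-word rates, keeps long words from being under-weighted; this is a design choice of the sketch, and other normalizations are possible.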
