Article details

Research area
Speech recognition

INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA


Stefan Hahn, Paul Vozila, Maximilian Bisani

Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks


Grapheme-to-phoneme (G2P) conversion is used in virtually every state-of-the-art ASR system to generalize beyond a fixed set of words. Although performance is typically already quite good (< 10% phoneme error rate) and the pronunciations of important words are checked by a linguist, further improvements are still desirable, especially for end-user customization. In this work, we present and compare five methods/tools for the G2P task. Although most of these methods have already been published and/or are available as open-source software, the experiments reported here are run on large state-of-the-art tasks, and the software used is that of the original publications. Besides an experimental comparison on text data for a range of languages (i.e., measuring G2P accuracy only), our focus in this paper is on measuring the effect of improved G2P modeling on LVCSR performance for a challenging ASR task. Additionally, the effect of using n-best pronunciation variants instead of the single best is briefly investigated.
