
Research area
Speech recognition

INTERSPEECH 2012: 13th Annual Conference of the International Speech Communication Association


Hong-Kwang Jeff Kuo, Ebru Arısoy, Ahmad Emami, Paul Vozila

Large scale hierarchical neural network language models


Feed-forward neural network language models (NNLMs) are known to improve both perplexity and word error rate (WER) for speech recognition compared with conventional n-gram language models. We present experimental results showing how much the WER can be improved by increasing the scale of the NNLM, in terms of both model size and training data. However, training time can become very long. We implemented a hierarchical NNLM approximation to speed up training by splitting up the training events and training on them in parallel, and by reducing the output vocabulary size of each sub-network. Training time was reduced by a factor of about 20 (e.g., from 50 days to 2 days) with no degradation in WER. Using English Broadcast News data (350M words), we obtained significant improvements over the baseline n-gram language model, competitive with recently published recurrent neural network language model (RNNLM) results.
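The abstract does not spell out the factorization, so the following is only a minimal sketch under an assumption: that the hierarchical approximation is a two-level class-based decomposition P(w|h) = P(c(w)|h) · P(w|c(w),h), with one sub-network predicting the class and one sub-network per class predicting the word within it. The helper names (`hierarchical_prob`, `split_events`), the toy class assignment, and the fixed logits standing in for trained sub-network outputs are all hypothetical. It shows the two ideas the abstract names: each word-level sub-network has a small output layer, and training events can be partitioned by class so the sub-networks train independently.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_prob(word, word_to_class, class_members, class_logits, word_logits):
    """P(w|h) = P(c(w)|h) * P(w|c(w),h) under an ASSUMED two-level factorization."""
    c = word_to_class[word]
    p_class = softmax(class_logits)[c]
    # The word-given-class sub-network only scores words inside class c,
    # so its output layer has len(class_members[c]) units instead of |V|.
    p_word = softmax(word_logits[c])[class_members[c].index(word)]
    return p_class * p_word


def split_events(events, word_to_class):
    """Group (history, target) training events by the target word's class,
    so each word-given-class sub-network can be trained independently."""
    buckets = defaultdict(list)
    for history, target in events:
        buckets[word_to_class[target]].append((history, target))
    return dict(buckets)


# Hypothetical toy setup: a 6-word vocabulary in 3 classes. The fixed logits
# stand in for the outputs of trained sub-networks for a single history h.
class_members = {0: ["the", "a"], 1: ["cat", "mat"], 2: ["sat", "on"]}
word_to_class = {w: c for c, ws in class_members.items() for w in ws}
class_logits = [0.5, 1.0, -0.2]
word_logits = {0: [1.0, 0.3], 1: [0.2, 0.2], 2: [-1.0, 0.4]}

# Summing P(w|h) over the whole vocabulary should give 1.
total = sum(
    hierarchical_prob(w, word_to_class, class_members, class_logits, word_logits)
    for w in word_to_class
)
print(f"sum over vocabulary: {total:.6f}")

events = [(("<s>",), "the"), (("the",), "cat"), (("the", "cat"), "sat")]
buckets = split_events(events, word_to_class)
print({c: len(evs) for c, evs in sorted(buckets.items())})
```

Under this assumed factorization, scoring one event touches the class outputs plus the words in one class rather than the full vocabulary, and each per-class bucket of events can be handed to a separate trainer. How the paper actually defines classes and splits events is not stated in this abstract.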
