Research area
Text to speech

In proceedings of ICASSP, Kyoto, Japan


Asaf Rendel, Alexander Sorin, Ron Hoory

Towards automatic phonetic segmentation for TTS


Phonetic segmentation is an important step in the development of a concatenative TTS voice. This paper introduces a segmentation process consisting of two phases. First, forced alignment is performed using an HMM-GMM model. The resulting segmentation is then locally refined using an SVM-based boundary model. Both models are derived from multi-speaker data using a speaker adaptive training procedure. Evaluation results are obtained on the TIMIT corpus and on a proprietary single-speaker TTS corpus.
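
The local refinement phase can be illustrated with a minimal sketch. The abstract does not say how candidate boundaries are generated or scored, so everything below is an assumption made for illustration: the function name `refine_boundaries`, the fixed search window and step, and the `score` callable, which stands in for the output of a trained boundary model (for example an SVM decision value) at a candidate time.

```python
from typing import Callable, List


def refine_boundaries(
    initial: List[float],
    score: Callable[[int, float], float],
    window: float = 0.02,
    step: float = 0.005,
) -> List[float]:
    """Move each boundary to the best-scoring candidate within +/- window seconds.

    initial: boundary times (seconds) from forced alignment, in increasing order.
    score:   hypothetical boundary model; score(i, t) rates time t as a
             candidate position for boundary i (higher is better).
    """
    refined: List[float] = []
    n_steps = int(round(window / step))
    for i, t0 in enumerate(initial):
        # Candidate times on a regular grid centred on the initial boundary.
        candidates = [t0 + k * step for k in range(-n_steps, n_steps + 1)]
        # Preserve ordering: only accept candidates after the previous boundary.
        lower = refined[-1] if refined else float("-inf")
        valid = [t for t in candidates if t > lower] or [max(t0, lower)]
        refined.append(max(valid, key=lambda t: score(i, t)))
    return refined
```

With a toy scorer such as `lambda i, t: -abs(t - targets[i])`, each boundary moves to the grid point nearest its target, which shows the intended behaviour: the forced alignment supplies a coarse position and the boundary model only decides where to land within a small neighbourhood of it.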

