Article details

Research area
Text to speech

Location
Brisbane, Australia

Date
2015

Author(s)
Alexander Sorin, Slava Shechtman, Vincent Pollet

Coherent Modification Of Pitch And Energy For Expressive Prosody Implantation

Synopsis:

In expressive TTS and voice transformation systems, implanting expressive prosody derived from external, out-of-domain sources often leads to extreme pitch modification that compromises the naturalness of the synthesized speech. In this work we investigate and confirm the hypothesis that the loss of naturalness is in part attributable to a violation of a fundamental relationship between the instantaneous pitch frequency and the instantaneous energy of a speech signal. We propose an enhancement to pitch modification in which the instantaneous energy is modified coherently with the pitch frequency, and we demonstrate the potential of this method in a subjective listening evaluation. The proposed approach is complementary to spectrum shape transformation methods and can be combined with them to achieve the highest possible quality of pitch modification.
