Detection of voiced speech and estimation of the pitch frequency are important tasks for many speech processing algorithms. Pitch information can be used, e.g., to reconstruct voiced speech corrupted by noise.
In automotive environments, driving noise especially affects voiced speech in the lower frequencies. Pitch estimation is therefore important, e.g., for in-car communication systems, which amplify the driver's voice to allow for convenient conversations with passengers in the back seats. This application requires low latency and therefore short windows and short frame shifts between consecutive frames. Conventional pitch estimation techniques, however, rely on long windows that exceed the pitch period of human speech. The low pitch frequencies of male speakers are particularly difficult to resolve.
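As a worked example with assumed values (a sampling rate of 16 kHz and a 128-sample window, neither taken from the publication), the DFT bin spacing $\Delta f = f_s/N$ of a short window can exceed the harmonic spacing of a low-pitched voice, and the window can be shorter than one pitch period:

$$
\Delta f = \frac{f_s}{N} = \frac{16\,000\ \text{Hz}}{128} = 125\ \text{Hz} > f_0 = 100\ \text{Hz},
\qquad
T_0 = \frac{1}{f_0} = 10\ \text{ms} > \frac{N}{f_s} = 8\ \text{ms}.
$$

Adjacent harmonics then fall into the same frequency bin, so the pitch cannot be read directly from a single such spectrum.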
In this publication, we introduce a technique that approaches pitch estimation from a different perspective. The pitch information is extracted from phase differences between multiple low-resolution spectra instead of from a single long window. The technique benefits from the high temporal resolution provided by the short frame shift and can cope with the low spectral resolution caused by short windows. With the new approach, even very low pitch frequencies can be estimated efficiently.
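The following Python sketch illustrates the general principle only; it is not the authors' algorithm. It assumes that pitch can be found by comparing the phases of short-window spectra taken at different time lags: when the lag equals the pitch period of a periodic signal, the phase differences in all bins vanish. The sampling rate (8 kHz), the 64-sample Hann window, the one-sample shift between candidate frames, the phase-only normalization, the 50 to 400 Hz search range, the 0.95 threshold against octave errors, and all function names (`synthetic_voiced`, `phase_score`, `estimate_pitch`) are hypothetical choices made for this illustration.

```python
import numpy as np


def synthetic_voiced(fs, f0, length, seed=0):
    """Hypothetical test signal: all harmonics of f0 below fs/2 with random phases."""
    rng = np.random.default_rng(seed)
    harmonics = np.arange(1, int((fs / 2) // f0))
    phases = rng.uniform(0.0, 2.0 * np.pi, harmonics.size)
    n = np.arange(length)
    return np.cos(2.0 * np.pi * np.outer(n, harmonics) * f0 / fs + phases).sum(axis=1)


def phase_score(x, n0, lag, win):
    """Sum over bins of cos(phase difference) between frames at n0 and n0 - lag."""
    size = len(win)
    cur = np.fft.rfft(win * x[n0:n0 + size])
    old = np.fft.rfft(win * x[n0 - lag:n0 - lag + size])
    cross = cur * np.conj(old)
    # Normalizing each bin to unit magnitude keeps only the phase difference;
    # the small constant guards against division by zero.
    return float(np.sum(np.real(cross / (np.abs(cross) + 1e-12))))


def estimate_pitch(x, fs, n0, win_len=64, f_min=50.0, f_max=400.0):
    """Pitch estimate in Hz for the short frame starting at sample n0.

    Candidate frames are shifted by one sample each (assumed), so the lag grid
    has one-sample resolution even though each window is short.
    """
    win = np.hanning(win_len)
    lags = np.arange(int(fs / f_max), int(fs / f_min) + 1)
    scores = np.array([phase_score(x, n0, int(lag), win) for lag in lags])
    # Multiples of the pitch period score equally high; take the smallest
    # lag close to the maximum to avoid octave (subharmonic) errors.
    first = np.flatnonzero(scores >= 0.95 * scores.max())[0]
    return fs / int(lags[first])


if __name__ == "__main__":
    fs = 8000
    x = synthetic_voiced(fs, f0=80.0, length=1000)  # 12.5 ms pitch period
    print(estimate_pitch(x, fs, n0=400))  # 64-sample (8 ms) window
```

Running the script should report a pitch of 80 Hz even though each 8 ms window is shorter than the 12.5 ms pitch period, because the lag is resolved by the fine frame shift rather than by the spectral resolution. The exhaustive lag search is chosen for readability and says nothing about the computational efficiency of the published method.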