A general-purpose 32 ms prosodic vector for hidden Markov modeling

Laskowski, Kornel; Heldner, Mattias; Edlund, Jens

doi:10.21437/Interspeech.2009-247

Prosody plays a central role in conversation, making it important for speech technologies to model. Unfortunately, the application of standard modeling techniques to the acoustics of prosody has been hindered by difficulties in modeling intonation. In this work, we explore the suitability of the recently introduced fundamental frequency variation (FFV) spectrum as a candidate general representation of tone. Experiments on four tasks demonstrate that FFV features are complementary to other acoustic measures of prosody and that hidden Markov models offer a suitable modeling paradigm. Proposed improvements yield a 35% relative decrease in error on unseen data and simultaneously reduce time complexity by a factor of five. The resulting representation is sufficiently mature for general deployment in a broad range of automatic speech processing applications.
