Effective use of pause information in language modelling for speech recognition

Ohta, Kengo; Tsuchiya, Masatoshi; Nakagawa, Seiichi

doi:10.21437/Interspeech.2009-126

This paper addresses the mismatch between the speech processing units used by a speech recognizer and the sentences of training corpora. A standard speech recognizer divides input speech into processing units based on power information, whereas the training corpora of language models are divided into sentences based on punctuation. A mismatch between speech processing units and sentences is therefore inevitable, and neither is optimal for spontaneous speech recognition. This paper presents two methods to address this problem. First, the words of preceding units are used to predict the words of succeeding units, which addresses the mismatch between speech processing units and optimal units. Second, we propose a method for building a language model that includes short pauses from a corpus containing no short pauses, which addresses the mismatch between speech processing units and sentences. The combination of the two methods achieved a 4.5% relative improvement over the conventional method in a meeting speech recognition task.
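The first idea, letting the words of a preceding unit serve as history for the words of the next unit, can be illustrated with a small sketch. This is not the authors' implementation: the function name `ngram_events`, the `carry_context` flag, and the toy units are hypothetical, and the abstract does not specify the n-gram order or any smoothing. The sketch only shows how the prediction history differs when it is reset at each unit boundary versus carried across it.

```python
from typing import Iterator, List, Sequence, Tuple


def ngram_events(
    units: Sequence[Sequence[str]],
    n: int = 3,
    carry_context: bool = True,
) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (history, word) pairs for an n-gram model over speech units.

    units: speech processing units, each a sequence of words (toy data here).
    n: n-gram order (an assumption; the abstract does not state it).
    carry_context: if True, the last n-1 words of preceding units remain
        available as history; if False, history is reset at every unit
        boundary, as when each unit is decoded in isolation.
    """
    history: List[str] = []
    for unit in units:
        if not carry_context:
            history = []  # forget everything before this unit
        for word in unit:
            # Use at most the last n-1 words as the conditioning history.
            context = tuple(history[-(n - 1):]) if n > 1 else ()
            yield context, word
            history.append(word)


# Two hypothetical units produced by power-based segmentation.
units = [["so", "we"], ["should", "start"]]

carried = list(ngram_events(units, n=3, carry_context=True))
isolated = list(ngram_events(units, n=3, carry_context=False))
```

With `carry_context=True`, the word "should" is predicted from the history `("so", "we")`; with `carry_context=False`, it is predicted from an empty history because the unit boundary discards the preceding words.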
