Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers

Mou, Xiaolong; Seneff, Stephanie; Zue, Victor

doi:10.21437/ICSLP.2002-395

This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full fledged grammar for natural language understanding, and augmented with top-level ngram probabilities and phrase-level context-dependent probabilities, which is beyond the standard context-free grammar (CFG) formalism. Such a shallow parsing approach can help balance sufficient grammar coverage and tight structure constraints. The context-dependent probabilistic shallow parsing model is represented by layered FSTs, which can be integrated with speech recognition seamlessly to impose early phrase-level structural constraints consistent with natural language understanding. It is shown that in the JUPITER [1] weather information domain, the shallow parsing model achieves lower recognition word error rates, compared to a regular class ngram model with the same order. However, we find that, with a higher order top-level n-gram model, pre-composition and optimization of the FSTs are highly restricted by the computational resources available. Given the potential of such models, it may be worth pursing an incremental approximation strategy [2], which includes part of the linguistic model FST in early optimization, while introducing the complete model through dynamic composition.

Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers

Xiaolong Mou, Stephanie Seneff, Victor Zue