ISCA Archive Interspeech 2020

Language Modeling for Speech Analytics in Under-Resourced Languages

Simone Wills, Pieter Uys, Charl van Heerden, Etienne Barnard

Different language modeling approaches are evaluated on two under-resourced, agglutinative South African languages: Sesotho and isiZulu. The two languages present different challenges to language modeling because of their respective orthographies: isiZulu is written conjunctively, whereas Sesotho is written disjunctively. Two subword modeling approaches are evaluated and shown to reduce the out-of-vocabulary (OOV) rate for isiZulu. For Sesotho, a multi-word approach is evaluated for improving automatic speech recognition (ASR) accuracy, with limited success. Recurrent neural networks (RNNs) are also evaluated and shown to slightly improve ASR accuracy, despite the relatively small text corpora.
