ParaDiom–A Parallel Corpus of Idiomatic Texts

G Donaj, Š Antloga - International Conference on Text, Speech, and …, 2023 - Springer
International Conference on Text, Speech, and Dialogue, 2023, Springer
Abstract
This paper presents ParaDiom, a parallel corpus of 2000 Slovene and English text segments. The text segments are rich in manually annotated idiomatic expressions, which pose a challenge for machine translation systems. We describe the definition of idiomatic expressions, the sampling of the corpus sentences, the annotation scheme, and the general characteristics of the finished corpus. The corpus is intended as a test set for evaluating the performance of machine translation systems on figurative language. In the last part of the paper, we demonstrate an example use of the corpus in a machine translation experiment.