AI and Endangered Languages
Endangered languages represent a significant facet of cultural diversity and heritage. With over 7,000 languages spoken worldwide, it is estimated that one language dies every two weeks, often taking with it unique worldviews and knowledge systems.
AI has emerged as a tool to help preserve and potentially revive these languages, offering innovative solutions to some of the challenges faced by linguists and communities alike.
Translation services driven by AI facilitate the translation of endangered languages into more widely spoken ones and vice versa, promoting learning and usage. Separately, platforms like Rosetta Stone collaborate with indigenous communities to develop courses for endangered languages, though these efforts are not typically AI-driven.
To be clear a database or dataset of the language in question needs to be created first, THEN translation can happen with the help of AI tools.
AI in Language Documentation
AI technologies facilitate the recording and transcription of spoken languages, even those with limited speakers. Speech recognition models convert spoken words into written text, aiding documentation and analysis.
Natural Language Processing (NLP) algorithms collect and organize linguistic data from various sources, including audio recordings, texts, and videos. For instance, the Endangered Languages Project, supported by Google, digitizes audio recordings of endangered languages.
In June 2024, Meta's AI research division announced a significant advancement in machine translation by expanding its system to include over 200 languages, many of which are under-resourced and previously unsupported by such technology.
This initiative, part of Meta AI's 'No Language Left Behind' program, aims to bridge the digital divide by enabling speakers of these languages to access online content in their native tongues.
The project involved creating a 'seed' dataset with the help of professional translators for 39 languages and developing techniques to mine web data for parallel datasets in the remaining languages.
Additionally, the team generated lists of approximately 200 'toxic' words for each language to identify and mitigate potentially harmful translations.
While this development holds promise for preserving endangered languages and promoting digital inclusivity, experts emphasize the necessity of ongoing collaboration with native-speaking communities.
Such engagement ensures that translations are accurate and culturally relevant, preventing the erosion of linguistic nuances and values.
Without this human element, there's a risk that machine translation could inadvertently contribute to the decline of the very languages it seeks to support.
Beyond language, AI aids in preserving the cultural contexts in which these languages are used. This includes recording traditional stories, songs, and rituals, ensuring they remain part of the living cultural heritage. The Living Tongues Institute for Endangered Languages utilizes AI to create multimedia resources that capture the cultural practices of indigenous communities.
Sources: