You might not know it, but there are currently 7,117 languages spoken in the world. Not dialects, but living languages! However, much of the world's digital media is available in only a couple dozen languages, and translation platforms like Google Translate support only around 100 languages. This reality means that billions of people around the world are marginalized by a lack of timely access to information. The current coronavirus (COVID-19) pandemic has made this painfully clear, and it has stressed the need for rapid translation of health-related phrases (like "wash your hands" or "keep your distance") into the long tail of languages.

To this end, I applied state-of-the-art AI techniques to construct something close to the phrase "wash your hands" in 544 languages and counting (my GPUs are still running). Multilingual Unsupervised and Supervised Embeddings (MUSE) methods are used to train cross-lingual word embeddings between each of the 544 languages and English. These embeddings then allow for the extraction of a phrase similar to the target phrase from existing documents.

I performed this work in collaboration with my colleagues at SIL International, who have gathered even more human translations of the phrase. The combination of these human translations and some of my machine translations can be searched on this Ethnologue guide page (machine-generated phrases are indicated with a little robot icon), and more translations will be added as they are generated or gathered. SIL International has done linguistic work in over 2000 languages and is currently managing over 1600 language projects.
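
To make the embedding idea above a bit more concrete, here is a minimal sketch of one ingredient of MUSE-style supervised alignment: an orthogonal (Procrustes) mapping learned from a small seed dictionary, followed by nearest-neighbor lookup for each word of the phrase. This is not the code used for the project. The vectors are synthetic, the `tgt_*` tokens are placeholders rather than real translations, and the helper names `procrustes_align` and `nearest_words` are made up for illustration.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def nearest_words(query_vec, vocab, vectors, k=1):
    """Return the k vocabulary entries whose vectors are closest by cosine similarity."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    top = np.argsort(-(normed @ query))[:k]
    return [vocab[i] for i in top]


# Synthetic English embeddings (assumption: real ones would come from
# monolingual text in each language).
rng = np.random.default_rng(0)
dim = 4
en_vocab = ["wash", "your", "water", "soap", "keep", "hands"]
en_vecs = rng.normal(size=(len(en_vocab), dim))

# Simulate a target-language space as a hidden rotation of the English space.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt_vocab = ["tgt_" + word for word in en_vocab]  # placeholder tokens
tgt_vecs = en_vecs @ rotation

# Learn the mapping from the first five word pairs only; "hands" is held out.
seed = slice(0, 5)
W = procrustes_align(en_vecs[seed], tgt_vecs[seed])

# Map each English word of the phrase into the target space and look it up.
phrase = ["wash", "your", "hands"]
translated = [
    nearest_words(en_vecs[en_vocab.index(word)] @ W, tgt_vocab, tgt_vecs)[0]
    for word in phrase
]
print(translated)
```

In this toy setup the held-out word is recovered because the two spaces differ only by a rotation. Real embedding spaces are only approximately related this way, which is one reason the output is "something close to" the phrase rather than a guaranteed translation.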