Discriminating Between Similar Languages Using Large Web Corpora
Authors | |
---|---|
Year of publication | 2019 |
Type | Article in Proceedings |
Conference | Proceedings of the Thirteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2019 |
MU Faculty or unit | |
Citation | |
Web | https://nlp.fi.muni.cz/raslan/2019/paper12-suchomel.pdf |
Keywords | language identification; discriminating similar languages; building web corpora |
Description | This paper presents a method for discriminating similar languages based on wordlists from large web corpora. The main benefits of the approach are language independence, a measure of confidence of the classification, and an easy-to-maintain implementation. The method is evaluated on the VarDial 2014 workshop data set. The resulting accuracy is comparable to that of other methods that performed successfully at the workshop. A tool implementing the method in Python can be obtained from the website http://corpus.tools/. |
Related projects: |
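The description does not give the scoring formula. As a rough illustration only, the sketch below scores a text against per-language word frequency lists with a plain unigram log-likelihood and reports the per-token margin between the two best-scoring languages as a confidence value. This is a generic technique assumed for illustration, not the paper's method. The function names `build_wordlist` and `classify`, the `floor` smoothing value, and the toy Czech/Slovak samples are all hypothetical. Real wordlists would be built from large web corpora, not single sentences.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: NOT the method from the paper.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase word tokens (str patterns match Unicode word characters)."""
    return TOKEN_RE.findall(text.lower())


def build_wordlist(corpus_text):
    """Map each word to its relative frequency in a corpus sample."""
    counts = Counter(tokenize(corpus_text))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def classify(text, wordlists, floor=1e-7):
    """Return (best_language, confidence) for `text`.

    Score = sum of log relative frequencies; unseen words get `floor`
    (a hypothetical smoothing value). Confidence = per-token score margin
    between the top two languages; 0.0 means the wordlists cannot tell
    the languages apart for this text.
    """
    tokens = tokenize(text)
    if not tokens or len(wordlists) < 2:
        return None, 0.0
    scores = {
        lang: sum(math.log(freqs.get(tok, floor)) for tok in tokens)
        for lang, freqs in wordlists.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_lang, best), (_, second) = ranked[0], ranked[1]
    return best_lang, (best - second) / len(tokens)


if __name__ == "__main__":
    # Toy "corpora" standing in for wordlists from large web corpora.
    wordlists = {
        "cs": build_wordlist("je to velmi dobrý den a já jsem doma"),
        "sk": build_wordlist("je to veľmi dobrý deň a ja som doma"),
    }
    print(classify("ja som doma", wordlists))
```

Here "ja som doma" is labelled `sk` with a positive confidence, while a text made only of words shared by both toy lists (e.g. "je to doma") gets a confidence of 0.0, which is the kind of signal a confidence measure makes it possible to act on, for example by leaving such texts unclassified.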