DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

Authors

HERMAN Ondřej, SUCHOMEL Vít, BAISA Vít, RYCHLÝ Pavel

Year of publication 2016
Type Article in Proceedings
Conference Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
MU Faculty or unit

Faculty of Informatics

Citation
Web https://aclanthology.info/pdf/W/W16/W16-4815.pdf
Field Informatics
Keywords language discrimination; expectation maximization; language model
Description In this paper we investigate two approaches to the discrimination of similar languages: an expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. The two methods reached accuracies of 86.6 % and 88.3 %, respectively, on set A of the DSL Shared Task 2016 competition.
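
To make the second approach more concrete, the following is a minimal sketch of a byte-level n-gram language model used as a classifier. It is not the authors' implementation: the class name `ByteNgramModel`, the helper `classify`, the trigram order, the add-one smoothing, and the toy training sentences are all assumptions chosen for illustration. Each language gets its own model, and a text is assigned to the language whose model gives it the highest log-probability.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Byte-level n-gram model with add-one smoothing (illustrative sketch only)."""

    def __init__(self, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of context + next byte
        self.contexts = Counter()  # counts of context alone

    def _pad(self, text):
        # Prefix with NUL bytes so the first bytes of the text have a full context.
        return b"\x00" * (self.order - 1) + text.encode("utf-8")

    def train(self, text):
        data = self._pad(text)
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            self.ngrams[context + data[i:i + 1]] += 1
            self.contexts[context] += 1

    def log_prob(self, text):
        data = self._pad(text)
        total = 0.0
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            count = self.ngrams[context + data[i:i + 1]]
            # Add-one smoothing over the 256 possible byte values.
            total += math.log((count + 1) / (self.contexts[context] + 256))
        return total


def classify(models, text):
    """Return the label of the model that assigns the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Hypothetical toy training data; real systems train on large corpora per language.
models = {"en": ByteNgramModel(), "es": ByteNgramModel()}
models["en"].train("the cat sat on the mat")
models["es"].train("el gato se sentó en la alfombra")

print(classify(models, "the mat"))
```

Working on bytes rather than words avoids tokenization and handles any UTF-8 input uniformly, which is one reason such models resemble compression-based methods. The toy languages here are far easier to separate than the closely related varieties in the shared task.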