Frequency of Low-Frequency Words in Text Corpora

Authors

RYCHLÝ Pavel

Year of publication 2010
Type Article in Proceedings
Conference Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2010
MU Faculty or unit

Faculty of Informatics

Citation
Web https://nlp.fi.muni.cz/raslan/2010/paper15.pdf
Field Linguistics
Keywords Computational linguistics; Language model; Low-frequency; Text analysis; Text corpora
Description Low-frequency words, especially words occurring only once in a text corpus, are very popular in text analysis, and many lexicographers also draw attention to them. This paper presents a detailed statistical analysis of low-frequency words. The results provide important information for many practical applications, including lexicography and language modeling.