Corpus Factory

Warning

This publication does not belong to the Faculty of Economics and Administration but to the Faculty of Informatics. The official page of the publication is on the muni.cz website.
Authors

KILGARRIFF Adam, REDDY Siva, POMIKÁLEK Jan

Year of publication 2009
Type Article in conference proceedings
Faculty / MU unit

Faculty of Informatics

Citation
www http://www.kilgarriff.co.uk/Publications/2009-KilgReddyPomikalek-asialex-CorpFactory.doc
Description State-of-the-art lexicography requires corpora, but for many languages no large, general-language corpora are available. Until recently, all but the richest publishing houses could do little but shake their heads in dismay, as corpus-building was long, slow and expensive. With the advent of the Web, however, it can be highly automated and thereby made fast and inexpensive. We have developed a ‘corpus factory’ where we build lexicographic corpora. In this paper we describe the method we use, how it has worked, and how various problems were solved for five languages: Dutch, Hindi, Telugu, Thai and Vietnamese. The corpora we have developed are available for use in the Sketch Engine corpus query tool.
