Utilizing Linguistic Resources: Theory and Practical Experience
Authors | |
---|---|
Year of publication | 2010 |
Type | Article in Proceedings |
Conference | Proceedings of Recent Advances in Slavonic Natural Language Processing 2010 |
Web | https://nlp.fi.muni.cz/raslan/2010/paper04.pdf |
Field | Informatics |
Keywords | linguistic resources; corpora; theory; practice |
Description | The Prague Dependency Treebank (henceforth PDT) is a large collection of Czech texts. It contains several layers of rich annotation, ranging from morphology to deep syntax. It is unique in its size and theoretical background, especially for a language like Czech, which can be considered a small language with regard to the number of its speakers. In this article, we use PDT 2.0 to demonstrate that within real NLP systems, complex annotations may cut both ways. We present several issues that might pose problems when extracting data from PDT, and from complex structures in general, and hint at possible solutions. |