Building Evaluation Dataset for Textual Entailment in Czech

Year of publication 2012
Type Article in Proceedings
Conference Sixth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2012
MU Faculty or unit Faculty of Informatics
Field Informatics
Keywords textual entailment; evaluation data set; Czech language; paraphrasing
Description Recognizing textual entailment (RTE) is a subfield of natural language processing (NLP). Several RTE systems currently exist; some of their subtasks are language independent, but others are not. Moreover, large evaluation datasets have been prepared almost exclusively for English. In this paper we describe methods for obtaining a test dataset for RTE in Czech. We extracted facts from texts using corpus templates as well as a syntactic parser. In addition, we used reading comprehension tests for children and students. The main contribution of this article is the classification of “difficulty levels” for particular RTE questions.