Určování autorství anonymních textů na základě automaticky nalezených charakteristických znaků


Title in English Determining Authorship of Anonymous Texts Based on Automatically Discovered Characteristic Features


Year of publication 2011
Type Special-purpose publication
MU Faculty or unit Faculty of Informatics

Description Master's thesis. The work builds on the most successful methods for determining the authorship of anonymous documents. We combine, optimize, and revise these methods and create new techniques for three main tasks: automatic assignment of authorship given a set of documents, verification that a document was written by a selected author, and clustering of documents by author. The implemented algorithms are tested on Czech documents, but the system is modular: once the language-dependent components are removed or replaced, it can process documents written in any language. Everything is implemented in Python. The system includes tools for preprocessing Czech data and for managing the stored documents in a PostgreSQL database. The thesis also presents empirical observations on the performance of the most popular authorship-determination methods on Czech documents. Most earlier measurements were performed on English texts (books, newspaper articles, rarely e-mails), and statistics for Czech data have been missing until now.
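To make the first two tasks concrete, the following is a minimal sketch of authorship attribution and verification. It is not the method used in the thesis: it assumes character trigram counts as features and cosine similarity as the comparison, a common baseline chosen here only for illustration. The function names (`char_ngrams`, `cosine`, `attribute`, `verify`) and the `threshold` value are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(anonymous_text, known_texts):
    """Attribution: pick the candidate author with the most similar profile.

    known_texts maps an author name to a list of that author's documents.
    Returns the best-scoring author and the similarity score of every candidate.
    """
    target = char_ngrams(anonymous_text)
    scores = {
        author: cosine(target, char_ngrams(" ".join(texts)))
        for author, texts in known_texts.items()
    }
    return max(scores, key=scores.get), scores


def verify(anonymous_text, author_texts, threshold=0.5):
    """Verification: accept the author if similarity reaches the threshold.

    The threshold is an illustrative assumption, not a tuned value.
    """
    profile = char_ngrams(" ".join(author_texts))
    return cosine(char_ngrams(anonymous_text), profile) >= threshold
```

A call such as `attribute(text, {"A": [...], "B": [...]})` returns the closest candidate together with all scores. Clustering, the third task, could reuse the same similarity function as a distance between documents. Character n-grams need no tokenizer or morphological analysis, which is one reason they are a convenient language-independent starting point.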
