Plagiarism Detection through Vector Space Models Applied to a Digital Library

Authors

ŘEHŮŘEK Radim

Year of publication 2008
Type Article in Proceedings
Conference RASLAN 2008
MU Faculty or unit

Faculty of Informatics

Citation
Web https://nlp.fi.muni.cz/raslan/2008/papers/4.pdf
Field Use of computers, robotics and its application
Keywords plagiarism; vector space; digital library
Description Plagiarism is an increasing problem in the digital world. The sheer amount of digital data calls for automation of plagiarism discovery. In this paper we evaluate an Information Retrieval approach to dealing with plagiarism through Vector Spaces. This allows us to detect similarities that are not the result of naive copy&paste. We also consider an extension of Vector Spaces in which input documents are analyzed for term co-occurrence, allowing us to introduce some semantics into our approach beyond mere word matching. The approach is evaluated on a real-world collection of mathematical documents as part of the DML-CZ project.
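
The following is a minimal sketch of the general Vector Space idea mentioned in the description, not the paper's actual implementation. It assumes a bag-of-words representation with a smoothed TF-IDF weighting and cosine similarity; the function names (tokenize, tfidf_vectors, cosine) and the exact weighting formula are illustrative choices, and the term co-occurrence extension is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} TF-IDF vector.

    The smoothed IDF, log((1 + n) / (1 + df)) + 1, is an assumption made
    for this sketch so that terms occurring in every document keep a
    non-zero weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "on the mat the cat sat",          # same words, reordered
        "matrix eigenvalues converge quickly",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))  # reordered copy: very high similarity
    print(cosine(vecs[0], vecs[2]))  # no shared terms: no similarity
```

Because word order is discarded, a passage that has been reshuffled still scores close to an exact copy, which is the kind of similarity that plain string matching would miss. Pairs scoring above some chosen threshold would then be flagged for manual inspection; the threshold itself is a tuning decision that this sketch does not address.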