Plagiarism Detection through Vector Space Models Applied to a Digital Library
Authors | |
---|---|
Year of publication | 2008 |
Type | Article in Proceedings |
Conference | RASLAN 2008 |
MU Faculty or unit | |
Citation | |
Web | https://nlp.fi.muni.cz/raslan/2008/papers/4.pdf |
Field | Use of computers, robotics and its application |
Keywords | plagiarism; vector space; digital library |
Description | Plagiarism is an increasing problem in the digital world. The sheer amount of digital data calls for automated plagiarism discovery. In this paper we evaluate an Information Retrieval approach to dealing with plagiarism through Vector Spaces. This allows us to detect similarities that are not the result of naive copy-and-paste. We also consider an extension of Vector Spaces in which input documents are analyzed for term co-occurrence, allowing us to introduce some semantics into our approach beyond mere word matching. The approach is evaluated on a real-world collection of mathematical documents as part of the DML-CZ project. |
Related projects: |
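The description above names two ideas: comparing documents as vectors of term weights, and enriching those vectors with term co-occurrence so that related but non-identical wording still scores as similar. The sketch below illustrates both in plain Python. It is not the paper's implementation. TF-IDF weighting, cosine similarity, the expansion rule d' = d + alpha * C d (C = document-level co-occurrence counts), and all function names and the threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Naive lowercase tokenizer (assumption); real systems would also stem and drop stop words.
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document, plus the tokens."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf, so terms present in every document keep a small positive weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()})
    return vectors, tokenized


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cooccurrence(tokenized):
    """Symmetric counts of how often two distinct terms appear in the same document."""
    co = Counter()
    for toks in tokenized:
        for a, b in combinations(sorted(set(toks)), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co


def expand(vec, co, alpha=0.5):
    """Spread each term's weight to its co-occurring terms: d' = d + alpha * C d (hypothetical rule)."""
    out = dict(vec)
    for (a, b), count in co.items():
        if a in vec:
            out[b] = out.get(b, 0.0) + alpha * count * vec[a]
    return out


def suspicious_pairs(vectors, threshold=0.5):
    """Return (i, j, score) for document pairs whose cosine similarity reaches the threshold."""
    pairs = []
    for i, j in combinations(range(len(vectors)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    docs = [
        "the vector space model compares documents by term weights",
        "documents are compared by the weights of terms in a vector space model",
        "plagiarism detection in a digital library of mathematics",
    ]
    vecs, toks = tfidf_vectors(docs)
    print(suspicious_pairs(vecs, threshold=0.5))
    co = cooccurrence(toks)
    # Documents 0 and 2 share no words, yet co-occurrence expansion links them through document 1.
    print(cosine(vecs[0], vecs[2]), cosine(expand(vecs[0], co), expand(vecs[2], co)))
```

Running it flags the reworded pair (documents 0 and 1) even though it is not a verbatim copy. Documents 0 and 2 share no words, so their plain cosine is zero, but after expansion they gain a positive score through terms that co-occur in document 1. That is the kind of "semantics beyond mere word matching" the co-occurrence extension aims for, though on a large collection a dimensionality-reduction method would normally replace this naive dense spreading.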