
CLEF
2010
Springer

A Textual-Based Similarity Approach for Efficient and Scalable External Plagiarism Analysis - Lab Report for PAN at CLEF 2010

In this paper we present an approach to external plagiarism detection based on textual similarity. The method is efficient and precise, and can be applied to large document collections. Our system begins with a document selection phase that uses a variant of tf-idf applied over the terms that appear in the two documents of the pair being compared. We then apply a more complex and accurate function, based on character n-grams, to the subset of documents retained by the first phase in order to extract the plagiarized passages, or matches. Once all matches for a given document have been extracted, we perform a greedy match-merging operation that tolerates in-between text separating the matches, which makes the approach compatible with certain levels of plagiarism obfuscation. In our participation in the 2nd International Competition on Plagiarism Detection, we achieved an overall score of 0.2222, ranking 11th out of 18 participants.
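The abstract names three stages without giving formulas, so the following is only a minimal Python sketch of how such a pipeline could look, not the authors' implementation. The function names, the whitespace tokenizer, the min-frequency tf-idf weighting restricted to shared terms, the n-gram length of 8 and the gap threshold of 10 characters are all assumptions made for illustration.

```python
import math
from collections import Counter


def shared_term_score(suspicious, source, doc_freq, n_docs):
    """Stage 1 (assumed variant): sum tf-idf weights over terms found in both documents.

    doc_freq maps a term to the number of corpus documents containing it;
    unseen terms are treated as appearing in one document.
    """
    tf_a = Counter(suspicious.lower().split())
    tf_b = Counter(source.lower().split())
    score = 0.0
    for term in tf_a.keys() & tf_b.keys():
        idf = math.log(n_docs / doc_freq.get(term, 1))
        score += min(tf_a[term], tf_b[term]) * idf
    return score


def ngram_matches(suspicious, source, n=8):
    """Stage 2 (simplified): spans of `suspicious` whose character n-gram occurs in `source`."""
    source_grams = {source[i:i + n] for i in range(len(source) - n + 1)}
    return [
        (i, i + n)
        for i in range(len(suspicious) - n + 1)
        if suspicious[i:i + n] in source_grams
    ]


def merge_matches(spans, max_gap=10):
    """Stage 3: greedily merge spans, scanning left to right.

    A span is absorbed into the previous one when the unmatched text
    between them is at most `max_gap` characters long.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

In this sketch, source documents would be ranked by `shared_term_score` and only the top candidates passed to `ngram_matches`; `merge_matches` then joins nearby spans so that a passage with a few inserted or altered words is still reported as one match. For brevity the merge step looks only at positions in the suspicious document, whereas a fuller version would also check that the corresponding source positions are close together.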
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2010
Where CLEF
Authors Daniel Micol, Óscar Ferrández, Fernando Llopis, Rafael Muñoz