Sciweavers

CLEF
2010
Springer

Fuzzy Semantic-Based String Similarity for Extrinsic Plagiarism Detection - Lab Report for PAN at CLEF 2010

13 years 5 months ago
Fuzzy Semantic-Based String Similarity for Extrinsic Plagiarism Detection - Lab Report for PAN at CLEF 2010
This report explains our plagiarism detection method using fuzzy semantic-based string similarity approach. The algorithm was developed through four main stages. First is pre-processing which includes tokenisation, stemming and stop words removing. Second is retrieving a list of candidate documents for each suspicious document using shingling and Jaccard coefficient. Suspicious documents are then compared sentence-wise with the associated candidate documents. This stage entails the computation of fuzzy degree of similarity that ranges between two edges: 0 for completely different sentences and 1 for exactly identical sentences. Two sentences are marked as similar (i.e. plagiarised) if they gain a fuzzy similarity score above a certain threshold. The last step is post-processing whereby consecutive sentences are joined to form single paragraphs/sections. Our performance measures on PAN'09 training corpus for external plagiarism detection task (recall=0.3097, precision=0.5424, granu...
Salha Alzahrani, Naomie Salim
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2010
Where CLEF
Authors Salha Alzahrani, Naomie Salim
Comments (0)