CLEF
2011
Springer

Using Clustering to Identify Outlier Chunks of Text - Notebook for PAN at CLEF 2011

Intrinsic plagiarism detection is a sub-task of authorship identification in which outlier chunks must be detected solely on the basis of stylistic differences from the main body of the text. We present a first attempt at using words that appear infrequently in a text as stylistic markers for distinguishing outlier chunks in that text. In the first phase of our method, we cluster chunks of text, each represented by its usage of infrequent words. In the second phase, we use a training corpus to identify cluster properties of outlier chunks.
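The two phases can be illustrated with a minimal Python sketch. Everything specific in it is an assumption, since the abstract does not give these details: "infrequent" is taken to mean a word that occurs in between `min_df` and `max_df` chunks, similarity is cosine over binary word sets, and the clustering is a simple seed-based two-way split rather than whatever algorithm the paper uses. The function `outlier_candidates` is only a placeholder for the second phase; it flags the smaller cluster instead of learning cluster properties from a training corpus. All function and parameter names are hypothetical.

```python
from itertools import combinations


def rare_word_sets(chunks, min_df=2, max_df=3):
    """Represent each chunk by the set of 'infrequent' words it contains.

    Assumption: a word is infrequent if it occurs in at least min_df
    and at most max_df chunks. The thresholds are illustrative only.
    """
    df = {}
    for chunk in chunks:
        for word in set(chunk):
            df[word] = df.get(word, 0) + 1
    rare = {w for w, n in df.items() if min_df <= n <= max_df}
    return [rare & set(chunk) for chunk in chunks]


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) * len(b)) ** 0.5


def two_way_split(vectors):
    """Split chunks into two clusters.

    The least similar pair of chunks become seeds; every chunk joins
    the seed it is more similar to (ties go to the first seed).
    """
    if len(vectors) < 2:
        return [0] * len(vectors)
    pairs = combinations(range(len(vectors)), 2)
    s0, s1 = min(pairs, key=lambda p: cosine(vectors[p[0]], vectors[p[1]]))
    return [
        0 if cosine(v, vectors[s0]) >= cosine(v, vectors[s1]) else 1
        for v in vectors
    ]


def outlier_candidates(labels):
    """Placeholder for phase two: flag the members of the smaller cluster."""
    small = 0 if labels.count(0) < labels.count(1) else 1
    return [i for i, label in enumerate(labels) if label == small]


# Toy input: four chunks in one style, two in another.
chunks = [
    "the cat hence sat whilst".split(),
    "the dog hence ran thus".split(),
    "the bird whilst flew thus".split(),
    "the fish hence swam whilst".split(),
    "the kid gonna run kinda".split(),
    "the man gonna walk kinda".split(),
]
labels = two_way_split(rare_word_sets(chunks))
print(outlier_candidates(labels))  # [4, 5]
```

In this toy input, the very common word "the" and the words that occur in only one chunk are both dropped from the representation, so the split is driven entirely by the handful of words that a few chunks share.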
Navot Akiva
Added 18 Dec 2011
Updated 18 Dec 2011
Type Journal
Year 2011
Where CLEF
Authors Navot Akiva