Plagiarism Detection across Distant Language Pairs

Plagiarism, the unacknowledged reuse of text, does not end at language boundaries. Cross-language plagiarism occurs if a text is translated from a fragment written in a different language and no proper citation is provided. Regardless of the change of language, the contents and, in particular, the ideas remain the same. Whereas different methods for the detection of monolingual plagiarism have been developed, less attention has been paid to the cross-language case. In this paper we compare two recently proposed cross-language plagiarism detection methods (CL-CNG, based on character n-grams, and CL-ASA, based on statistical translation) to a novel approach to this problem based on machine translation and monolingual similarity analysis (T+MA). We explore the effectiveness of the three approaches for less related languages. CL-CNG proves not to be appropriate for this kind of language pair, whereas T+MA performs better than the previously proposed models.
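To make the character n-gram idea behind CL-CNG concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the n-gram length, the text normalization, or the weighting scheme, so the choices here (character 3-grams, lowercasing with diacritics removed, raw counts compared by cosine similarity) are assumptions, and the function names are hypothetical.

```python
import math
import re
import unicodedata
from collections import Counter


def normalize(text):
    # Assumed preprocessing: lowercase, strip diacritics, and collapse
    # every run of non-alphanumeric characters into a single space.
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def char_ngrams(text, n=3):
    # Count the overlapping character n-grams of the normalized text.
    s = normalize(text)
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ngram_similarity(x, y, n=3):
    # Hypothetical helper: similarity of two texts, possibly in
    # different languages, based only on shared character n-grams.
    return cosine(char_ngrams(x, n), char_ngrams(y, n))


if __name__ == "__main__":
    english = "international organization"
    spanish = "organización internacional"
    print(ngram_similarity(english, spanish))
```

A measure of this kind depends on surface overlap such as cognates and shared affixes, so it has little to work with when two languages share few character sequences. That is consistent with the abstract's finding that CL-CNG is not appropriate for less related language pairs.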
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Authors Alberto Barrón-Cedeño, Paolo Rosso, Eneko Agirre, Gorka Labaka