Title: On the use of character n-grams as the only intrinsic evidence of plagiarism
Authors: Bensalem, Imene; Rosso, Paolo; Chikhi, Salim
Affiliations: Faculty of Information and Communication Technology (ICT)
Keywords: Intrinsic plagiarism detection; Character n-grams; Stylistic features; Writing style analysis
Date: 1-Sep-2019
Publisher: Springer Netherlands
Journal: Language Resources and Evaluation 
Volume: 53
Start page: 363
End page: 396
When a shift in writing style is noticed in a document, doubts arise about its originality. Building on this clue to plagiarism, the intrinsic approach to plagiarism detection identifies the stolen passages by analysing the writing style of the suspicious document, without comparing it to textual resources that may have served as sources for the plagiarist. Character n-grams are recognised as a successful way of modelling text for writing style analysis. Although prior studies have investigated best practices for using character n-grams in authorship attribution and other problems, such investigations are still needed in the context of intrinsic plagiarism detection. Moreover, previous work has assumed that the ways of using character n-grams in authorship attribution carry over unchanged to intrinsic plagiarism detection. In this paper, we study the effect of character n-gram frequency and length on the performance of intrinsic plagiarism detection. Our experiments use two state-of-the-art methods and five large document collections from the PAN labs, written in English and Arabic. We demonstrate empirically that low- and high-frequency n-grams are not equally relevant for intrinsic plagiarism detection, and that their performance depends on the way they are exploited.
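The abstract turns on two parameters of a character n-gram text model: the n-gram length n and the n-gram frequency. As a minimal sketch of what those parameters mean (not the authors' method, nor either of the two state-of-the-art methods they evaluate), the Python below counts overlapping character n-grams of a given length and partitions them into low- and high-frequency groups. The function names `char_ngrams` and `split_by_frequency` and the `threshold` cut-off are hypothetical illustrations; the paper's own definitions of low and high frequency may differ.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def split_by_frequency(profile: Counter, threshold: int):
    """Partition an n-gram profile into low- and high-frequency parts.

    Hypothetical rule: an n-gram is 'high-frequency' if its count is at
    least `threshold`, otherwise 'low-frequency'.
    """
    low = {g: c for g, c in profile.items() if c < threshold}
    high = {g: c for g, c in profile.items() if c >= threshold}
    return low, high


doc = "the cat sat on the mat"
profile = char_ngrams(doc, 3)          # 20 overlapping trigrams, 17 distinct
low, high = split_by_frequency(profile, threshold=2)
print(high)  # {'the': 2, 'he ': 2, 'at ': 2}
```

Varying `n` changes the length dimension, and varying `threshold` (or ranking n-grams by count instead) changes which part of the frequency spectrum a style model relies on.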
DOI: 10.1007/s10579-019-09444-w
Appears in Collections: Journal Articles
