Efficient token based clone detection with flexible tokenization

Code clones are similar code fragments that occur at multiple locations in a software system. Detection of code clones provides useful information for maintenance, reengineering, program understanding and reuse. Several techniques have been proposed to detect code clones. These techniques differ in the code representation used for analysis of clones, ranging from plain text to parse trees and program dependence graphs. Clone detection based on lexical tokens involves minimal code transformation and gives good results, but is computationally expensive because of the large number of tokens that need to be compared. We explored string algorithms to find suitable data structures and algorithms for efficient token-based clone detection and implemented them in our tool Repeated Tokens Finder (RTF). Instead of using a suffix tree for string matching, we use the more memory-efficient suffix array. RTF incorporates a suffix-array-based linear-time algorithm to detect string matches. It also provides...
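
The general idea of finding repeated token runs with a suffix array can be illustrated with a small sketch. This is not RTF's implementation: the function names are hypothetical, the suffix array is built with a naive sort rather than a linear-time construction, and the tokenizer is assumed to be a plain whitespace split. The sketch builds a suffix array over the token sequence, computes the longest-common-prefix (LCP) array with Kasai's algorithm, and reports pairs of adjacent suffixes that share at least `min_len` tokens.

```python
from typing import List, Tuple


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Naive construction (an assumption for brevity, not linear time):
    sort suffix start positions by the token sequence that follows them."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def build_lcp(tokens: List[str], sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[k] is the number of leading tokens shared by
    the suffixes starting at sa[k - 1] and sa[k]; lcp[0] is 0."""
    n = len(tokens)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def repeated_runs(tokens: List[str], min_len: int) -> List[Tuple[int, int, int]]:
    """Return (first_pos, second_pos, length) for each pair of adjacent
    suffixes in the suffix array that share at least min_len tokens."""
    sa = build_suffix_array(tokens)
    lcp = build_lcp(tokens, sa)
    matches = []
    for k in range(1, len(sa)):
        if lcp[k] >= min_len:
            a, b = sorted((sa[k - 1], sa[k]))
            matches.append((a, b, lcp[k]))
    return sorted(matches)
```

For the token sequence obtained from `"a = b + c ; x = y ; a = b + c ;".split()`, calling `repeated_runs(tokens, 6)` returns `[(0, 10, 6)]`: the six-token statement at position 0 recurs at position 10. The sketch also reports shorter matches nested inside longer ones when `min_len` is lowered, and it does not filter overlapping occurrences; a practical detector would need to handle both.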

Hamid Abdul Basit, Stan Jarzabek

Keywords: Clone Detection | SIGSOFT 2007 | Software Engineering | Token-based Clone Detection

» An Improved Securer and Efficient Nonce-Based Authentication Scheme with Token-Update

» Clone detection and removal for Erlang/OTP within a refactoring environment

» From Whence It Came: Detecting Source Code Clones by Analyzing Assembler

» Broadcast News Parsing using Visual Cues: A Robust Face Detection Approach

Post Info

Added	20 Nov 2009
Updated	20 Nov 2009
Type	Conference
Year	2007
Where	SIGSOFT
Authors	Hamid Abdul Basit, Stan Jarzabek
