Similarity Patterns in Language

Dotplot is a technique for visualizing patterns of string matches in millions of lines of text and code. Patterns may be explored interactively or detected automatically. Applications include text analysis (author identification, plagiarism detection, translation alignment, etc.), software engineering (module and version identification, subroutine categorization, redundant code identification, etc.), and information retrieval (identification of similar records in the results of queries). Patterns are interpreted through a visual language. Squares identify unordered matches (documents with many matching words, or subroutines with many matching symbols), while diagonals identify ordered matches (copies, versions, and translations). Combinations of squares and diagonals have more complex interpretations that reveal subtler relationships.
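To make the square/diagonal vocabulary concrete, here is a minimal Python sketch of a naive dotplot, not the system described in the paper. It assumes whitespace tokenization and places a dot at cell (i, j) whenever token i of one sequence equals token j of the other. The helper names `dotplot`, `diagonal_runs`, `block_density`, and `render` are hypothetical, chosen for illustration. Diagonal runs correspond to ordered matches; the dot density of a rectangular block is a rough stand-in for the "square" (unordered match) pattern.

```python
def dotplot(tokens_a, tokens_b):
    """Return the set of (i, j) cells where tokens_a[i] == tokens_b[j]."""
    # Index each token's positions in tokens_b so only matching cells are visited.
    positions = {}
    for j, tok in enumerate(tokens_b):
        positions.setdefault(tok, []).append(j)
    return {(i, j) for i, tok in enumerate(tokens_a) for j in positions.get(tok, [])}


def diagonal_runs(dots, min_length=3):
    """Find runs of dots along diagonals (i+k, j+k), i.e. ordered matches."""
    runs = []
    for i, j in sorted(dots):
        # Start a run only at a dot with no predecessor on its diagonal.
        if (i - 1, j - 1) in dots:
            continue
        length = 1
        while (i + length, j + length) in dots:
            length += 1
        if length >= min_length:
            runs.append((i, j, length))
    return runs


def block_density(dots, row_range, col_range):
    """Fraction of cells in a rectangular block that hold a dot (a dense block suggests a 'square')."""
    cells = len(row_range) * len(col_range)
    hits = sum(1 for i in row_range for j in col_range if (i, j) in dots)
    return hits / cells if cells else 0.0


def render(dots, rows, cols):
    """Draw the dotplot as text: '#' for a match, '.' otherwise."""
    return "\n".join(
        "".join("#" if (i, j) in dots else "." for j in range(cols))
        for i in range(rows)
    )


if __name__ == "__main__":
    text = "to be or not to be".split()
    dots = dotplot(text, text)
    print(render(dots, len(text), len(text)))
    print(diagonal_runs(dots, min_length=2))
```

Comparing the phrase with itself yields the full main diagonal plus two short off-diagonal runs, one for each repeat of "to be". A real tool operating on millions of lines would need weighting and aggregation of cells into pixels rather than this exhaustive set representation.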
Jonathan Helfman
Added: 09 Aug 2010
Updated: 09 Aug 2010
Type: Conference
Year: 1994
Where: VL
Authors: Jonathan Helfman