Given a set D = {d1, d2, ..., dD} of D strings of total length n, our task is to report the "most relevant" strings for a given query pattern P. This involves somewhat mo...
In this paper we introduce the MatchDetectReveal(MDR) system, which is capable of identifying overlapping and plagiarised documents. Each component of the system is briefly descri...
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous ...
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upt...
Phrase has been considered as a more informative feature term for improving the effectiveness of document clustering. In this paper, we propose a phrase-based document similarity t...
Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational...
Raffay Hamid, Siddhartha Maddi, Aaron F. Bobick, I...