We address the problem of building an index for a set D of n strings, where each string location is a subset of some finite integer alphabet of size , so that we can answer effici...
Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge that has application to detecting transcription factor binding sites...
Abstract. We study the problem of finding local and global covers as well as seeds in conservative indeterminate strings. An indeterminate string is a sequence T = T[1]T[2] . . . T...
Pavlos Antoniou, Maxime Crochemore, Costas S. Ilio...
We present a new family of linear time algorithms based on sufficient statistics for string comparison with mismatches under the string kernels framework. Our algorithms improve t...
String transformation, which maps a source string s into its desirable form t, is related to various applications including stemming, lemmatization, and spelling correction. The ...
Abstract. We propose a new string kernel based on variable-length don't-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is the ...
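To make the notion of a VLDC pattern concrete, here is a minimal sketch of when such a pattern matches a string: the pattern matches if each don't-care symbol ⋆ can be replaced by some (possibly empty) string so that the result equals the text exactly. The kernel itself is not reproduced here. The helper name `vldc_matches` and the use of the ASCII character `*` as a stand-in for ⋆ are assumptions for illustration; `*` is assumed not to occur in the alphabet.

```python
import re

# Hypothetical stand-in for the VLDC symbol; assumed not to occur in the alphabet.
STAR = "*"


def vldc_matches(pattern: str, text: str) -> bool:
    """Return True if replacing every STAR in `pattern` with some (possibly
    empty) string can turn the pattern into exactly `text`."""
    # Split on the don't-care symbol, escape the constant pieces, and let
    # each don't-care match any substring, including the empty one.
    pieces = pattern.split(STAR)
    regex = ".*".join(re.escape(piece) for piece in pieces)
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None
```

For example, `vldc_matches("a*b*", "axxbyy")` holds because the two don't-cares can absorb `"xx"` and `"yy"`, while `vldc_matches("a*b", "ba")` does not, since the constant parts must still appear in order.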
We compare different statistical characterizations of a set of strings, for three different histogram-based distances. Given a distance, a set of strings may be characterized by it...
The Compact Directed Acyclic Word Graph (CDAWG) is a space-efficient data structure that supports indices of a string. The Symmetric Compact Directed Acyclic Word Graph (SCDAWG) for a st...
Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific compu...
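As a reference point for what an edit-distance similarity join computes, the sketch below is a naive nested-loop self-join that returns every pair of strings whose Levenshtein distance is at most a threshold τ. It is a baseline for illustration only, not the method of the abstract above; the function names and the parameter `tau` are hypothetical. The only pruning used is the standard length filter: if two lengths differ by more than τ, the edit distance must exceed τ.

```python
from itertools import combinations


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        cur = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                  # delete cs
                           cur[j - 1] + 1,               # insert ct
                           prev[j - 1] + (cs != ct)))    # substitute or match
        prev = cur
    return prev[-1]


def naive_similarity_join(strings, tau):
    """Return index pairs (i, j), i < j, with edit_distance <= tau."""
    result = []
    for (i, s), (j, t) in combinations(enumerate(strings), 2):
        # Length filter: a length gap above tau forces the distance above tau.
        if abs(len(s) - len(t)) > tau:
            continue
        if edit_distance(s, t) <= tau:
            result.append((i, j))
    return result
```

The quadratic number of verified pairs is exactly what practical join algorithms try to avoid with stronger filters and indexes.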
Kolmogorov complexity measures the amount of information in a string as the size of the shortest program that computes the string. The Kolmogorov structure function divides the s...