Sciweavers

CORR
2010
Springer
On building minimal automaton for subset matching queries
We address the problem of building an index for a set D of n strings, where each string location is a subset of some finite integer alphabet of size , so that we can answer effici...
Kimmo Fredriksson
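The abstract is cut off before it states the query semantics. The sketch below assumes that a query is an ordinary string of symbols, and that it occurs at a position when each query symbol belongs to the subset aligned with it. It is a brute-force scan, not the minimal automaton index the paper builds. The names `matches_at` and `naive_subset_search` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumed representation: each position of an indexed string is a set of integer symbols.
SubsetString = Sequence[FrozenSet[int]]


def matches_at(text: SubsetString, query: Sequence[int], i: int) -> bool:
    """True if every query symbol belongs to the subset aligned with it at offset i."""
    return all(sym in text[i + j] for j, sym in enumerate(query))


def naive_subset_search(collection: Sequence[SubsetString],
                        query: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (string index, start offset) pairs for every occurrence of query.

    Brute-force O(total length * len(query)) scan; no index is built.
    """
    hits: List[Tuple[int, int]] = []
    for d, text in enumerate(collection):
        for i in range(len(text) - len(query) + 1):
            if matches_at(text, query, i):
                hits.append((d, i))
    return hits


D = [
    [frozenset({1, 2}), frozenset({3}), frozenset({1, 3})],
    [frozenset({2}), frozenset({3, 4})],
]
print(naive_subset_search(D, [2, 3]))  # [(0, 0), (1, 0)]
```

An index such as the one in the paper aims to answer the same question without rescanning every string of D for each query.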
BMCBI
2010
Fast motif recognition via application of statistical thresholds
Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge that has application to detecting transcription factor binding sites...
Christina Boucher, James King
STRINGOLOGY
2008
Conservative String Covering of Indeterminate Strings
Abstract. We study the problem of finding local and global covers as well as seeds in conservative indeterminate strings. An indeterminate string is a sequence T = T[1]T[2] . . . T...
Pavlos Antoniou, Maxime Crochemore, Costas S. Ilio...
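Here is a minimal sketch of the cover notion for indeterminate strings. It assumes each position T[i] is a set of letters, that a solid (ordinary) candidate u matches at i when u[j] lies in T[i + j], and that u is a cover when its occurrences together span every position of T. A conservative string bounds the number of non-solid positions, which `non_solid_count` only reports and does not enforce. All function names are hypothetical. This is a direct check of the definition, not the paper's algorithm.

```python
from typing import FrozenSet, List, Sequence

IndeterminateString = Sequence[FrozenSet[str]]


def occurrences(u: str, T: IndeterminateString) -> List[int]:
    """Start positions where the solid string u matches T (u[j] must lie in T[i + j])."""
    m = len(u)
    return [i for i in range(len(T) - m + 1)
            if all(u[j] in T[i + j] for j in range(m))]


def is_cover(u: str, T: IndeterminateString) -> bool:
    """True if every position of T lies inside at least one occurrence of u."""
    if not u or not T:
        return False
    covered_up_to = 0  # every position before this index is already covered
    for i in occurrences(u, T):  # occurrences are produced in increasing order
        if i > covered_up_to:
            return False  # a gap that no later occurrence can fill
        covered_up_to = max(covered_up_to, i + len(u))
    return covered_up_to == len(T)


def non_solid_count(T: IndeterminateString) -> int:
    """Number of positions holding more than one letter."""
    return sum(1 for letters in T if len(letters) > 1)


T = [frozenset(s) for s in ["a", "b", "ab", "b"]]
print(is_cover("ab", T), is_cover("ba", T), non_solid_count(T))  # True False 1
```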
NIPS
2008
Scalable Algorithms for String Kernels with Inexact Matching
We present a new family of linear time algorithms based on sufficient statistics for string comparison with mismatches under the string kernels framework. Our algorithms improve t...
Pavel P. Kuksa, Pai-Hsi Huang, Vladimir Pavlovic
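For reference, the (k, m)-mismatch kernel from the string kernels framework can be computed by brute force. Each feature γ ranges over all k-mers of the alphabet, and Φ_γ(s) counts the k-mers of s within Hamming distance m of γ. This enumerates |Σ|^k features, which is exactly the kind of cost the paper's linear-time algorithms are meant to avoid. The sketch is a baseline for checking results, not the authors' method, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(s: str, k: int, m: int, alphabet: str) -> Dict[str, int]:
    """Phi_gamma(s): number of k-mers of s within Hamming distance m of gamma."""
    kmers = [s[i:i + k] for i in range(len(s) - k + 1)]
    features: Dict[str, int] = {}
    for gamma in map("".join, product(alphabet, repeat=k)):  # all |alphabet|**k features
        count = sum(1 for a in kmers if hamming(a, gamma) <= m)
        if count:
            features[gamma] = count
    return features


def mismatch_kernel(x: str, y: str, k: int, m: int, alphabet: str) -> int:
    """Inner product of the two (k, m)-mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(v * fy.get(gamma, 0) for gamma, v in fx.items())


print(mismatch_kernel("abab", "ab", k=2, m=0, alphabet="ab"))  # 2
print(mismatch_kernel("abab", "ab", k=1, m=1, alphabet="ab"))  # 16
```

With m = 0 this reduces to the exact k-spectrum kernel.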
EMNLP
2008
A Discriminative Candidate Generator for String Transformations
String transformation, which maps a source string s into its desirable form t, is related to various applications including stemming, lemmatization, and spelling correction. The ...
Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiad...
DIS
2008
Springer
String Kernels Based on Variable-Length-Don't-Care Patterns
Abstract. We propose a new string kernel based on variable-length don't-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is the ...
Kazuyuki Narisawa, Hideo Bannai, Kohei Hatano, Shu...
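To make the VLDC notion concrete, here is a small matcher. It assumes that ⋆ stands for any string, including the empty one, and that a pattern matches a text when some substitution of the ⋆ symbols makes the two equal. For convenience the ASCII `*` plays the role of ⋆ and is assumed not to occur in the text. This shows only the matching relation. It is not the kernel proposed in the paper, and `vldc_match` is a hypothetical name.

```python
def vldc_match(pattern: str, text: str, star: str = "*") -> bool:
    """True if pattern (letters plus VLDC symbols) matches the whole text."""
    n = len(text)
    # dp[j] is True when the pattern prefix processed so far matches text[:j].
    dp = [True] + [False] * n
    for c in pattern:
        if c == star:
            # The VLDC symbol absorbs any substring: new[j] is True if dp[j'] holds for some j' <= j.
            new = dp[:]
            for j in range(1, n + 1):
                new[j] = new[j - 1] or dp[j]
        else:
            # An ordinary letter must equal the next text character.
            new = [False] * (n + 1)
            for j in range(1, n + 1):
                new[j] = dp[j - 1] and text[j - 1] == c
        dp = new
    return dp[n]


print(vldc_match("*ab*", "xxaby"))  # True
print(vldc_match("a*c", "abb"))     # False
```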
GBRPR
2007
Springer
Generalized vs Set Median Strings for Histogram-Based Distances: Algorithms and Classification Results in the Image Domain
We compare different statistical characterizations of a set of strings, for three different histogram-based distances. Given a distance, a set of strings may be characterized by it...
Christine Solnon, Jean-Michel Jolion
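The distinction between the two characterizations can be shown with one histogram-based distance. The abstract does not name the three distances it studies, so the L1 distance between symbol-count histograms is an assumed example. The set median is the member of the set that minimizes the summed distance to all members. The generalized median may be any string. Under L1 each symbol count can be chosen independently, so taking the per-symbol median count is optimal. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def histogram_l1(a: str, b: str) -> int:
    """L1 distance between the symbol-count histograms of a and b."""
    ha, hb = Counter(a), Counter(b)
    return sum(abs(ha[c] - hb[c]) for c in set(ha) | set(hb))


def total_distance(s: str, strings: List[str]) -> int:
    return sum(histogram_l1(s, t) for t in strings)


def set_median(strings: List[str]) -> str:
    """Member of the set minimising the summed distance to all members."""
    return min(strings, key=lambda s: total_distance(s, strings))


def generalized_median(strings: List[str]) -> str:
    """Any string minimising the summed L1 histogram distance.

    Symbols are emitted in sorted order because this distance ignores order.
    """
    hists = [Counter(s) for s in strings]
    symbols = sorted(set().union(*hists))
    out = []
    for c in symbols:
        counts = sorted(h[c] for h in hists)
        out.append(c * counts[(len(counts) - 1) // 2])  # lower median count
    return "".join(out)


S = ["ab", "bc", "ca"]
print(set_median(S), total_distance(set_median(S), S))                  # ab 4
print(generalized_median(S), total_distance(generalized_median(S), S))  # abc 3
```

In this toy set the generalized median lies outside the set and has a strictly lower total distance than the set median.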
SPIRE
2001
Springer
On-Line Construction of Symmetric Compact Directed Acyclic Word Graphs
The Compact Directed Acyclic Word Graph (CDAWG) is a space-efficient data structure that supports indices of a string. The Symmetric Compact Directed Acyclic Word Graph (SCDAWG) for a st...
Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohar...
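The CDAWG compacts the plain DAWG (directed acyclic word graph, also called the suffix automaton). Some background on that uncompacted structure may help. Below is a sketch of the classic online DAWG construction, which adds one character at a time and clones a state when a transition would otherwise be shared incorrectly. It does not compact chains of states and does not add the left-context edges of the symmetric variant. It is therefore not the SCDAWG construction from the paper, and the class and method names are hypothetical.

```python
from typing import Dict, List


class DAWG:
    """Online DAWG (suffix automaton) accepting every substring of the text."""

    def __init__(self) -> None:
        self.next: List[Dict[str, int]] = [{}]  # outgoing transitions per state
        self.link: List[int] = [-1]             # suffix links
        self.length: List[int] = [0]            # longest string reaching each state
        self.last = 0                           # state for the whole text read so far

    def _new_state(self, length: int, trans: Dict[str, int], link: int) -> int:
        self.next.append(dict(trans))
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def extend(self, c: str) -> None:
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add c-transitions from suffix states that lack one.
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes the shorter strings that q represented.
                clone = self._new_state(self.length[p] + 1, self.next[q], self.link[q])
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, s: str) -> bool:
        """True if s is a substring of the text added so far."""
        state = 0
        for ch in s:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True


def build_dawg(text: str) -> DAWG:
    d = DAWG()
    for ch in text:
        d.extend(ch)
    return d


d = build_dawg("abcbc")
print(d.contains("cbc"), d.contains("ca"))  # True False
```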
SIGMOD
2010
ACM
Probabilistic string similarity joins
Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific compu...
Jeffrey Jestes, Feifei Li, Zhepeng Yan, Ke Yi
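A naive version of such a join can be written as follows. It assumes a string-level uncertainty model in which each uncertain string is a small list of (instance, probability) pairs, and it joins two uncertain strings when their expected edit distance is at most a threshold τ. Both the model and the join predicate are assumptions, because the abstract is cut off before defining them. The nested loop compares every pair and does none of the filtering a practical join would need. The names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed string-level model: alternative instances with their probabilities.
UncertainString = Sequence[Tuple[str, float]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def expected_edit_distance(x: UncertainString, y: UncertainString) -> float:
    """Average edit distance over all pairs of instances, weighted by probability."""
    return sum(px * py * edit_distance(a, b) for a, px in x for b, py in y)


def similarity_join(R: Sequence[UncertainString], S: Sequence[UncertainString],
                    tau: float) -> List[Tuple[int, int]]:
    """Index pairs (i, j) whose expected edit distance is at most tau (O(|R||S|) pairs)."""
    return [(i, j) for i, x in enumerate(R) for j, y in enumerate(S)
            if expected_edit_distance(x, y) <= tau]


R = [[("abc", 1.0)]]
S = [[("abc", 0.5), ("abd", 0.5)], [("xyz", 1.0)]]
print(similarity_join(R, S, tau=1.0))  # [(0, 0)]
```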
ICALP
2003
Springer
Sophistication Revisited
Kolmogorov complexity measures the amount of information in a string as the size of the shortest program that computes the string. The Kolmogorov structure function divides the s...
Luis Antunes, Lance Fortnow