Sciweavers

» On The Closest String and Substring Problems
GFKL 2005, Springer
Near Similarity Search and Plagiarism Analysis
Abstract. Existing methods for text plagiarism analysis are mainly based on “chunking”, a process of grouping a text into meaningful units, each of which gets encoded by an integer nu...
Benno Stein, Sven Meyer zu Eissen
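The chunking idea described in the abstract above can be sketched as follows. This is a generic word-level fingerprinting baseline, not the method proposed by Stein and Meyer zu Eissen. The chunk size `k`, the use of `zlib.crc32` as the integer encoding, the Jaccard resemblance score, and the function names are all illustrative assumptions.

```python
import zlib

def chunk_fingerprints(text: str, k: int = 3) -> set[int]:
    """Split text into overlapping word k-grams and encode each chunk as an integer."""
    words = text.lower().split()
    # Each chunk is k consecutive words; crc32 maps it to a deterministic 32-bit integer.
    return {
        zlib.crc32(" ".join(words[i:i + k]).encode("utf-8"))
        for i in range(len(words) - k + 1)
    }

def resemblance(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of the two chunk sets (0.0 = disjoint, 1.0 = identical sets)."""
    fa, fb = chunk_fingerprints(a, k), chunk_fingerprints(b, k)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make fingerprints non-reproducible across runs.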
LREC 2010
Information Retrieval of Word Form Variants in Spoken Language Corpora Using Generalized Edit Distance
An important feature of spoken language corpora is the existence of different spelling variants of words in transcription. This poses an important problem for a linguist who works with...
Siim Orasmaa, Reina Käärik, Jaak Vilo, T...
ALGORITHMICA 1999
Suffix Trees on Words
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words. This ...
Arne Andersson, N. Jesper Larsson, Kurt Swanson
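A rough sketch of the word-indexing idea, assuming words are separated by single spaces. Instead of the paper's word suffix tree, it uses a simpler stand-in: a sorted array of only the word-aligned suffixes (one per word, so m entries rather than n), searched by binary search so that only matches starting at a word boundary are reported. The function names are hypothetical, and the naive construction here does not reproduce the paper's complexity bounds.

```python
def build_word_suffix_array(text: str, sep: str = " ") -> list[int]:
    """Sort the start positions of word-aligned suffixes lexicographically."""
    # A word starts at index 0 and right after every separator.
    starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == sep and i + 1 < len(text)]
    return sorted(starts, key=lambda i: text[i:])

def find_word_aligned(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return sorted positions where pattern occurs starting at a word boundary."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

Because truncating sorted suffixes to their first m characters keeps them in non-decreasing order, all suffixes beginning with the pattern form one contiguous range, which the two binary searches locate. For `text = "abra cadabra abra bra"`, searching for `"bra"` reports only position 18, not the occurrences inside `abra` or `cadabra`.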
COLING 1996
A Statistical Method for Extracting Uninterrupted and Interrupted Collocations from Very Large Corpora
In order to extract rigid expressions with a high frequency of use, a new algorithm that can efficiently extract both uninterrupted and interrupted collocations from very large corpora ha...
Satoru Ikehara, Satoshi Shirai, Hajime Uchino
FUN 2010, Springer
On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the tex...
Peter Burcsi, Ferdinando Cicalese, Gabriele Fici, ...
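The Parikh-vector definition above leads directly to a naive online baseline for jumbled pattern matching. The approach slides a window of length |q| (the sum of q's counts) across the text and compares the window's character counts with q. This is a minimal sketch for illustration, not the indexing approach studied by Burcsi et al. The names `parikh` and `jumbled_matches` are hypothetical.

```python
from collections import Counter

def parikh(s: str) -> Counter:
    """Parikh vector p(s): the multiplicity of each character in s."""
    return Counter(s)

def jumbled_matches(text: str, q: dict[str, int]) -> list[int]:
    """Start positions i such that p(text[i:i+L]) == q, where L is the sum of q's counts."""
    target = Counter({c: k for c, k in q.items() if k > 0})
    L = sum(target.values())
    if L == 0 or L > len(text):
        return []
    window = parikh(text[:L])
    hits = []
    for i in range(len(text) - L + 1):
        if i > 0:
            # Slide the window one step: drop text[i-1], add text[i+L-1].
            out_ch = text[i - 1]
            window[out_ch] -= 1
            if window[out_ch] == 0:
                del window[out_ch]  # keep zero counts out so equality is exact
            window[text[i + L - 1]] += 1
        if window == target:
            hits.append(i)
    return hits
```

For example, `jumbled_matches("abcab", {"a": 1, "b": 1})` returns `[0, 3]`. Each slide updates two counts, so the scan costs O(n) updates plus one vector comparison per position. Index-based methods such as those the paper discusses aim to avoid rescanning the text for every query.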