Sciweavers

CIKM
2001
Springer

Improved String Matching Under Noisy Channel Conditions

Many document-based applications, including popular Web browsers, email viewers, and word processors, have a ‘Find on this Page’ feature that allows a user to find every occurrence of a given string in the document. If the document text being searched is derived from a noisy process such as optical character recognition (OCR), the effectiveness of typical string matching can be greatly reduced. This paper describes an enhanced string-matching algorithm for degraded text that improves recall while keeping precision at acceptable levels. The algorithm is more general than most approximate matching algorithms and allows string-to-string edits with arbitrary costs. We develop a method for evaluating our technique and use it to examine the relative effectiveness of each sub-component of the algorithm. Of the components we varied, we find that using confidence information from the recognition process leads to the largest improvements in matching accuracy. Keywords Approximate String Mat...
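To make the idea of "string-to-string edits with arbitrary costs" concrete, here is a minimal Python sketch of a generalized edit distance. It is not the authors' algorithm, whose details the abstract does not give. The function name `weighted_edit_distance`, the `edit_costs` table, and the example cost of 0.2 for the OCR confusion "m" read as "rn" are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple


def weighted_edit_distance(
    source: str,
    target: str,
    edit_costs: Dict[Tuple[str, str], float],
    default_cost: float = 1.0,
) -> float:
    """Minimum cost of rewriting `source` into `target`.

    Allowed operations (all hypothetical, for illustration only):
      * single-character insert, delete, or substitute at `default_cost`
      * any (a, b) pair in `edit_costs`, rewriting substring a as b
    Costs are assumed to be non-negative.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # dp[i][j] = cheapest way to turn source[:i] into target[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            cur = dp[i][j]
            if cur == inf:
                continue
            # Match (free) or single-character substitution.
            if i < n and j < m:
                step = 0.0 if source[i] == target[j] else default_cost
                dp[i + 1][j + 1] = min(dp[i + 1][j + 1], cur + step)
            # Delete one character of source.
            if i < n:
                dp[i + 1][j] = min(dp[i + 1][j], cur + default_cost)
            # Insert one character of target.
            if j < m:
                dp[i][j + 1] = min(dp[i][j + 1], cur + default_cost)
            # String-to-string edits with their own costs.
            for (a, b), cost in edit_costs.items():
                if not a and not b:
                    continue  # an empty-to-empty edit would not advance
                if source.startswith(a, i) and target.startswith(b, j):
                    ni, nj = i + len(a), j + len(b)
                    dp[ni][nj] = min(dp[ni][nj], cur + cost)
    return dp[n][m]


# Assumed example: OCR rendered the "m" of "modern" as "rn".
ocr_confusions = {("m", "rn"): 0.2}
cheap = weighted_edit_distance("modern", "rnodern", ocr_confusions)
plain = weighted_edit_distance("modern", "rnodern", {})
```

With the confusion table, the query "modern" sits much closer to the degraded text "rnodern" than it does under unit-cost character edits alone, which is how such a scheme could raise recall. A matcher would still need a cost threshold to keep precision acceptable, and the abstract's use of recognition confidence is not modeled here.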
Added 28 Jul 2010
Updated 28 Jul 2010
Type Conference
Year 2001
Where CIKM
Authors Kevyn Collins-Thompson, Charles Schweizer, Susan T. Dumais