Sciweavers

SIGMOD
2009
ACM

Efficient approximate entity extraction with edit distance constraints

Named entity recognition aims at extracting named entities from unstructured text. A recent trend in named entity recognition is to find approximate matches in the text with respect to a large dictionary of known entities, as the domain knowledge encoded in the dictionary helps to improve extraction performance. In this paper, we study the problem of approximate dictionary matching with edit distance constraints. Compared to existing studies using token-based similarity constraints, our problem definition enables us to capture typographical or orthographical errors, both of which are common in entity extraction tasks yet may be missed by token-based similarity constraints. Our problem is technically challenging because existing approaches based on q-gram filtering perform poorly when the dictionary contains many short entities. Our proposed solution is based on an improved neighborhood generation method employing novel partitioning and prefix pruning techniques. We ...
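To make the problem statement concrete, the following is a minimal brute-force sketch of approximate dictionary matching under an edit distance threshold. It is not the paper's neighborhood generation method; it only illustrates what counts as a match. The names `edit_distance`, `approximate_matches`, and `tau` are hypothetical and chosen for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_matches(text: str, dictionary: list[str], tau: int):
    """Return (start, substring, entity) triples with edit distance <= tau.

    Assumption: matching is done on raw character substrings of `text`.
    Only substrings whose length is within tau of the entity's length
    can possibly match, so only those lengths are enumerated.
    """
    results = []
    for entity in dictionary:
        shortest = max(1, len(entity) - tau)
        longest = len(entity) + tau
        for length in range(shortest, longest + 1):
            for start in range(len(text) - length + 1):
                sub = text[start:start + length]
                if edit_distance(sub, entity) <= tau:
                    results.append((start, sub, entity))
    return results


if __name__ == "__main__":
    # "sydny" is a one-character typo of the dictionary entity "sydney".
    for match in approximate_matches("visit sydny today", ["sydney"], tau=1):
        print(match)
```

A token-based similarity measure would treat "sydny" and "sydney" as entirely different tokens, whereas a character-level threshold of 1 accepts the pair. The cost of this baseline grows with the text length times the dictionary size, which is the kind of expense that filtering and pruning techniques aim to avoid.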
Added 05 Dec 2009
Updated 05 Dec 2009
Type Conference
Year 2009
Where SIGMOD
Authors Wei Wang 0011, Chuan Xiao, Xuemin Lin, Chengqi Zhang