Sciweavers

IQ
2003

ClueMaker: A Language for Approximate Record Matching

We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same thing based on the values of the records’ attributes. For example, a clue may predict a match if the records have identical values for the first name attribute. The values of the clues can then be used as input to a machine-learning technique to compute a match probability. ClueMaker is based on Java and is compiled to Java source or bytecode. Therefore, ClueMaker is easily accessible to many programmers, allows the integration of any Java class, runs on virtually any platform, supports Unicode, and is more easily accepted by IT departments that try to minimize the number of distinct languages in use. ChoiceMaker Technologies has used ClueMaker successfully over the past two years in a variety of approximate record matching tasks.

Key Words: Approximate record matching, deduplication, programming language, Java, machine...
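
The idea of clues feeding a match probability can be illustrated with a minimal sketch in plain Java. This is not ClueMaker syntax, which the abstract does not show. The class `ClueSketch`, the attribute names, the clue weights, and the bias are all hypothetical, and the logistic combination stands in for whatever machine-learning technique is actually used; in practice the weights would be learned from labeled record pairs rather than set by hand.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical illustration only: plain Java, not ClueMaker.
class ClueSketch {

    // A clue fires (or not) on a pair of records and carries a weight.
    // Records are modeled as attribute-name -> value maps (an assumption).
    static final class Clue {
        final String name;
        final BiPredicate<Map<String, String>, Map<String, String>> fires;
        final double weight; // hand-picked here; normally learned from training data

        Clue(String name,
             BiPredicate<Map<String, String>, Map<String, String>> fires,
             double weight) {
            this.name = name;
            this.fires = fires;
            this.weight = weight;
        }
    }

    // True when both records have the attribute and the values agree, ignoring case.
    public static boolean sameValue(Map<String, String> a, Map<String, String> b, String attr) {
        String x = a.get(attr);
        String y = b.get(attr);
        return x != null && x.equalsIgnoreCase(y);
    }

    // Two clues that predict a match; attribute names are assumed.
    static final List<Clue> CLUES = List.of(
        new Clue("sameFirstName", (a, b) -> sameValue(a, b, "firstName"), 1.5),
        new Clue("sameLastName", (a, b) -> sameValue(a, b, "lastName"), 2.0));

    static final double BIAS = -2.0; // assumed prior leaning toward "no match"

    // Combine the clues that fire into a probability with a logistic function.
    public static double matchProbability(Map<String, String> a, Map<String, String> b) {
        double score = BIAS;
        for (Clue clue : CLUES) {
            if (clue.fires.test(a, b)) {
                score += clue.weight;
            }
        }
        return 1.0 / (1.0 + Math.exp(-score));
    }

    public static void main(String[] args) {
        Map<String, String> r1 = Map.of("firstName", "Ann", "lastName", "Lee");
        Map<String, String> r2 = Map.of("firstName", "ann", "lastName", "Lee");
        System.out.println(matchProbability(r1, r2));
    }
}
```

Keeping each clue as a separate predicate mirrors the separation the abstract describes: clues encode domain knowledge about attributes, while the combination step that turns clue values into a probability is left to a learned model.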
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where IQ
Authors Martin Buechi, Andrew Borthwick, Adam Winkel, Arthur Goldberg