ER
2004
Springer

Automatic Location and Separation of Records: A Case Study in the Genealogical Domain

Abstract. Locating specific chunks of information (records) within web documents is an interesting and nontrivial problem. If records can be located and separated well, the longstanding problem of grouping extracted values into appropriate relationships in a record structure becomes easier to resolve. Our solution is a hybrid of two well-established techniques: (1) ontology-based extraction [ECJ+99] and (2) vector space modeling [SM83]. To show that the technique has merit, we apply it to the particularly challenging task of locating and separating records in genealogical web documents, which tend to vary considerably in layout and format. Our experiments show that this technique yields an average of 92% recall and 93% precision for locating and separating genealogical records in web documents.
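As a rough illustration of the vector space modeling component only, the sketch below scores a candidate chunk of a document by the cosine similarity between the counts of fields found in it and the counts expected in a single record. The field names, the `expected` vector, and the `score_chunk` helper are hypothetical and are not taken from the paper; the abstract does not specify how the two techniques are combined.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


# Assumption: an extraction ontology tells us roughly how many values of
# each field one genealogical record should contain. These numbers are
# invented for illustration.
expected = {"name": 1.0, "birth_date": 1.0, "death_date": 1.0}


def score_chunk(field_labels):
    """Score a candidate chunk given the labels of the fields recognized in it."""
    return cosine(Counter(field_labels), expected)


well_formed = score_chunk(["name", "birth_date", "death_date"])
lopsided = score_chunk(["name", "name", "birth_date"])
```

Here `well_formed` scores higher than `lopsided`, so a separator that yields chunks like the first would be preferred. Cosine similarity ignores vector length, so a chunk containing two complete records would score as well as one; a separate check on magnitude would be needed to catch that case.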
Troy Walker, David W. Embley
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where ER
Authors Troy Walker, David W. Embley