Sciweavers

KDD
1998
ACM

Joins that Generalize: Text Classification Using WHIRL

WHIRL is an extension of relational databases that can perform "soft joins" based on the similarity of textual identifiers; these soft joins extend the traditional operation of joining tables based on the equivalence of atomic values. This paper evaluates WHIRL on a number of inductive classification tasks using data from the World Wide Web. We show that although WHIRL is designed for more general similarity-based reasoning tasks, it is competitive with mature inductive classification systems on these classification tasks. In particular, WHIRL generally achieves lower generalization error than C4.5, RIPPER, and several nearest-neighbor methods. WHIRL is also fast: up to 500 times faster than C4.5 on some benchmark problems. We also show that WHIRL can be efficiently used to select from a large pool of unlabeled items those that can be classified correctly with high confidence.
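To make the idea of a soft join concrete, here is a minimal Python sketch of how joining on textual similarity can double as a classifier. It is not WHIRL itself and does not reproduce the system evaluated in the paper. The abstract does not name a similarity measure or a voting rule, so the choices here are assumptions: TF-IDF weighted cosine similarity as the measure, and a noisy-or combination of neighbor scores to pick a label. The function names (`tfidf_vectors`, `soft_join`, `classify`) and the parameter `k` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (as a dict) per document."""
    tokenized = [d.lower().split() for d in docs]  # crude whitespace tokenizer
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (assumption) so that no weight is exactly zero.
        vec = {t: (1 + math.log(c)) * math.log(1 + n / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors, clamped to 1."""
    return min(1.0, sum(w * b.get(t, 0.0) for t, w in a.items()))


def soft_join(left, right, k=3):
    """Pair each left string with its k most similar right strings.

    Returns (left_index, right_index, score) triples. Unlike an equality
    join, a pair qualifies whenever its similarity is positive.
    """
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = []
    for i, a in enumerate(lv):
        scored = sorted(((cosine(a, b), j) for j, b in enumerate(rv)),
                        reverse=True)
        pairs.extend((i, j, s) for s, j in scored[:k] if s > 0)
    return pairs


def classify(text, labeled, k=3):
    """Label `text` by soft-joining it against (text, label) examples.

    Scores of the matched examples are combined per label with a
    noisy-or: 1 - prod(1 - score). This rule is an assumption.
    """
    examples = [t for t, _ in labeled]
    miss = {}  # per label: product of (1 - score) over matched examples
    for _, j, score in soft_join([text], examples, k):
        label = labeled[j][1]
        miss[label] = miss.get(label, 1.0) * (1.0 - score)
    return min(miss, key=miss.get) if miss else None
```

For example, `soft_join(["AT&T Labs Research"], ["ATT Labs", "AT&T Research Labs Inc"])` pairs the left string with both right strings and ranks the second one higher, even though neither is an exact match. An equality join would return nothing.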
William W. Cohen, Haym Hirsh
Added: 06 Aug 2010
Updated: 06 Aug 2010
Type: Conference
Year: 1998
Where: KDD
Authors: William W. Cohen, Haym Hirsh