Sciweavers

CIDM
2007
IEEE

Measuring the Validity of Document Relations Discovered from Frequent Itemset Mining

An extension of frequent itemset mining can be applied to discover relations among documents. Several schemes, namely n-gram, stemming, stopword removal and term weighting, can be applied to form different document representations for mining. A benchmark is therefore needed to compare the quality of the relations discovered from the various document representations. This work proposes a series of evaluation criteria, called the order accumulative citation matrix, formulated from the citation information in the publications. A new measure, called validity, is presented to reflect the quality of discovered relations under the proposed evaluation criteria. For the dataset, the expected validity is determined as a baseline for each set of discovered relations. With more than 10,000 documents, the experimental results show that the document relations using bigram as term definition are more valid than those using unigr...
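To make the idea of mining document relations concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that each term (here a word bigram) is treated as a transaction whose items are the documents containing it, so that a frequent 2-itemset is a pair of documents sharing many terms. The function names `bigrams` and `frequent_document_pairs`, the `min_support` parameter, and the toy documents are all hypothetical, and the validity measure itself is not reproduced because the abstract does not define it.

```python
from collections import defaultdict
from itertools import combinations


def bigrams(text):
    """Return the set of adjacent word pairs in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def frequent_document_pairs(docs, min_support=2):
    """Return document pairs that share at least `min_support` bigram terms.

    Assumption (not stated in the abstract): each term is a transaction and
    the documents containing it are its items, so the support of a document
    pair is the number of terms the two documents have in common.
    """
    transactions = defaultdict(set)  # term -> ids of documents containing it
    for doc_id, text in docs.items():
        for term in bigrams(text):
            transactions[term].add(doc_id)

    support = defaultdict(int)  # (doc_a, doc_b) -> number of shared terms
    for doc_ids in transactions.values():
        for pair in combinations(sorted(doc_ids), 2):
            support[pair] += 1

    return {pair: count for pair, count in support.items() if count >= min_support}


# Toy corpus, invented purely for illustration.
docs = {
    "d1": "frequent itemset mining for document relations",
    "d2": "document relations from frequent itemset mining",
    "d3": "term weighting and stopword removal",
}
relations = frequent_document_pairs(docs, min_support=2)
print(relations)
```

Swapping `bigrams` for a unigram tokenizer, or adding stemming and stopword removal before tokenizing, would give the alternative document representations that the abstract compares.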
Kritsada Sriphaew, Thanaruk Theeramunkong
Added 02 Jun 2010
Updated 02 Jun 2010
Type Conference
Year 2007
Where CIDM
Authors Kritsada Sriphaew, Thanaruk Theeramunkong