PKDD 2005, Springer

Improving Generalization by Data Categorization

In most learning algorithms, examples in the training set are treated equally. Some examples, however, carry more reliable or critical information about the target than others, and some may carry wrong information. According to their intrinsic margin, examples can be grouped into three categories: typical, critical, and noisy. We propose three methods, namely the selection cost, SVM confidence margin, and AdaBoost data weight, to automatically group training examples into these three categories. Experimental results on artificial datasets show that, although the three methods are quite different in nature, they give similar and reasonable categorizations. Results on real-world datasets further demonstrate that treating the three data categories differently in learning can improve generalization.
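To make the SVM-confidence-margin idea concrete, here is a minimal sketch of how such a categorization might look: train an SVM, use the signed decision value y · f(x) as a confidence margin, and split examples by that margin. The thresholds and the exact criteria below are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary dataset with labels in {-1, +1} and a little label noise
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.05, random_state=0)
y = 2 * y - 1

# Fit an SVM; the signed decision value y * f(x) serves as a confidence margin
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
margin = y * clf.decision_function(X)

# Hypothetical thresholds for the three categories
noisy = margin < 0                        # classified against its own label: likely mislabeled
critical = (margin >= 0) & (margin < 1)   # correct but close to the decision boundary
typical = margin >= 1                     # confidently and correctly classified

print(f"typical: {typical.sum()}, critical: {critical.sum()}, noisy: {noisy.sum()}")
```

A downstream learner could then, for example, down-weight or discard the noisy group and emphasize the critical group, which is the kind of differentiated treatment the abstract refers to.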
Type Conference
Year 2005
Where PKDD
Authors Ling Li, Amrit Pratap, Hsuan-Tien Lin, Yaser S. Abu-Mostafa