Improving classification accuracy using automatically extracted training data



Classification is a core task in knowledge discovery and data mining, and there has been substantial research effort in developing sophisticated classification models. In a parallel thread, recent work from the NLP community suggests that for tasks such as natural language disambiguation, even a simple algorithm can outperform a sophisticated one if it is provided with large quantities of high-quality training data. In those applications, training data occurs naturally in text corpora, and high-quality training data sets running into billions of words have reportedly been used. We explore how the lessons from the NLP community can be applied to KDD tasks. Specifically, we investigate how to identify data sources that can yield training data at low cost, and we study whether the quantity of the automatically extracted training data can compensate for its lower quality. We carry out this investigation for the specific task of inferring whether a search query has commercial intent. We mine too...
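The abstract's premise, that a simple learner given plenty of cheaply obtained (and possibly noisy) labels can be competitive, can be illustrated with a minimal sketch. The code below assumes a multinomial naive Bayes classifier with add-one smoothing over whitespace-separated query tokens, and a tiny hypothetical set of automatically labeled queries that includes one deliberately mislabeled example to mimic label noise. Neither the classifier nor the data comes from the paper; the abstract is truncated before it describes the authors' data source or model.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesQueryClassifier:
    """Hypothetical sketch: multinomial naive Bayes over query tokens,
    with Laplace (add-one) smoothing over the training vocabulary."""

    def __init__(self):
        self.class_counts = Counter()             # training queries per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()                        # all tokens seen in training

    def fit(self, examples):
        # examples: iterable of (query, label) pairs, e.g. automatically extracted
        for query, label in examples:
            tokens = query.lower().split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens, label):
        total_queries = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_queries)  # log prior
        denom = sum(self.token_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            # add-one smoothing so unseen tokens do not zero out the score
            score += math.log((self.token_counts[label][tok] + 1) / denom)
        return score

    def predict(self, query):
        tokens = query.lower().split()
        return max(self.class_counts, key=lambda lbl: self._log_score(tokens, lbl))


# Hypothetical, automatically labeled queries (illustrative only, not the paper's data).
auto_labeled = [
    ("buy cheap laptop", "commercial"),
    ("laptop deals online", "commercial"),
    ("digital camera price", "commercial"),
    ("order running shoes", "commercial"),
    ("history of the roman empire", "noncommercial"),
    ("how do vaccines work", "noncommercial"),
    ("roman empire timeline", "noncommercial"),
    ("buy history books", "noncommercial"),  # deliberately noisy label
]

clf = NaiveBayesQueryClassifier().fit(auto_labeled)
print(clf.predict("cheap laptop price"))  # commercial
```

Naive Bayes is chosen here only because it is about as simple as a text classifier gets; whether more (noisier) data of this kind actually closes the gap with a sophisticated model is the empirical question the paper studies.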

Ariel Fuxman, Anitha Kannan, Andrew B. Goldberg, Rakesh Agrawal, Panayiotis Tsaparas, John C. Shafer


Data Mining | Extracted Training Data | KDD 2009 | Quality Training Data | Training Data


» High Accuracy Handwritten Chinese Character Recognition Using Quadratic Classifiers with D...

» MultiModal Video Concept Extraction Using CoTraining

» Automatic Image Modality Based Classification and Annotation to Improve Medical Image Retr...

» Automatic web query classification using labeled and unlabeled training data

» Exploiting Unlabeled Data for Improving Accuracy of Predictive Data Mining

» Improving Text Classification by Web Corpora

» Discriminative Cluster Refinement Improving Object Category Recognition Given Limited Trai...

» Unsupervised band removal leading to improved classification accuracy of hyperspectral ima...

Post Info

Added	25 Nov 2009
Updated	25 Nov 2009
Type	Conference
Year	2009
Where	KDD
Authors	Ariel Fuxman, Anitha Kannan, Andrew B. Goldberg, Rakesh Agrawal, Panayiotis Tsaparas, John C. Shafer


Sciweavers
