
DAWAK
2008
Springer

Document-Base Extraction for Single-Label Text Classification

Many text mining applications, especially when investigating Text Classification (TC), require experiments to be performed using common text-collections, so that results can be compared with alternative approaches. With regard to single-label TC, most text-collections (textual data-sources) in their original form have at least one of the following limitations: the overall volume of textual data is too large for ease of experimentation; there are many predefined classes; most of the classes consist of only a very few documents; some documents are labeled with a single class whereas others have multiple classes; and some documents contain little or no actual text-content. In this paper, we propose a standard approach to automatically extract "qualified" document-bases from a given textual data-source that can be used more effectively and reliably in single-label TC experiments. The experimental results demonstrate that document-bases extracted based on our approach can...
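The abstract lists the problems a qualified document-base should avoid but does not give the extraction procedure or any thresholds. As an illustration only, the Python sketch below turns those listed limitations into simple filters: drop multi-label and near-empty documents, drop classes with too few documents, keep a limited number of classes, and cap the documents per class to bound overall volume. The `Document` type, the `extract_document_base` function, and every threshold parameter are hypothetical assumptions, and this is not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical record type: an id, its class label(s), and its raw text.
    doc_id: str
    labels: tuple
    text: str


def extract_document_base(docs, min_words=10, min_docs_per_class=50,
                          max_classes=10, docs_per_class=100):
    """Hypothetical filter; thresholds are illustrative, not from the paper."""
    # 1. Keep only single-label documents with enough actual text content.
    candidates = [d for d in docs
                  if len(d.labels) == 1 and len(d.text.split()) >= min_words]
    # 2. Count documents per class and drop classes that are too small.
    counts = Counter(d.labels[0] for d in candidates)
    eligible = [c for c, n in counts.items() if n >= min_docs_per_class]
    # 3. Limit the number of classes: keep the largest (ties broken by name).
    chosen = sorted(eligible, key=lambda c: (-counts[c], c))[:max_classes]
    # 4. Cap documents per class to bound the overall volume of data.
    base = []
    for c in sorted(chosen):
        members = [d for d in candidates if d.labels[0] == c]
        base.extend(members[:docs_per_class])
    return base


# Toy corpus showing each filter in action.
docs = [
    Document("d1", ("earn",), "profit rose sharply this quarter for the group"),
    Document("d2", ("earn",), "net income grew on higher sales"),
    Document("d3", ("earn", "acq"), "company reports earnings and a merger"),
    Document("d4", ("acq",), "firm agrees to buy rival"),
    Document("d5", ("grain",), ""),
]
base = extract_document_base(docs, min_words=3, min_docs_per_class=1,
                             max_classes=1, docs_per_class=5)
print([d.doc_id for d in base])
```

Here `d3` is dropped as multi-label, `d5` as having no text, and `max_classes=1` keeps only the largest remaining class (`earn`). A real extraction would need thresholds chosen for the specific data-source.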
Added 19 Oct 2010
Updated 19 Oct 2010
Type Conference
Year 2008
Where DAWAK
Authors Yanbo J. Wang, Robert Sanderson, Frans Coenen, Paul H. Leng