Sciweavers

ICDAR
2009
IEEE
13 years 2 months ago
Document Content Extraction Using Automatically Discovered Features
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
Sui-Yu Wang, Henry S. Baird, Chang An
KDD
2002
ACM
14 years 5 months ago
Discovering informative content blocks from Web documents
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Shian-Hua Lin, Jan-Ming Ho
ICDAR
2009
IEEE
13 years 11 months ago
Scalable Feature Extraction from Noisy Documents
We cope with the metadata recognition in layout-oriented documents. We address the problem as a classification task and propose a method for automatic extraction of relevant featu...
Loïc Lecerf, Boris Chidlovskii
MLDM
2005
Springer
13 years 10 months ago
CorePhrase: Keyphrase Extraction for Document Clustering
The ability to discover the topic of a large set of text documents using relevant keyphrases is usually regarded as a very tedious task if done by hand. Automatic keyphra...
Khaled M. Hammouda, Diego N. Matute, Mohamed S. Ka...
SIGIR
2003
ACM
13 years 10 months ago
Text categorization by boosting automatically extracted concepts
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...
Lijuan Cai, Thomas Hofmann