Sciweavers

945 search results - page 3 / 189
» Information Extraction from HTML: Application of a General M...
Sort
View
JCDL
2005
ACM
100views Education» more  JCDL 2005»
13 years 11 months ago
Automatic extraction of titles from general documents using machine learning
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
WWW
2011
ACM
13 years 21 days ago
HyLiEn: a hybrid approach to general list extraction on the web
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
ICML
2002
IEEE
14 years 6 months ago
Kernels for Semi-Structured Data
Semi-structured data such as XML and HTML is attracting considerable attention. It is important to develop various kinds of data mining techniques that can handle semistructured d...
Hisashi Kashima, Teruo Koyanagi
DEBU
2000
95views more  DEBU 2000»
13 years 5 months ago
Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach
A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
WWW
2002
ACM
14 years 6 months ago
A machine learning based approach for table detection on the web
Table is a commonly used presentation scheme, especially for describing relational information. However, table understanding remains an open problem. In this paper, we consider th...
Yalin Wang, Jianying Hu