Sciweavers

311 search results - page 16 / 63
» Cleaning Web Pages for Effective Web Content Mining
WWW
2008
ACM
15 years 10 months ago
Genealogical trees on the web: a search engine user perspective
This paper presents an extensive study about the evolution of textual content on the Web, which shows how some new pages are created from scratch while others are created using al...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...
KDD
2002
ACM
15 years 9 months ago
Discovering informative content blocks from Web documents
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Shian-Hua Lin, Jan-Ming Ho
WWW
2004
ACM
15 years 10 months ago
What's new on the web?: the evolution of the web from a search engine perspective
We seek to gain improved insight into how Web search engines should cope with the evolving Web, in an attempt to provide users with the most up-to-date results possible. For this ...
Alexandros Ntoulas, Junghoo Cho, Christopher Olsto...
WWW
2011
ACM
14 years 4 months ago
HyLiEn: a hybrid approach to general list extraction on the web
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
KDD
2007
ACM
15 years 9 months ago
Cleaning disguised missing data: a heuristic approach
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Ming Hua, Jian Pei