Sciweavers

19 search results - page 2 / 4
» Incremental web page template detection
KDD
2007
ACM
14 years 6 months ago
Joint optimization of wrapper generation and template detection
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu
WWW
2009
ACM
14 years 7 months ago
A densitometric analysis of web template content
What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
Christian Kohlschütter
SIGIR
2004
ACM
13 years 11 months ago
Query-related data extraction of hidden web documents
Most of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dyna...
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...
COMPSAC
2002
IEEE
13 years 11 months ago
An Approach to Identify Duplicated Web Pages
A relevant consequence of the unceasing expansion of the Web and e-commerce is the growing demand for new Web sites and Web applications. The software industry is facing the ...
Giuseppe A. Di Lucca, Massimiliano Di Penta, Anna ...
CIKM
2008
Springer
13 years 8 months ago
A densitometric approach to web page segmentation
Web page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...
Christian Kohlschütter, Wolfgang Nejdl