Sciweavers search: "Incremental web page template detection"

19 results (page 2 of 4)

Joint optimization of wrapper generation and template detection
KDD 2007 · ACM · Data Mining

Download www.cse.psu.edu

Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...

Shuyi Zheng, Ruihua Song, Ji-Rong Wen, Di Wu

A densitometric analysis of web template content
WWW 2009 · ACM · Internet Technology

Download www2009.eprints.org

What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...

Christian Kohlschütter

Query-related data extraction of hidden web documents
SIGIR 2004 · ACM · Information Technology

Download dis.shef.ac.uk

Most of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dyna...

Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...

An Approach to Identify Duplicated Web Pages
COMPSAC 2002 · IEEE · Software Engineering

Download www.cse.dmu.ac.uk

A relevant consequence of the unceasing expansion of the Web and e-commerce is the growth in demand for new Web sites and Web applications. The software industry is facing the ...

Giuseppe A. Di Lucca, Massimiliano Di Penta, Anna ...

A densitometric approach to web page segmentation
CIKM 2008 · Springer · Information Technology

Download www.l3s.de

Web page segmentation is a crucial step for many applications in Information Retrieval, such as text classification, de-duplication and full-text search. In this paper we describe...

Christian Kohlschütter, Wolfgang Nejdl
