Sciweavers

708 search results - page 1 / 142
» Identifying Content Blocks from Web Documents
Sort
View
ISMIS
2005
Springer
13 years 10 months ago
Identifying Content Blocks from Web Documents
Intelligent information processing systems, such as digital libraries or search engines index web-pages according to their informative content. However, web-pages contain several n...
Sandip Debnath, Prasenjit Mitra, C. Lee Giles
WWW
2011
ACM
12 years 11 months ago
Identifying primary content from web pages and its application to web search ranking
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
Srinivas Vadrevu, Emre Velipasaoglu
KDD
2002
ACM
148views Data Mining» more  KDD 2002»
14 years 5 months ago
Discovering informative content blocks from Web documents
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Shian-Hua Lin, Jan-Ming Ho
BIS
2006
106views Business» more  BIS 2006»
13 years 6 months ago
Expected Utility of Content Blocks in Web Content Extraction
In this paper we discuss the possible application of new concepts in web content extraction: utility assessment, utility annealing, and dynamic aggregated document generation. Aft...
Marek Kowalkiewicz
HT
2005
ACM
13 years 10 months ago
As we may perceive: inferring logical documents from hypertext
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Pavel Dmitriev, Carl Lagoze, Boris Suchkov