Sciweavers

563 search results - page 15 / 113
» Crawling the web for structured documents
TKDE
2002
15 years 1 month ago
Query Relaxation by Structure and Semantics for Retrieval of Logical Web Documents
Since WWW encourages hypertext and hypermedia document authoring (e.g. HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperl...
Wen-Syan Li, K. Selçuk Candan, Quoc Vu, Div...
CIT
2005
Springer
15 years 1 month ago
Simple Classification into Large Topic Ontology of Web Documents
The paper presents an approach to classifying Web documents into a large topic ontology. The main emphasis is on having a simple approach appropriate for handling a large ontology an...
Marko Grobelnik, Dunja Mladenic
ECIR
2006
Springer
15 years 3 months ago
Automatic Document Organization in a P2P Environment
This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble-based meta methods. We co...
Stefan Siersdorfer, Sergej Sizov
LAWEB
2003
IEEE
15 years 7 months ago
On the Evolution of Clusters of Near-Duplicate Web Pages
This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
Dennis Fetterly, Mark Manasse, Marc Najork
WIDM
2006
ACM
15 years 7 months ago
Coarse-grained classification of web sites by their structural properties
In this paper, we identify and analyze structural properties which reflect the functionality of a Web site. These structural properties consider the size, the organization, the co...
Christoph Lindemann, Lars Littig