Sciweavers

563 search results for "Crawling the web for structured documents" (page 23 of 113)
RIAO
2007
From Layout to Semantic: a Reranking Model for Mapping Web Documents to Mediated XML Representations
Many documents on the Web are published in weakly structured formats. Because of their weak semantics and the heterogeneity of their formats, the information conveyed by...
Guillaume Wisniewski, Patrick Gallinari
WECWIS
2003
IEEE
Page Digest for Large-Scale Web Services
The rapid growth of the World Wide Web and the Internet has fueled interest in Web services and the Semantic Web, which are quickly becoming important parts of modern electronic c...
Daniel Rocco, David Buttler, Ling Liu
ICDM
2002
IEEE
Phrase-based Document Similarity Based on an Index Graph Model
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
Khaled M. Hammouda, Mohamed S. Kamel
WEBDB
2005
Springer
Design and Implementation of a Geographic Search Engine
In this paper, we describe the design and initial implementation of a geographic search engine prototype for Germany, based on a large crawl of the .de domain. Geographic search en...
Alexander Markowetz, Yen-Yu Chen, Torsten Suel, Xi...
ICDE
2008
IEEE
AxPRE Summaries: Exploring the (Semi-)Structure of XML Web Collections
The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet ...
Mariano P. Consens, Flavio Rizzolo, Alejandro A. V...