Sciweavers

127 search results for: Rule-Based Structural Analysis of Web Pages
CIKM
2005
Springer
15 years 3 months ago
Versatile structural disambiguation for semantic-aware applications
In this paper, we propose a versatile disambiguation approach which can be used to make explicit the meaning of structure based information such as XML schemas, XML document struc...
Federica Mandreoli, Riccardo Martoglia, Enrico Ron...
ICDAR
2009
IEEE
15 years 4 months ago
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
There are a number of established products on the market for wrapping—semi-automatic navigation and extraction of data—from web pages. These solutions make use of the inherent...
Tamir Hassan
WWW
2009
ACM
15 years 2 months ago
Extracting data records from the web using tag path clustering
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...
TREC
2004
14 years 11 months ago
Language Models for Searching in Web Corpora
We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
Jaap Kamps, Gilad Mishne, Maarten de Rijke
CSUR
1999
14 years 9 months ago
Hubs, authorities, and communities
The Web can be naturally modeled as a directed graph, consisting of a set of abstract nodes (the pages) joined by directional edges (the hyperlinks). Hyperlinks encode a considerab...
Jon M. Kleinberg
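As a rough illustration of how the directed-graph view of the Web can be used, the following is a minimal sketch of iterative hub and authority scoring in the style of Kleinberg's HITS. It assumes a small hypothetical link graph given as a dict mapping each page to the pages it links to; the function name `hits`, the fixed iteration count, and the L2 normalization are illustrative choices, not details taken from the paper.

```python
import math


def hits(graph, iterations=50):
    """Sketch of hub/authority scoring on a directed link graph.

    `graph` is a hypothetical input format: page -> list of pages it links to.
    Returns (hub, auth) dicts with unit-L2-normalized scores.
    """
    # Collect every page, including pages that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: a page's authority is the sum of the hub
        # scores of the pages that link to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: a page's hub score is the sum of the authority
        # scores of the pages it links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        # Normalize to unit L2 length so the scores stay bounded
        # (fall back to 1.0 if every score is zero).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}

    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: pages a and b both point to c and d; c points to d.
    links = {"a": ["c", "d"], "b": ["c", "d"], "c": ["d"], "d": []}
    hub, auth = hits(links)
    print(max(auth, key=auth.get))  # prints "d"
```

In this toy graph, page d receives links from three pages and ends up with the highest authority score, while a and b, which both point to the two most-cited pages, share the highest hub score. The mutual reinforcement between the two updates is the point of the sketch; a production implementation would typically work on a sparse matrix and stop on a convergence tolerance rather than a fixed iteration count.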