Sciweavers

IUI
2006
ACM
13 years 10 months ago
Recovering semantic relations from web pages based on visual cues
Recovering semantic relations between different parts of web pages is of great importance for multi-platform web interface development, as they make it possible to re-distribute ...
Peifeng Xiang, Yuanchun Shi
HT
2006
ACM
13 years 10 months ago
Implementation and evaluation of a quality-based search engine
In this paper, an approach for the implementation of a quality-based Web search engine is proposed. Quality retrieval is introduced and an overview of previous efforts to implement...
Thomas Mandl
HT
2006
ACM
13 years 10 months ago
Hyperlink assessment based on web usage mining
One of the basic methods of web usage mining is association rules, which indicate relationships among common use of web pages. Positive and confined negative association rules are ...
Przemyslaw Kazienko, Marcin Pilarczyk
HT
2006
ACM
13 years 10 months ago
Just-in-time recovery of missing web pages
We present Opal, a light-weight framework for interactively locating missing web pages (HTTP status code 404). Opal is an example of “in vivo” preservation: harnessing the col...
Terry L. Harrison, Michael L. Nelson
ADC
2006
Springer
13 years 10 months ago
A two-phase rule generation and optimization approach for wrapper generation
Web information extraction is a fundamental issue for web information management and integrations. A common approach is to use wrappers to extract data from web pages or documents...
Yanan Hao, Yanchun Zhang
WEBI
2007
Springer
13 years 10 months ago
Geographically-Sensitive Link Analysis
Many web pages and resources are primarily relevant to certain geographic locations. For example, in many queries web pages on restaurants, hotels, or movie theaters are only rele...
Hyun Chul Lee, Haifeng Liu, Renée J. Miller
WEBDB
2007
Springer
13 years 10 months ago
Towards a Content-Provider-Friendly Web Page Crawler
Search engine quality is impacted by two factors: the quality of the ranking/matching algorithm used and the freshness of the search engine’s index, which maintains a “snapsho...
Jie Xu, Qinglan Li, Huiming Qu, Alexandros Labrini...
WEBDB
2007
Springer
13 years 10 months ago
EntityAuthority: Semantically Enriched Graph-Based Authority Propagation
This paper pursues the recently emerging paradigm of searching for entities that are embedded in Web pages. We utilize information-extraction techniques to identify entity candidat...
Julia Stoyanovich, Srikanta J. Bedathur, Klaus Ber...
SOFSEM
2007
Springer
13 years 10 months ago
Creating Permanent Test Collections of Web Pages for Information Extraction Research
In the research area of automatic web information extraction, there is a need for permanent and annotated web page collections enabling objective performance evaluation of differen...
Bernhard Pollak, Wolfgang Gatterbauer
PKDD
2007
Springer
13 years 10 months ago
Site-Independent Template-Block Detection
Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
Aleksander Kolcz, Wen-tau Yih