Sciweavers

» Using the Structure of HTML Documents to Improve Retrieval
IPM
2007
Web page title extraction and its application
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Yewei Xue, Yunhua Hu, Guomao Xin, Ruihua Song, Shu...
ACMICEC
2006
ACM
From HTML documents to web tables and rules
We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
Kai Simon, Georg Lausen, Harold Boley
AAAI
2012
Improving Twitter Retrieval by Exploiting Structural Information
Most Twitter search systems generally treat a tweet as a plain text when modeling relevance. However, a series of conventions allows users to tweet in structural ways using combin...
Zhunchen Luo, Miles Osborne, Sasa Petrovic, Ting W...
WEBDB
1999
Springer
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Arnaud Sahuguet, Fabien Azavant
ACL
2006
Automatic Construction of Polarity-Tagged Corpus from HTML Documents
This paper proposes a novel method of building polarity-tagged corpus from HTML documents. The characteristics of this method is that it is fully automatic and can be applied to a...
Nobuhiro Kaji, Masaru Kitsuregawa