Sciweavers

IPM
2007
13 years 4 months ago
Web page title extraction and its application
This paper is concerned with automatic extraction of titles from the bodies of HTML documents (web pages). Titles of HTML documents should be correctly defined in the title fields...
Yewei Xue, Yunhua Hu, Guomao Xin, Ruihua Song, Shu...
AAAI
1997
13 years 5 months ago
Template-Based Information Mining from HTML Documents
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
Jane Yung-jen Hsu, Wen-tau Yih
NAACL
2004
13 years 5 months ago
Acquiring Hyponymy Relations from Web Documents
This paper describes an automatic method for acquiring hyponymy relations from HTML documents on the WWW. Hyponymy relations can play a crucial role in various natural language pr...
Keiji Shinzato, Kentaro Torisawa
FLAIRS
2001
13 years 5 months ago
Extracting Partial Structures from HTML Documents
The new wrapper model for extracting text data from HTML documents is introduced. Kushmerick's wrapper class (Kushmerick 2000) may be unsuccessful in the case that suff...
Hiroshi Sakamoto, Yoshitsugu Murakami, Hiroki Arim...
IADIS
2004
13 years 5 months ago
Using the concept of user policies for improving HTML documents accessibility
In this paper, we introduce the concept of "user policies" and its applications to the browsing of HTML documents. The objective of policies is to specify user preferenc...
Benoît Encelle, Nadine Baptiste-Jessel
ACL
2006
13 years 5 months ago
Automatic Construction of Polarity-Tagged Corpus from HTML Documents
This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. The key characteristic of this method is that it is fully automatic and can be applied to a...
Nobuhiro Kaji, Masaru Kitsuregawa
COOPIS
1998
IEEE
13 years 8 months ago
Wrapper Generation for Web Accessible Data Sources
There is an increase in the number of data sources that can be queried across the WWW. Such sources typically support HTML forms-based interfaces and search engines query collecti...
Jean-Robert Gruser, Louiqa Raschid, Maria-Esther V...
ICTAI
1999
IEEE
13 years 8 months ago
A New Study on Using HTML Structures to Improve Retrieval
Locating useful information effectively from the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology of using the structures and hyperlinks ...
Michal Cutler, H. Deng, S. Maniccam, Weiyi Meng
IDEAS
2002
IEEE
13 years 9 months ago
Integrating HTML Tables Using Semantic Hierarchies And Meta-Data Sets
As the Internet is a global network, there is a demand on accessing closely related data without browsing through different Web documents. A significant amount of these data are pre...
Seung Jin Lim, Yiu-Kai Ng, Xiaochun Yang
ICDAR
2003
IEEE
13 years 9 months ago
Automatic Discovery of Semantic Structures in HTML Documents
Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively ...
Saikat Mukherjee, Guizhen Yang, Wenfang Tan, I. V....