Abstract. Extracting data from web pages using wrappers is a fundamental problem arising in a large variety of applications of vast practical interest. In this paper, we propose a...
Abstract. This paper provides a simple but effective approach, named ECON, to fully automatically extract content from Web news pages. ECON uses a DOM tree to represent the Web news...
Yan Guo, Huifeng Tang, Linhai Song, Yu Wang 0009, ...
In this paper we present an algorithm for automatic extraction of textual elements, namely titles and full text, associated with news stories in news web pages. We propose a super...
We present the RGAI systems which participated in the third Web People Search Task challenge. The chief characteristics of our approach are that we focus on the raw textual parts o...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...