This paper presents how natural language processing and machine learning techniques can help internet users get a better overview of the pages they are reading. The ...
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
The inherent lack of control over Internet content has resulted in the proliferation of online material that is potentially detrimental. For example, the infamous “Anarchist Coo...
I report briefly on some of my own work in each of these areas and elucidate some of the questions that this research has raised. Then I propose as a research agenda the developme...
A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...