The number of vertical search engines and portals has increased rapidly in recent years, making the importance of a topic-driven (focused) crawler evident. In this paper, we de...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
The purpose of a document is to facilitate the transfer of information from its author to its readers. It is the author’s job to design the document so that the information it c...
We describe ongoing research on segmenting and labeling HTML medical journal articles. In contrast to existing approaches in which HTML tags usually serve as strong indicators, we...
The Semantic Web is still a web, a collection of linked nodes. Navigation of links is currently, and will remain for humans if not machines, a key mechanism for exploring the spac...
Carole A. Goble, Sean Bechhofer, Les Carr, David D...