Web page classification is important to many tasks in information retrieval and web mining. However, applying traditional textual classifiers on web data often produces unsatisfyi...
Abstract We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has ...
One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schema for web data can be constructe...
Multi-headed reverse linking (incoming links) is a fundamental concept of Open Hypermedia Systems. However, this bi-directionality has been lost in the move to the World Wide Web ...
A lot of the world’s knowledge is stored in books, which, as a result of recent mass-digitisation efforts, are increasingly available online. Search engines, such as Google Book...