WebFountain is a platform for very large-scale text analytics applications that allows uniform access to a wide variety of sources. It enables the deployment of a variety of docum...
Markov models have been widely used for modelling users' navigational behaviour in the Web graph, using the transitional probabilities between web pages, as recorded in the w...
The longstanding problem of automatic table interpretation still illudes us. Its solution would not only be an aid to table processing applications such as large volume table conve...
As the Internet grows explosively, search engines play a more and more important role for users in effectively accessing online information. Recently, it has been recognized that ...
Ming Ji, Jun Yan, Siyu Gu, Jiawei Han, Xiaofei He,...
In addition to the actual content Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...