Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take into account the hype...
Konstantin Avrachenkov, Vladimir Dobrynin, Danil N...
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Multichannel publication of multimedia presentations poses a significant challenge for the generic description of the presentation content and the system necessary to convert these...
Tom Beckers, Nico Oorts, Filip Hendrickx, Rik Van ...
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of “more like this” queries in the life sciences domain. ...
We propose an algorithm for the binarization of document images degraded by uneven light distribution, based on the Markov Random Field modeling with Maximum A Posteriori probabil...