The proliferation of text documents on the web as well as within institutions necessitates their convenient organization to enable efficient retrieval of information. Although tex...
Sriharsha Veeramachaneni, Diego Sona, Paolo Avesan...
Web clustering is useful for several activities on the WWW, from automatically building web directories to improving retrieval performance. Nevertheless, due to the huge size of the...
In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than the HTML code. O...
In this paper we propose an extension of the PLSA model in which an extra latent variable allows the model to co-cluster documents and terms simultaneously. We show on three datase...
In this article we present a method for retrieving documents from a digital library through a visual interface based on automatically generated concepts. We used a vocabulary gene...