We propose a visualization method based on a topic model for discrete data such as documents. Unlike conventional visualization methods based on pairwise distances such as multi-d...
Many web documents are dynamic, with content changing in varying amounts at varying frequencies. However, current document search algorithms have a static view of the document con...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
The availability of large, heterogeneous repositories of electronic documents is increasing rapidly, and the need for flexible, sophisticated document manipulation tools is growi...
Floriana Esposito, Stefano Ferilli, Teresa Maria A...
Multistructured documents are documents whose structure is composed of a set of concurrent hierarchical structures. In this paper, we propose a new model of multistructured docume...