Abstract. Discovering significant meta-information from document collections is a critical factor for knowledge distribution and preservation. This paper presents a system that im...
Floriana Esposito, Stefano Ferilli, Teresa Maria A...
Documents in a wide range of genres often contain references to their own sections, pictures, etc. We call such referring expressions instances of Document Deixis. The present work ...
Annotating the regions, text lines, and characters of document images is an important but tedious and expensive task. A ground-truthing tool may largely alleviate the human burden...
The PDF format is commonly used for the exchange of documents on the Web, and there is a growing need to understand, extract, or repurpose data held in PDF documents. Many system...
This paper presents an intelligent Internet information system, Automatic Classifier for the Internet Resource Discovery (ACIRD), which uses machine learning techniques to organiz...