The proliferation of digital libraries and the large amount of existing documents raise important issues in efficient handling of documents. Printed texts in documents need to be...
We introduce the relative rank differential statistic which is a non-parametric approach to document and dialog analysis based on word frequency rank-statistics. We also present a...
Many document-based applications, including popular Web browsers, email viewers, and word processors, have a ‘Find on this Page’ feature that allows a user to find every occur...
Kevyn Collins-Thompson, Charles Schweizer, Susan T...
The needs for managing similar documents in different languages increases with the growing amounts of electronic information available in documents of the same type (e.g. news str...
Roberto Basili, Maria Teresa Pazienza, Fabio Massi...
The main problems of Optical Character Recognition (OCR) systems are solved if printed latin text is considered. Since OCR systems are based upon binary images, their results are ...