Today’s software projects are often distributed across multiple locations. This distribution poses new challenges produced by the cooperation across different countries, times z...
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...
Decoding noisy document images is commonly needed in applications such as enterprise content management. Available OCR solutions are still not satisfactory especially on noisy ima...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
Multimedia documents are of importance in several application areas, such as education, training, advertising and entertainment. Since multimedia documents may comprise continuous...