The vocabulary of the TREC Legal OCR collection is noisy and huge. Standard techniques for improving retrieval performance such as content-based query expansion are ineffective fo...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract ...
Figures in digital documents contain important information. Current digital libraries do not summarize and index information available within figures for document retrieval. We pr...
Xiaonan Lu, James Ze Wang, Prasenjit Mitra, C. Lee...
Medical image retrieval can play an important role for diagnostic and teaching purposes in medicine. Image modality is an important visual characteristic that can be used to impro...
In this paper, we present the results of a project that seeks to transform low-level features to a higher level of meaning. This project concerns a technique, latent semantic anal...