A major obstacle that decreases the performance of text classifiers is the extremely high dimensionality of text data. To reduce the dimension, a number of approaches based on rou...
There is a reservoir of knowledge in data from the TREC evaluations that analysis of precision and recall leaves untapped. This knowledge leads to better understanding of query ex...
Large collections of documents are commonly created around a database, where a typical database schema may contain hundreds of tables and thousands of columns. We developed a syst...
Carlos Garcia-Alvarado, Carlos Ordonez, Zhibo Chen...
The quality of document content, which is an issue that is usually ignored for the traditional ad hoc retrieval task, is a critical issue for Web search. Web pages have a huge var...
A method for locating mathematical expressions in document images without the use of optical character recognition is presented. An index of document regions is produced from re...