The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is sem...
We present a method for picture detection in document page images, which can come from scanned or camera images, or rendered from electronic file formats. Our method uses OCR to s...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...
: This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules-of-thum...
It has been shown that using phrases properly in the document retrieval leads to higher retrieval effectiveness. In this paper, we define four types of noun phrases and present an...
Wei Zhang, Shuang Liu, Clement T. Yu, Chaojing Sun...