This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
We present a novel linear clustering framework (DIFFRAC) which relies on a linear discriminative cost function and a convex relaxation of a combinatorial optimization problem. The...
With web image search engines, we face a situation where the results are very noisy, and when we ask for a specific object, we are not ensured that this object is contained in all...
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective i...
500,000 PubMed abstracts. However, less than 50 documents are relevant for most queries. Applying scoring to all 500,000 abstracts would create a lot of noise. In the first step, ...