This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
Despite ubiquitous claims that optical character recognition (OCR) is a "solved problem," many categories of documents continue to break modern OCR software such as docu...
This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase...
In order to spot the digits in a handwritten document, each component is sent to a classifier. This is a time-consuming process because a document usually contains several hundr...
Nicola Nobile, Chun Lei He, Malik Waqas Sagheer, L...
Experience Management (EM) is an area that is increasingly gaining importance. Its roots lie in Experimental Software Engineering ("Experience Factory"), in Artificial ...
Andreas Jedlitschka, Klaus-Dieter Althoff, Bjö...