A trainable method for distinguishing between mathematics notation and natural language (here, English) in images of textlines, using computational geometry methods only with no a...
The availability of labeled language resources, such as annotated corpora and domain-dependent labeled language resources, is crucial for experiments in the field of Natural Language ...
Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implement...
This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
Natural language is the main means of presentation in industrial requirements documents. As a result, requirements documents are often incomplete and inconsistent. Desp...