Patent document images maintained by the U.S. patent database have a specific format, in which figures and text descriptions are separated into different sections. This makes it d...
In this paper we present a system to locate, extract and recognize Telugu text. The circular nature of Telugu script is exploited for segmenting text regions using the Hough Trans...
Atul Negi, K. Nikhil Shanker, Chandra Kanth Chered...
There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and info...
Mark Dredze, Aren Jansen, Glen Coppersmith, Ken Wa...
Natural language is the main presentation means in industrial requirements documents. This leads to the fact that requirements documents are often incomplete and inconsistent. Desp...
In this paper, a high-speed document image classification algorithm is presented. The algorithm is based on the bottom-up strategy which can successfully segment and classify any ...