In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the locat...
Charts are common graphic representation for scientific data in technical and business papers. We present a robust system for detecting and recognizing bar charts. The system incl...
Leximancer is a software system for performing conceptual analysis of text data in a largely language independent manner. The system is modelled on Content Analysis and provides u...
In this paper, we present the main features of a text mining based search engine for the UK Educational Evidence Portal available at the UK National Centre for Text Mining (NaCTeM...
Sophia Ananiadou, John McNaught, James Thomas, Mar...