This paper describes an italic font recognition method using stroke pattern analysis on wavelet decomposed word images. The word images are extracted from scanned text documents c...
A large fraction of the useful web comprises of specification documents that largely consist of hattribute name, numeric valuei pairs embedded in text. Examples include product in...
A wealth of information is available only in web pages, patents, publications etc. Extracting information from such sources is challenging, both due to the typically complex langu...
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
The main problems of Optical Character Recognition (OCR) systems are solved if printed latin text is considered. Since OCR systems are based upon binary images, their results are ...