Compound (or mixed) document images contain graphic or textual content along with pictures. They are a very common form of document, found in magazines, brochures, websites, etc....
A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process, where the underlying similarity re...
The World Wide Web holds a wealth of information related to almost any text classification task. This paper presents a method for mining the web to improve text classificati...
This paper lies in the field of ancient heritage book valorization: specifically, it concerns the development of suitable assistance tools for humanists and historians to help t...
This paper introduces a multifont classification scheme to help recognition of multifont and multisize characters. It uses typographical attributes such as ascenders, descenders a...