(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...
This paper describes the general structure of a full automated document analysis system for printed documents. The system is based on a character preclassification stage which red...
Abstract. Terms which are not explicitly mentioned in the text of a document receive often a minor role in current retrieval systems. In this work we connect the management of such...
Web Clustering is useful for several activities in the WWW, from automatically building web directories to improve retrieval performance. Nevertheless, due to the huge size of the...
Hidden Markov Models (HMM) are probabilistic graphical models for interdependent classification. In this paper we experiment with different ways of combining the components of an ...