We present a mathematical model of word sense frequency distributions, and use word distributions to set parameters. The model implies that the expected dominance of the commonest ...
Information filtering has made considerable progress in recent years. The predominant approaches are content-based methods and collaborative methods. Researchers have largely conc...
This paper presents a cluster-based text categorization system which uses class distributional clustering of words. We propose a new clustering model which considers the global in...
We present a parallel version of BIRCH with the objective of enhancing the scalability without compromising on the quality of clustering. The incoming data is distributed in a cyc...
Statistical approaches to document content modeling typically focus either on broad topics or on discourse-level subtopics of a text. We present an analysis of the perform...
Leonhard Hennig, Thomas Strecker, Sascha Narr, Ern...