This paper presents a Chinese word segmentation system that uses improved sourcechannel models of Chinese sentence generation. Chinese words are defined as one of the following fo...
A major difficulty of supervised approaches for text classification is that they require a great number of training instances in order to construct an accurate classifier. This pap...
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful ...
In this paper we address the problem of building a good speech recognizer if there is only a small amount of training data available. The acoustic models can be improved by interpo...
Stefan Steidl, Georg Stemmer, Christian Hacker, El...
We address the problem of sketching the hamming distance of data streams. We present a new notion of sketching technique, Fixable sketches and we show that using such sketch not o...