In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes. In this paper we invest...
In large vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to commonly used phonetically based questions, others hav...
Hank Liao, Christopher Alberti, Michiel Bacchiani,...
We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disa...
Probabilistic latent semantic analysis is enhanced with long distance bigram models in order to improve word clustering. The long distance bigram probabilities and the interpolate...
A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process, where the underlying similarity re...