We describe a simple improvement to ngram language models where we estimate the distribution over closed-class (function) words separately from the conditional distribution of ope...
In morphologically rich languages such as Arabic, the abundance of word forms resulting from increased morpheme combinations is significantly greater than for languages with fewer...
—We propose a method to improve traditional character-based PPM text compression algorithms. Consider a text file as a sequence of alternating words and non-words, the basic ide...
Co-occurrence analysis has been used to determine related words or terms in many NLP-related applications such as query expansion in Information Retrieval (IR). However, related w...
Bag-of-visual-words (BoW) has been popular for visual classification in recent years. In this paper, we propose a novel BoW expansion method to alleviate the effect of visual word...