In spite of the initialization problem, the ExpectationMaximization (EM) algorithm is widely used for estimating the parameters in several data mining related tasks. Most popular ...
Chandan K. Reddy, Hsiao-Dong Chiang, Bala Rajaratn...
We describe a model for strings of characters that is loosely based on the Lempel Ziv model with the addition that a repeated substring can be an approximate match to the original...
In this paper, we study an online data mining problem from streams of semi-structured data such as XML data. Modeling semi-structured data and patterns as labeled ordered trees, w...
Recently there has been an increasing interest in developing regression models for large datasets that are both accurate and easy to interpret. Regressors that have these properti...
Statistical language modeling (SLM) has been used in many different domains for decades and has also been applied to information retrieval (IR) recently. Documents retrieved using...