"Short-text clustering" is a very important research field due to the current tendency for people to use very short documents, e.g. blogs, text-messaging and others. In s...
Abstract. Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. But, there is no standardized input method for Telugu, which ...
In this paper, we develop multilingual supervised latent Dirichlet allocation (MLSLDA), a probabilistic generative model that allows insights gleaned from one language's data...
We describe a method for discovering irregularities in temporal mood patterns appearing in a large corpus of blog posts, and labeling them with a natural language explanation. Sim...
A generalized ICA model allowing overcomplete bases and additive noises in the observables is applied to natural image data. It is well known that such a model produces independen...