This paper addresses the problem of categorizing terms or lexical entities into a predefined set of semantic domains exploiting the knowledge available on-line in the Web. The prop...
Leonardo Rigutini, Ernesto Di Iorio, Marco Ernande...
Representing documents by vectors that are independent of language enhances machine translation and multilingual text categorization. We use discriminative training to create a pr...
Due to the exponential growth of documents on the Internet and the emergent need to organize them, the automated categorization of documents into predefined labels has received an...
We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian m...