This paper presents a discriminative alignment model for extracting abbreviations and their full forms appearing in actual text. The task of abbreviation recognition is formalized...
The amount of data in information systems is increasing exponentially, and it is becoming ever more difficult to extract the most relevant information within a very short time. Among ot...
Mixtures of probabilistic principal component analyzers model high-dimensional nonlinear data by combining local linear models. Each mixture component is specifically designed to...
We study dimensionality reduction or feature selection in the text document categorization problem. We focus on the first step in building text categorization systems, that is the cho...
Chinese is a language that does not have morphological tense markers that provide explicit grammaticalization of the temporal location of situations (events or states). However, i...