Traditional n-gram language models are widely used in state-of-the-art large vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfi...
In many applied problems in the context of pattern recognition, the data often involve highly asymmetric observations. Normal mixture models tend to overfit when additional compone...
A notable gap in research on statistical dependency parsing is a proper conditional probability distribution over nonprojective dependency trees for a given sentence. We exploit t...
Many probabilistic models are only defined up to a normalization constant. This makes maximum likelihood estimation of the model parameters very difficult. Typically, one then h...
Markov networks are a common class of graphical models used in machine learning. Such models use an undirected graph to capture dependency information among random variables in a ...