Abstract. This paper explores the use of initial Stochastic Context-Free Grammars (SCFG) obtained from a treebank corpus for the learning of SCFG by means of estimation algorithms....
While many visualization tools exist that offer sophisticated functions for charting complex data, they still expect users to possess a high degree of expertise in wielding the to...
Yiwen Sun, Jason Leigh, Andrew E. Johnson, Sangyoo...
Frequency counts from very large corpora, such as the Web 1T dataset, have recently become available for language modeling. Omission of low frequency n-gram counts is a practical ...
: We describe our participation in the TREC 2004 Web and Terabyte tracks. For the web track, we employ mixture language models based on document full-text, incoming anchortext, and...
We present a novel discriminative training algorithm for n-gram language models for use in large vocabulary continuous speech recognition. The algorithm uses large margin estimati...