An N-gram language model aims at capturing statistical word order dependency information from corpora. Although the concept of language models has been applied extensively to handl...
Abstract. The Multiple Bernoulli (MB) Language Model has been generally considered too computationally expensive for practical purposes and superseded by the more efficient multino...
Cross-language latent semantic indexing is a method that learns useful languageindependent vector representations of terms through a statistical analysis of a documentaligned text...
Most databases contain “name constants” like course numbers, personal names, and place names that correspond to entities in the real world. Previous work in integration of het...
This paper proposes an OCR post-processing approach based on multi-knowledge, which integrates language knowledge and candidate distance information given by the OCR engine. In thi...