When storing data in heterogeneous databases, one of the top-down design issues concerns the use of multiple query languages. A common language enables querying of database schem...
We describe how simple, commonly understood statistical models, such as statistical dependency parsers, probabilistic context-free grammars, and word-to-word translation models, c...
Truecasing is the process of restoring case information to badly-cased or noncased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based ...
Lucian Vlad Lita, Abraham Ittycheriah, Salim Rouko...
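The truecasing abstract above is truncated, so the paper's exact model is not reproduced here. The following is a minimal sketch of the general idea under stated assumptions. It assumes a bigram language model over cased tokens with add-one smoothing, and a Viterbi search over the cased variants of each token that were seen in training. The class name `BigramTruecaser` and the toy training sentences are hypothetical illustrations.

```python
import math
from collections import Counter, defaultdict


class BigramTruecaser:
    """Hypothetical, simplified bigram truecaser (not the paper's model)."""

    def __init__(self, sentences):
        self.forms = defaultdict(Counter)  # lowercased token -> counts of its cased forms
        self.unigrams = Counter()          # counts of cased tokens (and the <s> marker)
        self.bigrams = Counter()           # counts of (previous, current) cased pairs
        for sent in sentences:
            prev = "<s>"
            self.unigrams[prev] += 1
            for tok in sent:
                self.forms[tok.lower()][tok] += 1
                self.unigrams[tok] += 1
                self.bigrams[(prev, tok)] += 1
                prev = tok
        self.vocab_size = len(self.unigrams)

    def _logp(self, prev, tok):
        # Assumption: add-one smoothed bigram probability log P(tok | prev).
        num = self.bigrams[(prev, tok)] + 1
        den = self.unigrams[prev] + self.vocab_size
        return math.log(num / den)

    def candidates(self, token):
        # Cased variants seen in training; unseen tokens are kept unchanged.
        forms = self.forms.get(token.lower())
        return list(forms) if forms else [token]

    def truecase(self, tokens):
        # Viterbi search: best[cased_form] = (log score, best path ending in that form).
        best = {"<s>": (0.0, [])}
        for tok in tokens:
            new_best = {}
            for cand in self.candidates(tok):
                score, path = max(
                    ((s + self._logp(prev, cand), p) for prev, (s, p) in best.items()),
                    key=lambda item: item[0],
                )
                new_best[cand] = (score, path + [cand])
            best = new_best
        return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    tc = BigramTruecaser([
        ["He", "moved", "to", "New", "York", "."],
        ["She", "bought", "a", "new", "car", "."],
        ["They", "visited", "New", "York", "."],
    ])
    print(tc.truecase(["he", "moved", "to", "new", "york", "."]))
```

Context decides the ambiguous token: in this toy data "new" becomes "New" before "York" but stays lowercase before "car". A practical truecaser would use a larger n-gram order and better smoothing.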
A general framework for studying the transitivity of reciprocal relations is presented. The key feature is the cyclic evaluation of transitivity: triangles (i.e., any three points)...
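Cyclic evaluation can be sketched as a check over every triangle. The check assumes the following formulation, which is not stated in the truncated text above. A reciprocal relation satisfies Q(a,b) + Q(b,a) = 1. For each triangle (a, b, c), the values Q(a,b), Q(b,c), Q(c,a) are sorted into α ≤ β ≤ γ. The relation passes if L(α,β,γ) ≤ α + β + γ − 1 ≤ U(α,β,γ) for a user-supplied upper bound function U, with lower bound L(α,β,γ) = 1 − U(1−γ, 1−β, 1−α). The helper names and the constant bound U = 1 used in the demo are hypothetical illustrations.

```python
from itertools import combinations, permutations


def make_reciprocal(pairs):
    """Build Q from values Q(a, b), filling in Q(b, a) = 1 - Q(a, b)."""
    q = {}
    for (a, b), v in pairs.items():
        q[(a, b)] = v
        q[(b, a)] = 1.0 - v
    return q


def is_reciprocal(q, items, tol=1e-9):
    """Check Q(a, b) + Q(b, a) = 1 for every pair of distinct items."""
    return all(abs(q[(a, b)] + q[(b, a)] - 1.0) <= tol for a, b in combinations(items, 2))


def is_cycle_transitive(q, items, upper, tol=1e-9):
    """Assumed cycle-transitivity check against an upper bound function `upper`."""
    for a, b, c in permutations(items, 3):
        # Values along the cycle a -> b -> c -> a, sorted increasingly.
        alpha, beta, gamma = sorted((q[(a, b)], q[(b, c)], q[(c, a)]))
        s = alpha + beta + gamma - 1.0
        # Assumed dual lower bound L(alpha, beta, gamma) = 1 - U(1-gamma, 1-beta, 1-alpha).
        lower = 1.0 - upper(1.0 - gamma, 1.0 - beta, 1.0 - alpha)
        if not (lower - tol <= s <= upper(alpha, beta, gamma) + tol):
            return False
    return True


if __name__ == "__main__":
    items = ["a", "b", "c"]
    constant_bound = lambda alpha, beta, gamma: 1.0  # hypothetical illustrative U
    q = make_reciprocal({("a", "b"): 0.7, ("b", "c"): 0.7, ("a", "c"): 0.8})
    print(is_reciprocal(q, items), is_cycle_transitive(q, items, constant_bound))
```

Checking all permutations visits each triangle in both orientations. This is redundant, because reversing a cycle replaces every value v with 1 − v, and the dual lower bound makes the two checks equivalent. The redundancy keeps the loop simple.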
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing f...
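The core of noisy-channel decoding is to pick the intended word w that maximizes log P(w) + log P(o | w) for an observed OCR string o. The abstract above is truncated and describes a richer end-to-end generative model. This toy sketch only illustrates the decoding rule. It assumes a lexicon of word priors and a hypothetical per-character substitution channel, where each character is read correctly with probability 1 − eps and otherwise replaced uniformly by one of the other `alphabet_size − 1` symbols. The channel, its parameters, and the function names are illustrative assumptions.

```python
import math


def channel_logprob(observed, intended, eps=0.05, alphabet_size=40):
    """log P(observed | intended) under a hypothetical substitution-only channel."""
    if len(observed) != len(intended):
        return float("-inf")  # this toy channel cannot insert or delete characters
    logp = 0.0
    for o, w in zip(observed, intended):
        logp += math.log(1 - eps) if o == w else math.log(eps / (alphabet_size - 1))
    return logp


def decode(observed, lexicon, **channel_kwargs):
    """Return argmax_w log P(w) + log P(observed | w) over a {word: prior} lexicon."""
    return max(
        lexicon,
        key=lambda w: math.log(lexicon[w]) + channel_logprob(observed, w, **channel_kwargs),
    )


if __name__ == "__main__":
    print(decode("c1ean", {"clean": 0.3, "clear": 0.7}))
```

The channel term can outweigh the prior: "c1ean" decodes to "clean" despite its lower prior, because "clear" would need two substitutions. When candidates are equally far from the observation, as with "xat" against "cat" and "bat", the prior decides.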