The complicated user interfaces and complex functionality of nowadays interactive products lead to a new class of failures: People do not understand their products and thus fail t...
An N-gram language model aims at capturing statistical word order dependency information from corpora. Although the concept of language models has been applied extensively to handl...
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) t...
A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it ...
Collaborative work on unstructured or semistructured documents, such as in literature corpora or source code, often involves agreed upon templates containing metadata. These templ...