Sciweavers

735 search results - page 122 / 147
» Corpora and data preparation
ACHI
2008
IEEE
14 years 12 months ago
Specification for User Modeling with Self-Observing Systems
The complicated user interfaces and complex functionality of today's interactive products lead to a new class of failures: People do not understand their products and thus fail t...
Mathias Funk, Piet van der Putten, Henk Corporaal
CICLING
2008
Springer
14 years 12 months ago
A Semantics-Enhanced Language Model for Unsupervised Word Sense Disambiguation
An N-gram language model aims at capturing statistical word order dependency information from corpora. Although the concept of language models has been applied extensively to handl...
Shou-de Lin, Karin Verspoor
ACL
2008
14 years 11 months ago
Mining Wiki Resources for Multilingual Named Entity Recognition
In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) t...
Alexander E. Richman, Patrick Schone
ACL
2007
14 years 11 months ago
Randomised Language Modelling for Statistical Machine Translation
A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it ...
David Talbot, Miles Osborne
CASCON
2007
14 years 11 months ago
Removing manually generated boilerplate from electronic texts: experiments with Project Gutenberg e-books
Collaborative work on unstructured or semistructured documents, such as in literature corpora or source code, often involves agreed upon templates containing metadata. These templ...
Owen Kaser, Daniel Lemire