Sciweavers

NAACL
2004
15 years 9 days ago
A Language Modeling Approach to Predicting Reading Difficulty
We demonstrate a new research approach to the problem of predicting the reading difficulty of a text passage, by recasting readability in terms of statistical language modeling. W...
Kevyn Collins-Thompson, James P. Callan
NAACL
2004
15 years 9 days ago
Name Tagging with Word Clusters and Discriminative Training
We present a technique for augmenting annotated training data with hierarchical word clusters that are automatically derived from a large unannotated corpus. Cluster membership is...
Scott Miller, Jethran Guinness, Alex Zamanian
NAACL
2004
15 years 9 days ago
Multiple Similarity Measures and Source-Pair Information in Story Link Detection
State-of-the-art story link detection systems, that is, systems that determine whether two stories are about the same event or linked, are usually based on the cosine-similarity m...
Francine Chen, Ayman Farahat, Thorsten Brants
NAACL
2004
15 years 9 days ago
Robust Reading: Identification and Tracing of Ambiguous Names
A given entity, representing a person, a location or an organization, may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whe...
Xin Li, Paul Morie, Dan Roth
NAACL
2004
15 years 9 days ago
Unsupervised Learning of Contextual Role Knowledge for Coreference Resolution
We present a coreference resolver called BABAR that uses contextual role knowledge to evaluate possible antecedents for an anaphor. BABAR uses information extraction patterns to i...
David L. Bean, Ellen Riloff
NAACL
2004
15 years 9 days ago
Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. W...
Regina Barzilay, Lillian Lee
NAACL
2004
15 years 9 days ago
Inferring Sentence-internal Temporal Relations
In this paper we propose a data intensive approach for inferring sentence-internal temporal relations, which relies on a simple probabilistic model and assumes no manual coding. W...
Mirella Lapata, Alex Lascarides
NAACL
2004
15 years 9 days ago
The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks
Previous work demonstrated that web counts can be used to approximate bigram frequencies, and thus should be useful for a wide variety of NLP tasks. So far, only two generation ta...
Mirella Lapata, Frank Keller
NAACL
2004
15 years 9 days ago
A Probabilistic Rasch Analysis of Question Answering Evaluations
The field of Psychometrics routinely grapples with the question of what it means to measure the inherent ability of an organism to perform a given task, and for the last forty yea...
Rense Lange, Juan Moran, Warren R. Greiff, Lisa Fe...
NAACL
2004
15 years 9 days ago
Minimum Bayes-Risk Decoding for Statistical Machine Translation
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functio...
Shankar Kumar, William J. Byrne