Sciweavers

EMNLP
2010
13 years 2 months ago
Negative Training Data Can be Harmful to Text Classification
This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...
Xiaoli Li, Bing Liu, See-Kiong Ng
EMNLP
2010
13 years 2 months ago
Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing
Inducing a grammar directly from text is one of the oldest and most challenging tasks in Computational Linguistics. Significant progress has been made for inducing dependency gram...
Phil Blunsom, Trevor Cohn
EMNLP
2010
13 years 2 months ago
Simple Type-Level Unsupervised POS Tagging
Part-of-speech (POS) tag distributions are known to exhibit sparsity -- a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorp...
Yoong Keok Lee, Aria Haghighi, Regina Barzilay
EMNLP
2010
13 years 2 months ago
Improving Gender Classification of Blog Authors
The problem of automatically classifying the gender of a blog author has important applications in many commercial domains. Existing systems mainly use features such as words, wor...
Arjun Mukherjee, Bing Liu
EMNLP
2010
13 years 2 months ago
Towards Conversation Entailment: An Empirical Investigation
While a significant amount of research has been devoted to textual entailment, automated entailment from conversational scripts has received less attention. To address this limita...
Chen Zhang, Joyce Yue Chai
EMNLP
2010
13 years 2 months ago
A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension
Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corp...
Hugo Hernault, Danushka Bollegala, Mitsuru Ishizuk...
EMNLP
2010
13 years 2 months ago
A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model
We show that the standard beam-search algorithm can be used as an efficient decoder for the global linear model of Zhang and Clark (2008) for joint word segmentation and POS-taggi...
Yue Zhang, Stephen Clark
EMNLP
2010
13 years 2 months ago
Word-Based Dialect Identification with Georeferenced Rules
We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard Germ...
Yves Scherrer, Owen Rambow
EMNLP
2010
13 years 2 months ago
Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
This paper addresses the problem of learning to map sentences to logical form, given training data consisting of natural language sentences paired with logical representations of ...
Tom Kwiatkowski, Luke S. Zettlemoyer, Sharon Goldw...
EMNLP
2010
13 years 2 months ago
WikiWars: A New Corpus for Research on Temporal Expressions
The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data ...
Pawel P. Mazur, Robert Dale