Sciweavers

NAACL
2007
13 years 6 months ago
Virtual Evidence for Training Speech Recognizers Using Partially Labeled Data
Collecting supervised training data for automatic speech recognition (ASR) systems is both time consuming and expensive. In this paper we use the notion of virtual evidence in a g...
Amarnag Subramanya, Jeff A. Bilmes
NAACL
2007
13 years 6 months ago
Can Semantic Roles Generalize Across Genres?
PropBank has been widely used as training data for Semantic Role Labeling. However, because this training data is taken from the WSJ, the resulting machine learning models tend to...
Szu-ting Yi, Edward Loper, Martha Palmer
NAACL
2007
13 years 6 months ago
Using "Annotator Rationales" to Improve Machine Learning for Text Categorization
We propose a new framework for supervised machine learning. Our goal is to learn from smaller amounts of supervised training data, by collecting a richer kind of training data: an...
Omar Zaidan, Jason Eisner, Christine D. Piatko
MSV
2007
13 years 6 months ago
Assessment of ARMAX Structure as a Global Model for Self-Refilling Steam Distillation Essential Oil Extraction System
In this paper, an essential oil extraction system with self-refilling system is modeled based on input-output data collected from a dedicated acquisition system. The ARMA...
Mohd Hezri Fazalul Rahiman, Mohd Nasir Taib, Yusof...
LREC
2008
13 years 6 months ago
Improving Statistical Machine Translation Efficiency by Triangulation
In current phrase-based Statistical Machine Translation systems, more training data is generally better than less. However, a larger data set eventually introduces a larger model ...
Yu Chen, Andreas Eisele, Martin Kay
LREC
2008
13 years 6 months ago
Statistical Identification of English Loanwords in Korean Using Automatically Generated Training Data
This paper describes an accurate, extensible method for automatically classifying unknown foreign words that requires minimal monolingual resources and no bilingual training data ...
Kirk Baker, Chris Brew
LREC
2008
13 years 6 months ago
Cost-Sensitive Learning in Answer Extraction
One problem of data-driven answer extraction in open-domain factoid question answering is that the class distribution of labeled training data is fairly imbalanced. This imbalance...
Michael Wiegand, Jochen L. Leidner, Dietrich Klako...
ICMLA
2007
13 years 6 months ago
Scalable optimal linear representation for face and object recognition
Optimal Component Analysis (OCA) is a linear method for feature extraction and dimension reduction. It has been widely used in many applications such as face and object recognitio...
Yiming Wu, Xiuwen Liu, Washington Mio
ICMLA
2007
13 years 6 months ago
Memory-based context-sensitive spelling correction at web scale
We study the problem of correcting spelling mistakes in text using memory-based learning techniques and a very large database of token n-gram occurrences in web text as training d...
Andrew Carlson, Ian Fette
EMNLP
2007
13 years 6 months ago
Bootstrapping Information Extraction from Field Books
We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does ex...
Sander Canisius, Caroline Sporleder