The crucial issue in many classification applications is how to achieve the best possible classifier with a limited number of labeled data for training. Training data selection is ...
: The performance of a statistical machine translation (SMT) system heavily depends on the quantity and quality of the bilingual language resource. However, the pervious work mainl...
We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. ...
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated la...
Victor S. Sheng, Foster J. Provost, Panagiotis G. ...
Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality cont...