Sciweavers

NAACL
2003
13 years 6 months ago
Comma Restoration Using Constituency Information
Automatic restoration of punctuation from unpunctuated text has application in improving the fluency and applicability of speech recognition systems. We explore the possibility t...
Stuart M. Shieber, Xiaopeng Tao
NIPS
2004
13 years 6 months ago
Confidence Intervals for the Area Under the ROC Curve
In many applications, good ranking is a highly desirable performance for a classifier. The criterion commonly used to measure the ranking quality of a classification algorithm is ...
Corinna Cortes, Mehryar Mohri
IIS
2004
13 years 6 months ago
Trigram morphosyntactic tagger for Polish
We introduce an implementation of a plain trigram part-of-speech tagger which appears to work well on Polish texts. At this moment the tagger achieves a 9.4% error rate, wh...
Lukasz Debowski
ACL
2003
13 years 6 months ago
Classifying Recognition Results for Spoken Dialog Systems
This paper investigates the correlation between acoustic confidence scores as returned by speech recognizers and recognition quality. We report the results of two machine learni...
Malte Gabsdil
ACL
2006
13 years 6 months ago
Maximum Entropy Based Restoration of Arabic Diacritics
Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of ...
Imed Zitouni, Jeffrey S. Sorensen, Ruhi Sarikaya
TRECVID
2008
13 years 6 months ago
UEC at TRECVID 2008 High Level Feature Task
In this paper, we describe our approach and results for the high-level feature extraction (HLF) task at TRECVID 2008. This year, our focus is to develop a framework which fuses a numbe...
Zhiyuan Tang, Keiji Yanai
EMNLP
2007
13 years 6 months ago
Determining Case in Arabic: Learning Complex Linguistic Behavior Requires Complex Linguistic Features
This paper discusses automatic determination of case in Arabic. This task is an important part and major source of errors in full diacritization of Arabic. We use a gold-standard s...
Nizar Habash, Ryan Gabbard, Owen Rambow, Seth Kuli...
LREC
2010
13 years 6 months ago
Ways of Evaluation of the Annotators in Building the Prague Czech-English Dependency Treebank
In this paper, we present several ways to measure and evaluate the annotation and annotators, proposed and used during the building of the Czech part of the Prague Czech-English D...
Marie Mikulová, Jan Stepánek
AUSDM
2006
Springer
13 years 8 months ago
Accuracy Estimation With Clustered Dataset
If the dataset available to machine learning results from cluster sampling (e.g. patients from a sample of hospital wards), the usual cross-validation error rate estimate can lead...
Ricco Rakotomalala, Jean-Hugues Chauchat, Franç...
ICDM
2007
IEEE
13 years 8 months ago
Predicting and Optimizing Classifier Utility with the Power Law
When data collection is costly and/or takes a significant amount of time, an early prediction of the classifier performance is extremely important for the design of the data minin...
Mark Last