Sciweavers

LREC
2008
Certification and Cleaning up of a Text Corpus: Towards an Evaluation of the "Grammatical" Quality of a Corpus
We present in this article the methods we used for obtaining measures to ensure the quality and well-formedness of a text corpus. These measures allow us to determine the compatib...
Cyril Grouin
CICLING
2008
Springer
Non-interactive OCR Post-correction for Giga-Scale Digitization Projects
This paper proposes a non-interactive system for reducing the level of OCR-induced typographical variation in large text collections, contemporary and historical. Text-Induced Corp...
Martin Reynaert
LREC
2010
Entity Mention Detection using a Combination of Redundancy-Driven Classifiers
We present an experimental framework for Entity Mention Detection in which two different classifiers are combined to exploit Data Redundancy attained through the annotation of a l...
Silvana Marianela Bernaola Biggio, Manuela Speranz...
ICML
1997
Morgan Kaufmann
A Comparative Study on Feature Selection in Text Categorization
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods we...
Yiming Yang, Jan O. Pedersen