Sciweavers

Search results for "Representing Text Chunks" (850 results)
CICLING
2008
Springer
15 years 5 months ago
Non-interactive OCR Post-correction for Giga-Scale Digitization Projects
This paper proposes a non-interactive system for reducing the level of OCR-induced typographical variation in large text collections, contemporary and historical. Text-Induced Corp...
Martin Reynaert
LREC
2010
15 years 4 months ago
Design and Data Collection for the Accentological Corpus of the Russian Language
An accentological corpus gives researchers an opportunity to study word stress and stress variation, which are very important for the Russian language. Moreover, Accentological c...
Elena Grishina, Svetlana Savchuk, Alexej Poljakov
LREC
2010
15 years 4 months ago
New Features in Spoken Language Search Hawk (SpLaSH): Query Language and Query Sequence
In this work we present further development of the SpLaSH (Spoken Language Search Hawk) project. SpLaSH implements a data model for annotated speech corpora integrated with textua...
Sara Romano, Francesco Cutugno
DLOG
1997
15 years 4 months ago
Action Hierarchies in Description Logics
Our project aims at the automatic generation of multilingual text for product maintenance and documentation from a structured knowledge representation formalized by means of plans...
Thorsten Liebig, Dietmar Rösner
ACL
1994
15 years 4 months ago
A Stochastic Finite-State Word-Segmentation Algorithm for Chinese
We present a stochastic finite-state model for segmenting Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the me...
Richard Sproat, Chilin Shih, William Gale, Nancy C...