Sciweavers

Corpora and data preparation
LREC
2008
A Lightweight and Efficient Tool for Cleaning Web Pages
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Stefan Evert
LREC
2008
Amazigh Language Terminology in Morocco or Management of a "Multidimensional" Variation
The present communication brings to the fore the work undertaken at IRCAM within CAL within the framework of the language planning of Amazigh, particularly on the side of terminol...
Aïcha Bouhjar
LREC
2008
A Bilingual Corpus of Inter-linked Events
This paper describes the creation of a bilingual corpus of inter-linked events for Italian and English. Linkage is accomplished through the Inter-Lingual Index (ILI) that links It...
Tommaso Caselli, Nancy Ide, Roberto Bartolini
LREC
2008
Sentence Alignment in DPC: Maximizing Precision, Minimizing Human Effort
A wide spectrum of multilingual applications have aligned parallel corpora as their prerequisite. The aim of the project described in this paper is to build a multilingual corpus w...
Julia S. Trushkina, Lieve Macken, Hans Paulussen
LREC
2008
An eRulemaking Corpus: Identifying Substantive Issues in Public Comments
We describe the creation of a corpus that supports a real-world hierarchical text categorization task in the domain of electronic rulemaking (eRulemaking). Features of the task an...
Claire Cardie, Cynthia Farina, Matt Rawding, Adil ...