Sciweavers

25 search results - page 2 / 5
» Encoding standards for large text resources: The Text Encodi...
LREC
2008
The JOS Morphosyntactically Tagged Corpus of Slovene
The JOS morphosyntactic resources for Slovene consist of the specifications, lexicon, and two corpora: jos100k, a 100,000-word balanced monolingual sampled corpus annotated with h...
Tomaz Erjavec, Simon Krek
LREC
2008
A lexicon for biology and bioinformatics: the BOOTStrep experience
This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. ...
Valeria Quochi, Monica Monachini, Riccardo Del Gra...
DAGSTUHL
2006
New tricks from an old dog: An overview of TEI P5
This paper presents an update on the current state of development of the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange. Since the last ma...
Lou Burnard
CORR
2006
Springer
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages
We present a new, unique and freely available parallel corpus containing European Union (EU) documents of mostly legal nature. It is available in all 20 official EU languages, wit...
Ralf Steinberger, Bruno Pouliquen, Anna Widiger, C...
KDD
1995
ACM
Feature Extraction for Massive Data Mining
Techniques for learning from data typically require data to be in standard form. Measurements must be encoded in a numerical format such as binary true-or-false features, numerica...
V. Seshadri, Raguram Sasisekharan, Sholom M. Weiss