Sciweavers

45 search results - page 2 / 9
» Building a Web Corpus of Czech
LREC
2010
Building a Bilingual ValLex Using Treebank Token Alignment: First Observations
In this paper we explore the potential and limitations of a concept of building a bilingual valency lexicon based on the alignment of nodes in a parallel treebank. Our aim is to b...
Jana Šindlerová, Ondřej Bojar
LREC
2010
Building an Italian FrameNet through Semi-automatic Corpus Analysis
In this paper, we outline the methodology we adopted to develop a FrameNet for Italian. The main element of novelty with respect to the original FrameNet is represented by the fac...
Alessandro Lenci, Martina Johnson, Gabriella Lapes...
LREC
2008
Towards a Reference Corpus of Web Genres for the Evaluation of Genre Identification Systems
We present initial results from an international and multi-disciplinary research collaboration that aims at the construction of a reference corpus of web genres. The primary appli...
Georg Rehm, Marina Santini, Alexander Mehler, Pave...
INTERSPEECH
2010
Rapid Bootstrapping of Five Eastern European Languages Using the Rapid Language Adaptation Toolkit
This paper presents our latest efforts toward LVCSR systems for five Eastern European languages such as Bulgarian, Croatian, Czech, Polish, and Russian using our Rapid Language Ad...
Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanj...
EACL
2006
ACL Anthology
A Figure of Merit for the Evaluation of Web-Corpus Randomness
In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness of a collection of documents (corpus), with respect to a number of biased pa...
Massimiliano Ciaramita, Marco Baroni