LREC
2010
13 years 3 months ago
The Web Library of Babel: evaluating genre collections
We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genre-annotated datasets (HGC, I-EN, KI-04, KRYS...
Serge Sharoff, Zhili Wu, Katja Markert
COLING
2008
13 years 6 months ago
Source Language Markers in EUROPARL Translations
This paper shows that it is very often possible to identify the source language of medium-length speeches in the EUROPARL corpus on the basis of frequency counts of word n-grams (...
Hans van Halteren