Sciweavers

LREC
2010
13 years 3 months ago
The Web Library of Babel: evaluating genre collections
We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genre-annotated datasets (HGC, I-EN, KI-04, KRYS...
Serge Sharoff, Zhili Wu, Katja Markert
ICWSM
2008
13 years 6 months ago
A Shallow Approach to Subjectivity Classification
We present a shallow linguistic approach to subjectivity classification. Using multinomial kernel machines, we demonstrate that a data representation based on counting character n...
Stephan Raaijmakers, Wessel Kraaij
CLEF
2008
Springer
13 years 6 months ago
JHU Ad Hoc Experiments at CLEF 2008
For CLEF 2008, JHU conducted monolingual and bilingual experiments in the ad hoc TEL and Persian tasks. The TEL task focused on searching electronic card catalog records i...
Paul McNamee
CLEF
2006
Springer
13 years 8 months ago
A First Approach to CLIR Using Character N-Grams Alignment
This paper describes the technique for translation of character n-grams we developed for our participation in CLEF 2006. This solution avoids the need for word normalizat...
Jesús Vilares, Michael P. Oakes, John Tait
AIMSA
2006
Springer
13 years 8 months ago
N-Gram Feature Selection for Authorship Identification
Automatic authorship identification offers a valuable tool for supporting crime investigation and security. It can be seen as a multi-class, single-label text categorization task. ...
John Houvardas, Efstathios Stamatatos
NLDB
2007
Springer
13 years 10 months ago
Character N-Grams Translation in Cross-Language Information Retrieval
This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval systems. This solution avoids the nee...
Jesús Vilares, Michael P. Oakes, Manuel Vil...
SIGIR
2009
ACM
13 years 11 months ago
Addressing morphological variation in alphabetic languages
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normaliz...
Paul McNamee, Charles K. Nicholas, James Mayfield