Sciweavers

384 search results - page 5 / 77
» INTEX: A Corpus Processing System
BTW
2007
Springer
15 years 3 months ago
YAWN: A Semantically Annotated Wikipedia XML Corpus
The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce alg...
Ralf Schenkel, Fabian M. Suchanek, Gjergji Kasneci
ICCPOL
2009
Springer
15 years 2 months ago
Constructing Parallel Corpus from Movie Subtitles
This paper describes a methodology for constructing aligned German-Chinese corpora from movie subtitles. The corpora will be used to train a special machine translation s...
Han Xiao, Xiaojie Wang
SIGIR
2004
ACM
15 years 3 months ago
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber
LRE
2010
14 years 8 months ago
The Corpus DIMEx100: transcription and evaluation
In this paper the transcription and evaluation of the corpus DIMEx100 for Mexican Spanish is presented. First we describe the corpus and explain the linguistic and computational mo...
Luis Alberto Pineda, Hayde Castellanos, Javier Cu...
ICDE
2012
IEEE
13 years 4 days ago
A Dataset Search Engine for the Research Document Corpus
A key step in validating a proposed idea or system is to evaluate over a suitable data set. However, to this date there have been no useful tools for researchers to understand ...
Meiyu Lu, Srinivas Bangalore, Graham Cormode, Mari...