Sciweavers

241 search results - page 1 / 49
» Detecting Co-Derivative Documents in Large Text Collections
LREC
2008
13 years 6 months ago
Detecting Co-Derivative Documents in Large Text Collections
We have analyzed the SPEX algorithm by Bernstein and Zobel (2004) for detecting co-derivative documents using duplicate n-grams. Although we totally agree with the claim that not ...
Jan Pomikálek, Pavel Rychlý
COLING
2010
12 years 11 months ago
Large Scale Parallel Document Mining for Machine Translation
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
CORR
2006
Springer
13 years 4 months ago
A tool set for the quick and efficient exploration of large document collections
We are presenting a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the do...
Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, ...
ECIR
2009
Springer
14 years 1 month ago
Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation
Algorithms that enable the process of automatically mining distinct topics in document collections have become increasingly important due to their applications in many fields and ...
Levent Bolelli, Seyda Ertekin, C. Lee Giles
ICDAR
2007
IEEE
13 years 11 months ago
Content-level Annotation of Large Collection of Printed Document Images
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...
Anand Kumar 0002, C. V. Jawahar