Sciweavers

180 search results - page 4 / 36
» A Method for Calculating Term Similarity on Large Document C...
CIKM
2001
Springer
13 years 10 months ago
Mining the Web to Create Minority Language Corpora
The Web is a valuable source of language specific resources but the process of collecting, organizing and utilizing these resources is difficult. We describe CorpusBuilder, an approa...
Rayid Ghani, Rosie Jones, Dunja Mladenic
WIDM
2004
ACM
13 years 11 months ago
Measuring similarity between collection of values
In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in TAX algebra, we treat...
Carina F. Dorneles, Carlos A. Heuser, Andrei E. N....
ICML
1998
IEEE
14 years 7 months ago
Learning a Language-Independent Representation for Terms from a Partially Aligned Corpus
Cross-language latent semantic indexing is a method that learns useful language-independent vector representations of terms through a statistical analysis of a document-aligned text...
Michael L. Littman, Fan Jiang, Greg A. Keim
SIGIR
2009
ACM
14 years 19 days ago
Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of “more like this” queries in the life sciences domain. ...
Jimmy J. Lin
COLING
2010
13 years 1 months ago
Mining Large-scale Comparable Corpora from Chinese-English News Collections
In this paper, we explore a CLIR-based approach to construct large-scale Chinese-English comparable corpora, which is valuable for translation knowledge mining. The initial source...
Degen Huang, Lian Zhao, Lishuang Li, Haitao Yu