The Web is a valuable source of language speci c resources but the process of collecting, organizing and utilizing these resources is di cult. We describe CorpusBuilder, an approa...
In this paper, we propose a set of similarity metrics for manipulating collections of values occuring in XML documents. Following the data model presented in TAX algebra, we treat...
Carina F. Dorneles, Carlos A. Heuser, Andrei E. N....
Cross-language latent semantic indexing is a method that learns useful languageindependent vector representations of terms through a statistical analysis of a documentaligned text...
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of “more like this” queries in the life sciences domain. ...
In this paper, we explore a CLIR-based approach to construct large-scale Chinese-English comparable corpora, which is valuable for translation knowledge mining. The initial source...