Ten groups participated in the TREC-2001 cross-language information retrieval track, which focussed on retrieving Arabic language documents based on 25 queries that were originall...
This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this pro...
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach is to apply machine learning algorithms to ...
Wei Fan, Haixun Wang, Philip S. Yu, Salvatore J. S...
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continu...
Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anan...
The presence of duplicate documents on the World Wide Web adversely affects crawling, indexing, and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...