Sciweavers

» Large Scale Parallel Document Mining for Machine Translation
TREC
2001
14 years 11 months ago
The TREC-2001 Cross-Language Information Retrieval Track: Searching Arabic Using English, French or Arabic Queries
Ten groups participated in the TREC-2001 cross-language information retrieval track, which focussed on retrieving Arabic language documents based on 25 queries that were originall...
Fredric C. Gey, Douglas W. Oard
SIGIR
2011
ACM
14 years 10 days ago
No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity
This work explores the problem of cross-lingual pairwise similarity, where the task is to extract similar pairs of documents across two different languages. Solutions to this pro...
Ferhan Ture, Tamer Elsayed, Jimmy J. Lin
ICDCS
2002
IEEE
15 years 2 months ago
A Fully Distributed Framework for Cost-Sensitive Data Mining
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach is to apply machine learning algorithms to ...
Wei Fan, Haixun Wang, Philip S. Yu, Salvatore J. S...
ICDM
2010
IEEE
14 years 6 months ago
S4: Distributed Stream Computing Platform
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continu...
Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anan...
WSDM
2010
ACM
15 years 4 months ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...