Sciweavers

30 search results - page 5 / 6
» Efficient clustering of high-dimensional data sets with appl...
IIS
2003
13 years 7 months ago
Web Search Results Clustering in Polish: Experimental Evaluation of Carrot
In this paper we consider the problem of web search results clustering in the Polish language, supporting our analysis with results acquired from an experimental system n...
Dawid Weiss, Jerzy Stefanowski
WWW
2003
ACM
14 years 6 months ago
Text joins in an RDBMS for web data integration
The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identi...
Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas...
COOPIS
2004
IEEE
13 years 9 months ago
A Distributed and Parallel Component Architecture for Stream-Oriented Applications
This paper introduces ThreadMill, a distributed and parallel component architecture for applications that process large volumes of streamed (time-sequenced) data, such a...
Paulo Barthelmess, Clarence A. Ellis
ICDM
2005
IEEE
13 years 11 months ago
Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical ap...
Mikhail Bilenko, Sugato Basu, Mehran Sahami
KDD
2008
ACM
14 years 6 months ago
Febrl -: an open source data cleaning, deduplication and record linkage system with a graphical user interface
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs ...
Peter Christen