Sciweavers

841 search results - page 141 / 169
» Information Retrieval with Cluster Genetic
Sort
View
85
Voted
WWW
2008
ACM
15 years 10 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
78
Voted
WWW
2007
ACM
15 years 10 months ago
Generative models for name disambiguation
Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with ot...
Yang Song, Jian Huang 0002, Isaac G. Councill, Jia...
62
Voted
WWW
2006
ACM
15 years 10 months ago
GoGetIt!: a tool for generating structure-driven web crawlers
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
WWW
2006
ACM
15 years 10 months ago
The impact of online music services on the demand for stars in the music industry
The music industry's business model is to produce stars. In order to do so, musicians producing music that fits into well defined clusters of factors explaining the demand of...
Ian Pascal Volz
98
Voted
KDD
2005
ACM
118views Data Mining» more  KDD 2005»
15 years 10 months ago
On the use of linear programming for unsupervised text classification
We propose a new algorithm for dimensionality reduction and unsupervised text classification. We use mixture models as underlying process of generating corpus and utilize a novel,...
Mark Sandler