Sciweavers

1277 search results - page 199 / 256
» The Google Similarity Distance
WWW
2009
ACM
15 years 10 months ago
The slashdot zoo: mining a social network with negative edges
We analyse the corpus of user relationships of the Slashdot technology news site. The data was collected from the Slashdot Zoo feature where users of the website can tag other use...
Andreas Lommatzsch, Christian Bauckhage, Jé...
WWW
2007
ACM
15 years 10 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2006
ACM
15 years 10 months ago
GoGetIt!: a tool for generating structure-driven web crawlers
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
RECOMB
2009
Springer
15 years 10 months ago
Finding Biologically Accurate Clusterings in Hierarchical Tree Decompositions Using the Variation of Information
Hierarchical clustering is a popular method for grouping together similar elements based on a distance measure between them. In many cases, annotation information for som...
Saket Navlakha, James Robert White, Niranjan Nagar...
KDD
2001
ACM
15 years 10 months ago
Random projection in dimensionality reduction: applications to image and text data
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however,...
Ella Bingham, Heikki Mannila