Sciweavers

2 search results - page 1 / 1
FAST
2009
13 years 2 months ago
Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality
We present sparse indexing, a technique that uses sampling and exploits the inherent locality within backup streams to solve for large-scale backup (e.g., hundreds of terabytes) t...
Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, ...
WWW
2010
ACM
13 years 11 months ago
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...