Sciweavers

APWEB
2004
Springer
13 years 8 months ago
A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines
Duplication of Web pages greatly hurts the perceived relevance of a search engine. Existing methods for detecting duplicated Web pages can be classified into two categories, i.e. o...
Shaozhi Ye, Ruihua Song, Ji-Rong Wen, Wei-Ying Ma
OOPSLA
2005
Springer
13 years 10 months ago
SDD: high performance code clone detection system for large scale source code
Code clones in software increase maintenance cost and lower software quality. We have devised a new algorithm to detect duplicated parts of source code in large software. Our algo...
Seunghak Lee, Iryoung Jeong
SIGIR
2010
ACM
12 years 11 months ago
Efficient partial-duplicate detection based on sequence matching
With the ever-increasing growth of the Internet, numerous copies of documents have become a serious problem for search engines, opinion mining, and many other web applications. Since parti...
Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
WSDM
2010
ACM
14 years 2 months ago
SBotMiner: Large Scale Search Bot Detection
In this paper, we study search bot traffic from search engine query logs at a large scale. Although bots that generate search traffic aggressively can be easily detected, a large ...
Fang Yu, Yinglian Xie, Qifa Ke
WWW
2010
ACM
14 years 18 hour ago
A pattern tree-based approach to learning URL normalization rules
Duplicate URLs cause serious trouble across the whole pipeline of a search engine, from crawling and indexing to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...