Sciweavers

SIGIR
2010
ACM
12 years 10 months ago
Efficient partial-duplicate detection based on sequence matching
With the ever-increasing growth of the Internet, numerous copies of documents become serious problem for search engine, opinion mining and many other web applications. Since parti...
Qi Zhang, Yue Zhang, Haomin Yu, Xuanjing Huang
DGO
2006
134views Education» more  DGO 2006»
13 years 5 months ago
Next steps in near-duplicate detection for eRulemaking
Large volume public comment campaigns and web portals that encourage the public to customize form letters produce many near-duplicate documents, which increases processing and sto...
Hui Yang, Jamie Callan, Stuart W. Shulman
MM
2009
ACM
249views Multimedia» more  MM 2009»
13 years 8 months ago
MyFinder: near-duplicate detection for large image collections
The explosive growth of multimedia data poses serious challenges to data storage, management and search. Efficient near-duplicate detection is one of the required technologies for...
Xin Yang, Qiang Zhu, Kwang-Ting Cheng
SIGIR
2006
ACM
13 years 9 months ago
Near-duplicate detection by instance-level constrained clustering
For the task of near-duplicated document detection, both traditional fingerprinting techniques used in database community and bag-of-word comparison approaches used in information...
Hui Yang, James P. Callan