Sciweavers

260 search results - page 26 / 52
» Industry-scale duplicate detection
WWW
2004
ACM
15 years 10 months ago
Web data integration using approximate string join
Web data integration is an important preprocessing step for web mining. It is highly likely that several records on the web whose textual representations differ may represent the ...
Yingping Huang, Gregory R. Madey
ICC
2008
IEEE
15 years 4 months ago
A New Replay Attack Against Anonymous Communication Networks
Tor is a real-world, circuit-based low-latency anonymous communication network, supporting TCP applications on the Internet. In this paper, we present a new class of at...
Ryan Pries, Wei Yu, Xinwen Fu, Wei Zhao
ITC
2003
IEEE
15 years 2 months ago
Cost-Effective Approach for Reducing Soft Error Failure Rate in Logic Circuits
In this paper, a new paradigm for designing logic circuits with concurrent error detection (CED) is described. The key idea is to exploit the asymmetric soft error susceptibility ...
Kartik Mohanram, Nur A. Touba
FAST
2010
15 years 5 hour ago
HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System
A content-addressable storage (CAS) system is a valuable tool for building storage solutions, providing efficiency by automatically detecting and eliminating duplicate blocks; it ...
Cristian Ungureanu, Benjamin Atkin, Akshat Aranya,...
IJCAI
2003
14 years 11 months ago
Employing Trainable String Similarity Metrics for Information Integration
The problem of identifying approximately duplicate objects in databases is an essential step for the information integration process. Most existing approaches have relied on gener...
Mikhail Bilenko, Raymond J. Mooney