Researchers in the data mining area frequently have to spend a significant portion of their time on preprocessing the data in order to apply their algorithms to real-world datasets...
Zhaoqi Chen, Dmitri V. Kalashnikov, Sharad Mehrotr...
Edit-distance-based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific compu...
As energy-related costs have become a major economic factor for IT infrastructures and data-centers, companies and the research community are being challenged to find better an...
We believe this paper is the first extensive user-study of whitelisting email addresses. While whitelists are common in social networking and instant messaging (e.g., buddylists),...
Reducing management costs and improving the availability of large-scale distributed systems require automatic replica regeneration, i.e., creating new replicas in response to repl...