Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. Although these techniques cluster the attributes col...
Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of associa...
Power control is becoming a key challenge for effectively operating a modern data center. In addition to reducing operating costs, precisely controlling power consumption is an es...
With the explosion of social media, scalability becomes a key challenge. Two main problems arise: 1) data volume: how to manage and analyze huge data...
Ching-Yung Lin, Jimeng Sun, Nan Cao, Shixia Liu, S...
This paper presents a tree-pattern-based method for automatically and accurately finding code clones in program files. Duplicate tree-patterns are first collected by anti-unificati...