“MegaProto” is a proof-of-concept prototype for our project “Mega-Scale Computing Based on Low-Power Technology and Workload Modeling”, implementing our key idea that a mi...
Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for sci...
Isaac G. Councill, Huajing Li, Ziming Zhuang, Sand...
Large-scale cluster-based Internet services often host partitioned datasets to provide incremental scalability. The aggregation of results produced from multiple partitions is a f...
While scalable data mining methods are expected to cope with massive Web data, coping with evolving trends in noisy data in a continuous fashion, and without any unnecessary stopp...
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host ...