Map-Reduce is a programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. Through ...
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, Dougl...
This paper studies the problem of categorical data clustering, especially for transactional data characterized by high dimensionality and large volume. Starting from a heuristic m...
We present SenseWeb, an open and scalable infrastructure for sharing and geocentric exploration of sensor data streams. SenseWeb allows sensor owners to share data streams across ...
Managing large-scale time series databases has attracted significant attention in the database community recently. Related fundamental problems such as dimensionality reduction, tr...
In this paper, we study search bot traffic from search engine query logs at a large scale. Although bots that generate search traffic aggressively can be easily detected, a large ...