Ad-hoc data processing in the cloud

Ad-hoc data processing has proven to be a critical paradigm for Internet companies processing large volumes of unstructured data. However, the emergence of cloud-based computing, where storage and CPU are outsourced to multiple third parties across the globe, implies large collections of highly distributed and continuously evolving data. Our demonstration combines the power and simplicity of the MapReduce abstraction with a wide-scale distributed stream processor, Mortar. While our incremental MapReduce operators avoid data re-processing, the stream processor manages the placement and physical data flow of the operators across the wide area. We demonstrate a distributed web indexing engine against which users can submit and deploy continuous MapReduce jobs. A visualization component illustrates both the incremental indexing and index searches in real time.
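The idea of an incremental MapReduce operator that avoids re-processing can be illustrated with a minimal single-process sketch of an inverted index. This is not Mortar's API or the authors' implementation: the names `map_document`, `IncrementalIndex`, `update`, and `search` are hypothetical, and the sketch assumes whitespace tokenization and in-memory state. It only shows how a reduce step that keeps state between updates can apply the difference between a document's old and new map output instead of rebuilding the whole index.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term."""
    return {(term, doc_id) for term in text.lower().split()}


class IncrementalIndex:
    """Hypothetical incremental reduce that keeps state between updates."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.emitted = {}                 # doc_id -> pairs from its last map run

    def update(self, doc_id, text):
        """Apply only the delta between the old and new map output.

        Returns the number of (term, doc_id) pairs that changed.
        """
        new_pairs = map_document(doc_id, text)
        old_pairs = self.emitted.get(doc_id, set())

        # Retract pairs the new version of the document no longer emits.
        for term, doc in old_pairs - new_pairs:
            self.postings[term].discard(doc)
            if not self.postings[term]:
                del self.postings[term]

        # Add pairs that are new in this version.
        for term, doc in new_pairs - old_pairs:
            self.postings[term].add(doc)

        self.emitted[doc_id] = new_pairs
        return len(old_pairs ^ new_pairs)

    def search(self, term):
        """Return the sorted ids of documents that contain the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.update("a", "cloud data stream")
index.update("b", "stream processor")
changed = index.update("a", "cloud data")  # only the "stream" pair is retracted
```

In a wide-area deployment the state held here in `postings` and `emitted` would be partitioned across operators, which is the placement problem the abstract assigns to the stream processor.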
Dionysios Logothetis, Ken Yocum
Added 28 Jan 2011
Updated 28 Jan 2011
Type Journal
Year 2008
Authors Dionysios Logothetis, Ken Yocum