S4: Distributed Stream Computing Platform

Abstract: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs, (2) publish results. The architecture resembles the Actors model [1], providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers. In this paper, we outline the S4 architecture in detail and describe various applications, including real-life deployments. Our design is primarily driven by large-scale applications for data mining and machine learning in a production environment. We show that the S4 design is surprisingly flexible and lends itself to run in larg...
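The keyed routing and PE semantics described in the abstract can be illustrated with a minimal single-process Python sketch. It is not the S4 API: the names `Event`, `ProcessingElement`, `SplitterPE`, `WordCountPE`, and `Cluster`, and the use of CRC32 modulo the node count to pick a node, are hypothetical choices made for illustration. Each distinct (stream, key) pair gets its own PE instance, always hosted on the same simulated node, and a PE may emit new events, publish results, or both.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple
import zlib


@dataclass(frozen=True)
class Event:
    stream: str  # hypothetical stream name, e.g. "words"
    key: str     # value of the keyed attribute used for routing
    value: Any


class ProcessingElement:
    """Hypothetical PE base class: one instance per (stream, key value)."""

    def __init__(self, key: str) -> None:
        self.key = key

    def on_event(self, event: Event, emit: Callable[[Event], None],
                 publish: Callable[[Any], None]) -> None:
        raise NotImplementedError


class SplitterPE(ProcessingElement):
    """Consumes sentences and emits one 'words' event keyed by each word."""

    def on_event(self, event, emit, publish):
        for word in event.value.split():
            emit(Event("words", word, 1))


class WordCountPE(ProcessingElement):
    """Keeps a per-word count and publishes the updated total."""

    def __init__(self, key: str) -> None:
        super().__init__(key)
        self.count = 0

    def on_event(self, event, emit, publish):
        self.count += event.value
        publish((self.key, self.count))


class Cluster:
    """Simulates N nodes; routes each event to a PE by hashing its key."""

    def __init__(self, num_nodes: int,
                 pe_factories: Dict[str, Callable[[str], ProcessingElement]]) -> None:
        self.num_nodes = num_nodes
        self.pe_factories = pe_factories  # stream name -> PE type consuming it
        self.nodes: List[Dict[Tuple[str, str], ProcessingElement]] = [
            {} for _ in range(num_nodes)]
        self.results: List[Any] = []

    def node_for(self, key: str) -> int:
        # Deterministic hash, so the same key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % self.num_nodes

    def run(self, events: Iterable[Event]) -> List[Any]:
        queue = deque(events)
        while queue:
            event = queue.popleft()
            factory = self.pe_factories.get(event.stream)
            if factory is None:
                continue  # no PE type consumes this stream
            node = self.nodes[self.node_for(event.key)]
            pe = node.get((event.stream, event.key))
            if pe is None:  # lazily create the PE instance for a new key
                pe = factory(event.key)
                node[(event.stream, event.key)] = pe
            # Emitted events are queued for routing; published results are collected.
            pe.on_event(event, queue.append, self.results.append)
        return self.results


cluster = Cluster(num_nodes=3,
                  pe_factories={"sentences": SplitterPE, "words": WordCountPE})
results = cluster.run([Event("sentences", "s1", "to be or not to be")])
print(results[-1])  # ('be', 2)
```

Because every event carrying a given key reaches the same PE instance, a PE can keep per-key state (here, a running count) without coordinating with other instances. A real deployment would add inter-node messaging and failure handling, which this sketch omits.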
Added 03 Mar 2011
Updated 03 Mar 2011
Type Conference
Year 2010
Where ICDM
Authors Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari