Errors in dynamic random access memory (DRAM) are a common form of hardware failure in modern compute clusters. Failures are costly both in terms of hardware replacement costs and...
We introduce Zen, a new resource allocation framework that assigns application components to node clusters to achieve high availability for partial-fault tolerant (PFT) applicat...
Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for sci...
Isaac G. Councill, Huajing Li, Ziming Zhuang, Sand...
As parallel jobs get bigger in size and finer in granularity, “system noise” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP...
Dan Tsafrir, Yoav Etsion, Dror G. Feitelson, Scott...
Agents are intended to interact in open systems where the knowledge about others (reputation) is incomplete and uncertain. Also, this knowledge about other agents is subjective si...