Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine learned websearch ranking — a domain notorious for very large data sets. ...
Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal...
Recently many large scale computer systems are built in order to meet the high storage and processing demands of compute and data-intensive applications. MapReduce is one of the mo...
—In this paper, we present an effective and scalable system for multivariate volume data visualization and analysis with a novel transfer function interface design that tightly c...
Periscope is a distributed automatic online performance analysis system for large scale parallel systems. It consists of a set of analysis agents distributed on the parallel machin...
—Normalization before clustering is often needed for proximity indices, such as Euclidian distance, which are sensitive to differences in the magnitude or scales of the attribute...