Locality-Sensitive Hashing (LSH) and its variants are well-known methods for solving the c-approximate NN Search problem in high-dimensional space. Traditionally, several LSH funct...
Big data is the tar sands of the data world: vast reserves of raw gritty data whose valuable information content can only be extracted at great cost. MapReduce is a popular parall...
We present BloomUnit, a testing framework for distributed programs written in the Bloom language. BloomUnit allows developers to write declarative test specifications that descri...
Peter Alvaro, Andrew Hutchinson, Neil Conway, Will...
In this demo, we will present Tiresias, the first how-to query engine. How-to queries represent fundamental data analysis questions of the form: “How should the input change in...
The advent of affordable, shared-nothing computing systems portends a new class of parallel database management systems (DBMS) for on-line transaction processing (OLTP) applicatio...