Query optimizers use base-table statistics to derive statistics on the sub-plans that are enumerated during optimization. In practice, traditional optimizers rely on a number of s...
Background: Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotat...
Scientific experiments produce large volumes of data represented as complex objects that describe independent events such as particle collisions. Scientific analyses can be express...
Sampling is a popular method of data collection when it is impossible or too costly to reach the entire population. For example, television show ratings in the United States are g...
Estimating the cardinality (i.e. number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of in...