JMLR 2010
10 years 8 months ago
Consistent Nonparametric Tests of Independence
Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1, log-likelihoo...
Arthur Gretton, László Györfi
10 years 8 months ago
A Longitudinal View of the Relationship Between Social Marginalization and Obesity
We use 3 Waves of the Add Health data collected between 1994 and 2002 to conduct a longitudinal study of the relationship between social marginalization and the weight status of ad...
Andrea Apolloni, Achla Marathe, Zhengzheng Pan
10 years 11 months ago
Knowing the Unseen: Estimating Vocabulary Size over Unseen Samples
Empirical studies on corpora involve making measurements of several quantities for the purpose of comparing corpora, creating language models or to make generalizations about spec...
Suma Bhat, Richard Sproat
JMLR 2002
11 years 1 months ago
The Learning-Curve Sampling Method Applied to Model-Based Clustering
We examine the learning-curve sampling method, an approach for applying machine-learning algorithms to large data sets. The approach is based on the observation that the computatio...
Christopher Meek, Bo Thiesson, David Heckerman
TSP 2008
11 years 1 months ago
Sample Eigenvalue Based Detection of High-Dimensional Signals in White Noise Using Relatively Few Samples
The detection and estimation of signals in noisy, limited data is a problem of interest to many scientific and engineering communities. We present a mathematically justifiable, com...
R. R. Nadakuditi, A. Edelman
BMCBI 2005
11 years 1 months ago
Reproducible Clusters from Microarray Research: Whither?
Motivation: In cluster analysis, the validity of specific solutions, algorithms, and procedures present significant challenges because there is no null hypothesis to test and no &...
Nikhil R. Garge, Grier P. Page, Alan P. Sprague, B...
PAMI 2006
11 years 1 months ago
An Experimental Study on Pedestrian Classification
Detecting people in images is key for several important application domains in computer vision. This paper presents an in-depth experimental study on pedestrian classification; mul...
Stefan Munder, Dariu M. Gavrila
ENVSOFT 2007
11 years 1 months ago
Resampling-based software for estimating optimal sample size
The SISSI program implements a novel approach for the estimation of the optimal sample size in experimental data collection. It provides a visual evaluation system of sample size d...
Roberto Confalonieri, Marco Acutis, Gianni Bellocc...
BMCBI 2007
11 years 1 months ago
Learning causal networks from systems biology time course data: an effective model selection procedure for the vector autoregres
Background: Causal networks based on the vector autoregressive (VAR) process are a promising statistical tool for modeling regulatory interactions in a cell. However, learning the...
Rainer Opgen-Rhein, Korbinian Strimmer
BMCBI 2006
11 years 1 months ago
The PowerAtlas: a power and sample size atlas for microarray experimental design and research
Background: Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experimen...
Grier P. Page, Jode W. Edwards, Gary L. Gadbury, P...