Learning a complex metabolomic dataset using random forests and support vector machines

14 years 4 months ago

Download math.uc.edu

Metabolomics is the omics science of biochemistry. The associated data include the quantitative measurements of all small molecule metabolites in a biological sample. These datasets provide a window into dynamic biochemical networks and conjointly with other omic data, genes and proteins, have great potential to unravel complex human diseases. The dataset used in this study has 63 individuals, normal and diseased, and the diseased are drug treated or not, so there are three classes. The goal is to classify these individuals using the observed metabolite levels for 317 measured metabolites. There are a number of statistical challenges: non-normal data, the number of samples is less than the number of metabolites; there are missing data and the fact that data are missing is informative (assay values below detection limits can point to a specific class); also, there are high correlations among the metabolites. We investigate support vector machines (SVM), and random forest (RF), for outl...

Young Truong, Xiaodong Lin, Chris Beecher

Real-time Traffic