Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated c...
Erik Linstead, Paul Rigor, Sushil Krishna Bajracha...
The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual...
Given the continuous growth of databases and the abundance of diverse files in modern IT environments, there is a pressing need to integrate keyword search on heterogeneous inform...
Latent semantic analysis (LSA), one of the most popular unsupervised dimension reduction tools, has a wide range of applications in text mining and information retrieval. The k...
Xi Chen, Yanjun Qi, Bing Bai, Qihang Lin, Jaime G....
Background: Array comparative genomic hybridization (CGH) is a technique that detects copy number differences in DNA segments. Complete sequencing of the human genome and the dev...
Bryan Chi, Ronald J. deLeeuw, Bradley P. Coe, Calu...