Sciweavers

SAC
2009
ACM
13 years 11 months ago
Combining statistics and semantics via ensemble model for document clustering
Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan
CBMS
2009
IEEE
13 years 11 months ago
Domain concept-based queries for cancer research data sources
Biomedical scientists generate, access, validate and interpret multiple distributed and heterogeneous data sets. Semantic annotations for these data sets are paramount for exchang...
Alejandra González Beltrán, Anthony ...
BIBM
2009
IEEE
13 years 11 months ago
A Multi-task Feature Selection Filter for Microarray Classification
A major challenge in microarray classification and biomarker discovery is dealing with small-sample high-dimensional data where the number of genes used as features is typically o...
Liang Lan, Slobodan Vucetic
EDBT
2010
ACM
13 years 11 months ago
HARRA: fast iterative hashed record linkage for large-scale data collections
We study the performance issue of the “iterative” record linkage (RL) problem, where match and merge operations may occur together in iterations until convergence emerges. We ...
Hung-sik Kim, Dongwon Lee
ICDE
2010
IEEE
13 years 11 months ago
Hive - a petabyte scale data warehouse using Hadoop
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensiv...
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zhen...
WWW
2010
ACM
13 years 11 months ago
Inferring relevant social networks from interpersonal communication
Researchers increasingly use electronic communication data to construct and study large social networks, effectively inferring unobserved ties (e.g. i is connected to j) from obs...
Munmun De Choudhury, Winter A. Mason, Jake M. Hofm...
CVPR
2010
IEEE
14 years 1 month ago
Unsupervised Learning of Invariant Features Using Video
We present an algorithm that learns invariant features from real data in an entirely unsupervised fashion. The principal benefit of our method is that it can be applied without hu...
David Stavens, Sebastian Thrun
RECOMB
2001
Springer
14 years 5 months ago
Analysis techniques for microarray time-series data
We address possible limitations of publicly available data sets of yeast gene expression. We study the predictability of known regulators via time-series analysis, and show that l...
Vladimir Filkov, Steven Skiena, Jizu Zhi
KDD
2009
ACM
14 years 5 months ago
An association analysis approach to biclustering
The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis perform...
Gaurav Pandey, Gowtham Atluri, Michael Steinbach, ...
ICML
2000
IEEE
14 years 5 months ago
A Dynamic Adaptation of AD-trees for Efficient Machine Learning on Large Data Sets
This paper has no novel learning or statistics: it is concerned with making a wide class of preexisting statistics and learning algorithms computationally tractable when faced wit...
Paul Komarek, Andrew W. Moore