SDM 2008, SIAM
Semantic Smoothing for Bayesian Text Classification with Small Training Data
Bayesian text classifiers face a common issue referred to as the data sparsity problem, especially when the training set is very small. The frequently used Laplacian...
Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
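The abstract breaks off while naming the frequently used Laplacian method, presumably Laplace (add-one) smoothing, the standard baseline for sparse word counts. The sketch below shows only that baseline for a multinomial naive Bayes classifier. It is not the authors' semantic smoothing, and the function names and the `alpha` pseudo-count parameter are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers: the standard Laplace-smoothing baseline,
# NOT the semantic smoothing method proposed in the paper.

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log P(c) and smoothed log P(w | c).

    docs: list of token lists; labels: one class label per document.
    alpha=1.0 gives classic add-one (Laplace) smoothing.
    """
    vocab = {w for doc in docs for w in doc}
    log_prior, log_cond = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        # Every vocabulary word receives pseudo-count alpha, so a word never
        # seen with class c still gets a nonzero probability.
        log_cond[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_cond

def predict(doc, vocab, log_prior, log_cond):
    """Return the class with the highest log-posterior score.

    Words outside the training vocabulary are ignored.
    """
    scores = {
        c: log_prior[c] + sum(log_cond[c][w] for w in doc if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

With a tiny training set, a word that never occurs in a class (for example "agenda" in a spam class) gets probability alpha / (class token count + alpha * vocabulary size) instead of zero, which is exactly the sparsity issue the abstract describes.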
An Efficient Local Algorithm for Distributed Multivariate Regression in Peer-to-Peer Networks
This paper offers a local distributed algorithm for multivariate regression in large peer-to-peer environments. The algorithm is designed for distributed inferencing, data compact...
Kanishka Bhaduri, Hillol Kargupta
Learning Markov Network Structure using Few Independence Tests
In this paper we present the Dynamic Grow-Shrink Inference-based Markov network learning algorithm (abbreviated DGSIMN), which improves on GSIMN, the state-of-the-art algorithm for...
Parichey Gandhi, Facundo Bromberg, Dimitris Margar...
Semi-supervised Multi-label Learning by Solving a Sylvester Equation
Multi-label learning refers to problems in which an instance can be assigned to more than one category. In this paper, we present a novel Semi-supervised algorithm for Multi-labe...
Gang Chen, Yangqiu Song, Fei Wang, Changshui Zhang
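The title names a Sylvester equation, conventionally written AX + XB = C. The truncated abstract does not say how the authors build A, B, and C or which solver they use, so the sketch below only illustrates, for that generic form, how such an equation reduces to an ordinary linear system. The function names are hypothetical, and for realistic sizes a dedicated routine such as SciPy's `scipy.linalg.solve_sylvester` (Bartels-Stewart algorithm) would be the usual choice.

```python
# Hypothetical helpers: a generic AX + XB = C solver for small matrices,
# NOT the semi-supervised multi-label algorithm from the paper.

def gaussian_solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(M)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # A unique solution exists only if A and -B share no eigenvalue.
            raise ValueError("singular or nearly singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - s) / aug[r][r]
    return x

def solve_sylvester(A, B, C):
    """Solve A X + X B = C for X, with A n x n, B m x m, C n x m.

    Builds the (n*m) x (n*m) linear system entry by entry; suitable
    only for small illustrative sizes.
    """
    n, m = len(A), len(B)
    size = n * m

    def idx(i, j):
        return i * m + j  # position of X[i][j] in the unknown vector

    M = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for i in range(n):
        for j in range(m):
            row = idx(i, j)
            rhs[row] = float(C[i][j])
            for p in range(n):  # (A X)[i][j] = sum_p A[i][p] * X[p][j]
                M[row][idx(p, j)] += A[i][p]
            for q in range(m):  # (X B)[i][j] = sum_q X[i][q] * B[q][j]
                M[row][idx(i, q)] += B[q][j]
    x = gaussian_solve(M, rhs)
    return [[x[idx(i, j)] for j in range(m)] for i in range(n)]
```

The reduction makes the cost explicit: the system has n*m unknowns, so dense elimination scales as (n*m)^3, which is why specialized Sylvester solvers work on the n x n and m x m matrices directly.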
On the Dangers of Cross-Validation. An Experimental Evaluation
Cross-validation allows models to be tested on the full training set by means of repeated resampling, thus maximizing the total number of points used for testing and potential...
R. Bharat Rao, Glenn Fung
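One common form of the repeated resampling the abstract describes is k-fold cross-validation: the data are split into k folds, and each fold is held out once for testing while the remaining folds are used for training, so every point is tested exactly once. A minimal sketch of that index bookkeeping follows, assuming a shuffled, unstratified split; the function name and `seed` parameter are hypothetical, and the sketch says nothing about the pitfalls the paper evaluates.

```python
import random

# Hypothetical helper: plain (unstratified) k-fold splitting.

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices 0..n-1 are shuffled once and dealt into k folds whose sizes
    differ by at most one; each fold is the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

For n = 10 and k = 3 this yields test folds of sizes 4, 3, and 3 whose union is every index, which is the property that lets cross-validation use all points for testing.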
Active Learning with Model Selection in Linear Regression
Optimally designing the location of training input points (active learning) and choosing the best model (model selection) are two important components of supervised learning and h...
Masashi Sugiyama, Neil Rubens
Proximity Tracking on Time-Evolving Bipartite Graphs
Given an author-conference network that evolves over time, which conferences is a given author most closely related to, and how do they change over time? Large time...
Hanghang Tong, Spiros Papadimitriou, Philip S. Yu,...
Massive-Scale Kernel Discriminant Analysis: Mining for Quasars
We describe a fast algorithm for kernel discriminant analysis, empirically demonstrating an asymptotic speed-up over the previous best approach. We achieve this with a new pattern of...
Ryan Riegel, Alexander Gray, Gordon Richards
Cluster Ensemble Selection
This paper studies the ensemble selection problem for unsupervised learning. Given a large library of different clustering solutions, our goal is to select a subset of solutions t...
Xiaoli Z. Fern, Wei Lin
Roughly Balanced Bagging for Imbalanced Data
Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distri...
Shohei Hido, Hisashi Kashima