Sciweavers

SDM
2007
SIAM
13 years 5 months ago
Mining Naturally Smooth Evolution of Clusters from Dynamic Data
Many clustering algorithms have been proposed to partition a set of static data points into groups. In this paper, we consider an evolutionary clustering problem where the input d...
Yi Wang, Shi-Xia Liu, Jianhua Feng, Lizhu Zhou
SDM
2007
SIAM
13 years 5 months ago
Distance Preserving Dimension Reduction for Manifold Learning
Manifold learning is an effective methodology for extracting nonlinear structures from high-dimensional data with many applications in image analysis, computer vision, text data a...
Hyunsoo Kim, Haesun Park, Hongyuan Zha
IJCAI
2007
13 years 5 months ago
Locality Sensitive Discriminant Analysis
Linear Discriminant Analysis (LDA) is a popular data-analytic tool for studying the class relationship between data points. A major disadvantage of LDA is that it fails to discove...
Deng Cai, Xiaofei He, Kun Zhou, Jiawei Han, Hujun ...
SDM
2010
SIAM
13 years 5 months ago
Adaptive Informative Sampling for Active Learning
Many approaches to active learning involve periodically training one classifier and choosing data points with the lowest confidence. An alternative approach is to periodically cho...
Zhenyu Lu, Xindong Wu, Josh Bongard
DMIN
2007
13 years 5 months ago
Generative Oversampling for Mining Imbalanced Datasets
One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, d...
Alexander Liu, Joydeep Ghosh, Cheryl Martin
CCCG
2008
13 years 5 months ago
Competitive Search for Longest Empty Intervals
A problem arising in statistical data analysis and pattern recognition is to find a longest interval free of data points, given a set of data points in the unit interval. We use t...
Peter Damaschke
AAAI
2010
13 years 5 months ago
G-Optimal Design with Laplacian Regularization
In many real world applications, labeled data are usually expensive to get, while there may be a large amount of unlabeled data. To reduce the labeling cost, active learning attem...
Chun Chen, Zhengguang Chen, Jiajun Bu, Can Wang, L...
IUI
2010
ACM
13 years 5 months ago
Finding your way in a multi-dimensional semantic space with Luminoso
We present Luminoso, a tool that helps researchers to visualize and understand a dimensionality-reduced semantic space by exploring it interactively. It also streamlines the proce...
Robert Speer, Catherine Havasi, K. Nichole Treadwa...
AAAI
2008
13 years 6 months ago
Strategyproof Classification under Constant Hypotheses: A Tale of Two Functions
We consider the following setting: a decision maker must make a decision based on reported data points with binary labels. Subsets of data points are controlled by different selfi...
Reshef Meir, Ariel D. Procaccia, Jeffrey S. Rosens...
AAAI
2008
13 years 6 months ago
Automatic Extraction of Data Points and Text Blocks from 2-Dimensional Plots in Digital Documents
Two-dimensional (2-D) plots in digital documents on the web are an important source of information that is largely under-utilized. In this paper, we outline how data and text can ...
Saurabh Kataria, William Browuer, Prasenjit Mitra,...