Sciweavers

DASFAA
2008
IEEE
Database
13 years 11 months ago
Bulk-Loading the ND-Tree in Non-ordered Discrete Data Spaces
Applications demanding multidimensional index structures for performing efficient similarity queries often involve a large amount of data. The conventional tuple-loading approach t...
Hyun-Jeong Seok, Gang Qian, Qiang Zhu, Alexander R...
SEMWEB
2009
Springer
13 years 11 months ago
Functions over RDF Language Elements
Spreadsheet tools are often used in business and private scenarios in order to collect and store data, and to explore and analyze these data by executing functions and aggregation...
Bernhard Schandl
PODS
2007
ACM
Database
14 years 4 months ago
Management of probabilistic data: foundations and challenges
Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as pro...
Nilesh N. Dalvi, Dan Suciu
SIGMOD
2001
ACM
Database
14 years 4 months ago
Epsilon Grid Order: An Algorithm for the Similarity Join on Massive High-Dimensional Data
The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The s...
Christian Böhm, Bernhard Braunmüller, Fl...
RECOMB
2004
Springer
14 years 4 months ago
Computational identification of evolutionarily conserved exons
Phylogenetic hidden Markov models (phylo-HMMs) have recently been proposed as a means for addressing a multispecies version of the ab initio gene prediction problem. These models ...
Adam C. Siepel, David Haussler
CHI
2003
ACM
14 years 4 months ago
Efficient user interest estimation in fisheye views
We present a new technique for efficiently computing Degree-of-Interest distributions to inform the visualization of graph-structured data. The technique is independent of the int...
Jeffrey Heer, Stuart K. Card
KDD
2003
ACM
Data Mining
14 years 5 months ago
Classifying large data sets using SVMs with hierarchical clusters
Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations which convey several salient ...
Hwanjo Yu, Jiong Yang, Jiawei Han
KDD
2007
ACM
Data Mining
14 years 5 months ago
Detecting changes in large data sets of payment card data: a case study
An important problem in data mining is detecting changes in large data sets. Although there are a variety of change detection algorithms that have been developed, in practice it c...
Chris Curry, Robert L. Grossman, David Locke, Stev...
ICML
2005
IEEE
14 years 5 months ago
Intrinsic dimensionality estimation of submanifolds in R^d
We present a new method to estimate the intrinsic dimensionality of a submanifold M in R^d from random samples. The method is based on the convergence rates of a certain U-statisti...
Matthias Hein, Jean-Yves Audibert
ICML
2008
IEEE
14 years 5 months ago
Fully distributed EM for very large datasets
In EM and related algorithms, E-step computations distribute easily, because data items are independent given parameters. For very large data sets, however, even storing all of th...
Jason Wolfe, Aria Haghighi, Dan Klein