Sciweavers

VLUDS
2010
12 years 11 months ago
Advanced Visualization and Interaction Techniques for Large High-Resolution Displays
Large high-resolution displays combine the images of multiple smaller display devices to form one large display area. A total resolution that can easily comprise several hundred m...
Sebastian Thelen
PVLDB
2010
12 years 11 months ago
Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints
A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has ...
Jiannan Wang, Guoliang Li, Jianhua Feng
PROMISE
2010
12 years 11 months ago
Replication of defect prediction studies: problems, pitfalls and recommendations
Background: The main goal of the PROMISE repository is to enable reproducible, and thus verifiable or refutable research. Over time, plenty of data sets became available, especial...
Thilo Mende
JMLR
2010
12 years 11 months ago
Training and Testing Low-degree Polynomial Data Mappings via Linear SVM
Kernel techniques have long been used in SVM to handle linearly inseparable problems by transforming data to a high dimensional space, but training and testing large data sets is ...
Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Micha...
JMLR
2010
12 years 11 months ago
Quadratic Programming Feature Selection
Identifying a subset of features that preserves classification accuracy is a problem of growing importance, because of the increasing size and dimensionality of real-world data se...
Irene Rodriguez-Lujan, Ramón Huerta, Charle...
JMLR
2010
12 years 11 months ago
Classification with Incomplete Data Using Dirichlet Process Priors
A non-parametric hierarchical Bayesian framework is developed for designing a classifier, based on a mixture of simple (linear) classifiers. Each simple classifier is termed a loc...
Chunping Wang, Xuejun Liao, Lawrence Carin, David ...
WWW
2011
ACM
12 years 11 months ago
Parallel boosted regression trees for web search ranking
Gradient Boosted Regression Trees (GBRT) are the current state-of-the-art learning paradigm for machine-learned web search ranking, a domain notorious for very large data sets. ...
Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal...
NAR
2011
12 years 11 months ago
PRIDB: a protein-RNA interface database
The Protein–RNA Interface Database (PRIDB) is a comprehensive database of protein–RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to fac...
Benjamin A. Lewis, Rasna R. Walia, Michael Terribi...
BMCBI
2011
12 years 11 months ago
Learning genetic epistasis using Bayesian network scoring criteria
Background: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have be...
Xia Jiang, Richard E. Neapolitan, M. Michael Barma...
DEBU
2010
13 years 1 months ago
A Rule-Based Citation System for Structured and Evolving Datasets
We consider the requirements that a citation system must fulfill in order to cite structured and evolving data sets. Such a system must take into account variable granularity, con...
Peter Buneman, Gianmaria Silvello