Sciweavers

» Probabilistic Data Generation for Deduplication and Data Lin...
KDD
2008
ACM
14 years 4 months ago
Unsupervised deduplication using cross-field dependencies
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes col...
Robert Hall, Charles A. Sutton, Andrew McCallum
SIGIR
2008
ACM
13 years 3 months ago
Named entity normalization in user generated content
Named entity recognition is important for semantically oriented retrieval tasks, such as question answering, entity retrieval, biomedical retrieval, trend detection, and event and...
Valentin Jijkoun, Mahboob Alam Khalid, Maarten Mar...
BMCBI
2011
12 years 7 months ago
A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-w
Background: Discovering the genetic basis of common genetic diseases in the human genome represents a public health issue. However, the dimensionality of the genetic data (up to 1...
Raphael Mourad, Christine Sinoquet, Philippe Leray
ICS
2009
Tsinghua U.
13 years 1 months ago
R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems
Data de-duplication has become a commodity component in data-intensive systems and it is required that these systems provide high reliability comparable to others. Unfortunately, b...
Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongshen...
BMCBI
2008
13 years 3 months ago
A probabilistic framework to predict protein function from interaction data integrated with semantic knowledge
Background: The functional characterization of newly discovered proteins has been a challenge in the post-genomic era. Protein-protein interactions provide insights into the funct...
Young-Rae Cho, Lei Shi, Murali Ramanathan, Aidong ...