Sciweavers

1950 search results - page 115 / 390
» Informative sampling for large unbalanced data sets
ISBRA
2007
Springer
15 years 9 months ago
Clustering Algorithms Optimizer: A Framework for Large Datasets
Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are r...
Roy Varshavsky, David Horn, Michal Linial
SIGMOD
2009
ACM
16 years 3 months ago
Top-k queries on uncertain data: on score distribution and typical answers
Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important...
Tingjian Ge, Stanley B. Zdonik, Samuel Madden
CORR
2000
Springer
15 years 3 months ago
Algorithmic Statistics
While Kolmogorov complexity is the accepted absolute measure of information content of an individual finite object, a similarly absolute notion is needed for the relation between a...
Péter Gács, John Tromp, Paul M. B. V...
TKDE
2002
15 years 3 months ago
Coordinated Placement and Replacement for Large-Scale Distributed Caches
In a large-scale information system such as a digital library or the web, a set of distributed caches can improve their effectiveness by coordinating their data placement decisions...
Madhukar R. Korupolu, Michael Dahlin
COLING
2010
14 years 10 months ago
Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization
Cross Document Coreference (CDC) is the task of constructing the coreference chain for mentions of a person across a set of documents. This work offers a holistic view of using do...
Jian Huang 0002, Pucktada Treeratpituk, Sarah M. T...