Data Selection has emerged as a common issue in language technologies. We define Data Selection as the choosing of a subset of training data that is most effective for a given tas...
Jonathan Clark, Robert E. Frederking, Lori S. Levi...
We investigate the following data mining problem from Computational Chemistry: From a large data set of compounds, find those that bind to a target molecule in as few iterations o...
Solving stochastic optimization problems under partial observability, where one needs to adaptively make decisions with uncertain outcomes, is a fundamental but notoriously diffic...
Abstract. Query-by-example and query-by-keyword both suffer from the problem of "aliasing," meaning that example-images and keywords potentially have variable interpretat...
Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domainspecific operation. We present our design of alias that us...