Abstract. We show several PAC-style concentration bounds for learning a unigram language model. One interesting quantity is the probability of all words appearing exactly k times in...
Abstract. We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting with a v...
Abstract. Phylogenetics is the science of determining connections between groups of organisms in terms of ancestor/descendant relationships, usually expressed by phylogenetic trees, ...
Abstract. We present a system for automatic FAX routing which processes incoming FAX images and forwards them to the correct email alias. The system first performs optical charact...
Abstract. Most research in data mining has focused on developing novel algorithms for specific data mining tasks. However, finding the theoretical foundations of data...