Author identification is a text categorization task with applications in intelligence, criminal law, computer forensics, etc. Usually, in such cases there is shortage of training t...
A new distance measure between probability density functions (pdfs) is introduced, which we refer to as the Laplacian pdf distance. The Laplacian pdf distance exhibits a remarkabl...
Using Dirac Notation as a powerful tool, we investigate the three classical Information Retrieval (IR) models and some their extensions. We show that almost all such models can be...
Abstract. The protein family classification problem, which consists of determining the family memberships of given unknown protein sequences, is very important for a biologist for ...
We examine the set covering machine when it uses data-dependent half-spaces for its set of features and bound its generalization error in terms of the number of training errors an...
Mario Marchand, Mohak Shah, John Shawe-Taylor, Mar...