A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
In many text classification applications, it is appealing to treat every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. Hence, the page categorization is learned from the data without the need for a (possibly cumb...
From pre-schools to high schools, at home and in museums, the educational community has embraced the use of computers as a teaching tool. Yet many institutions will simply install...
Lori L. Scarlatos, Amy Bruckman, Allison Druin, Mi...
Classical probability theory considers probability distributions that assign probabilities to all events (at least in the finite case). However, there are natural situat...
Alexey V. Chernov, Alexander Shen, Nikolai K. Vere...