The Internet makes it possible to share information (e.g., text, images, audio, video, and other data formats) across the globe. In this paper we look at collaborative Internet en...
A large number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use ...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
Hierarchical categorization of documents has received growing interest due to the proliferation of topic hierarchies for text documents. The most serious problem of hie...
We model on-line ink traces for a set of 219 symbols to “best fit” low-degree polynomial series. Using a collection of mathematical writing samples, we find that in many cas...
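As an illustration of the "best fit" idea only, and not the authors' method, the sketch below fits each coordinate of a single ink stroke, x(t) and y(t), to a low-degree Chebyshev series by ordinary least squares using NumPy's `chebfit`. The Chebyshev basis, the default degree, and parameterizing the stroke by point index rescaled to [-1, 1] are all assumptions made for this example; the helper names `fit_trace` and `max_error` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def _param(n):
    # Assumed parameterization: point index rescaled to [-1, 1].
    # Arc length or pen time stamps are other plausible choices.
    return np.linspace(-1.0, 1.0, n)


def fit_trace(xs, ys, degree=8):
    """Least-squares fit of x(t) and y(t) to Chebyshev series of `degree`.

    Returns the coefficient arrays (cx, cy), each of length degree + 1.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) <= degree:
        raise ValueError("need more sample points than the series degree")
    t = _param(len(xs))
    cx = C.chebfit(t, xs, degree)  # least-squares Chebyshev coefficients for x
    cy = C.chebfit(t, ys, degree)  # least-squares Chebyshev coefficients for y
    return cx, cy


def max_error(xs, ys, cx, cy):
    """Largest Euclidean distance between the samples and the fitted curve."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    t = _param(len(xs))
    ex = C.chebval(t, cx) - xs
    ey = C.chebval(t, cy) - ys
    return float(np.max(np.hypot(ex, ey)))


if __name__ == "__main__":
    # Synthetic stroke: a quarter circle sampled at 50 points.
    theta = np.linspace(0.0, np.pi / 2, 50)
    xs, ys = np.cos(theta), np.sin(theta)
    cx, cy = fit_trace(xs, ys, degree=6)
    print("max fitting error:", max_error(xs, ys, cx, cy))
```

A stroke is then summarized by 2 × (degree + 1) coefficients instead of its raw samples; for a smooth stroke like the quarter circle, a degree-6 fit is already very close to the data.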