—For historical documents, available transcriptions typically are inaccurate when compared with the scanned document images. Not only the position of the words and sentences are ...
We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to ...
Metadocuments are documents that consist primarily of references to other documents, and elements within them. Our active browsing web visualization tool generates an evolving ser...
This paper proposes an approach to the problem of generating metadata for composite mixed-media digital objects by appropriately combining and exploiting existing knowledge or met...
We present a mixture model based approach for learning individualized behavior models for the Web users. We investigate the use of maximum entropy and Markov mixture models for ge...