Some large-scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors’ s...
Previous studies have highlighted the high arrival rate of new content on the web. We study the extent to which this new content can be efficiently discovered by a crawler. Our st...
Anirban Dasgupta, Arpita Ghosh, Ravi Kumar, Christ...
The ImpressionRank of a web page (or, more generally, of a web site) is the number of times users viewed the page while browsing search results. ImpressionRank captures the visibi...
Lexical chaining is a technique for identifying semantically-related terms in text. We propose concept chaining to link semantically-related concepts within biomedical text togethe...
The automatic detection of shared content in written documents (which includes text reuse and its unacknowledged commitment, plagiarism) has become an important probl...