
ERCIMDL
2008
Springer
Revisiting Lexical Signatures to (Re-)Discover Web Pages
A lexical signature (LS) is a small set of terms derived from a document that captures the "aboutness" of that document. An LS generated from a web page can be used to disc...
Martin Klein, Michael L. Nelson
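The abstract is cut off before it says how a signature is built. A common approach in the lexical-signature literature ranks a page's terms by TF-IDF and keeps the top k. The sketch below assumes that scheme (it is not necessarily the method this paper evaluates) and uses a toy in-memory corpus in place of a search engine's document frequencies. `tokenize`, `lexical_signature`, and `k` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(doc: str, corpus: list[str], k: int = 5) -> list[str]:
    """Return the k terms of `doc` with the highest TF-IDF weight.

    Hypothetical helper (assumption: TF-IDF ranking). Term frequency is
    counted in `doc`; document frequency is counted over `corpus`, which
    should include `doc`. Ties are broken alphabetically for determinism.
    """
    tf = Counter(tokenize(doc))
    n_docs = len(corpus)
    doc_sets = [set(tokenize(d)) for d in corpus]

    def idf(term: str) -> float:
        df = sum(1 for s in doc_sets if term in s)
        # Add-one smoothing keeps the value finite even if df is 0.
        return math.log((n_docs + 1) / (df + 1))

    scores = {term: count * idf(term) for term, count in tf.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


docs = [
    "web pages move and web pages disappear",
    "lexical signature lexical signature terms rediscover web pages",
    "search engines index web pages",
]
print(lexical_signature(docs[1], docs, k=3))  # ['lexical', 'signature', 'rediscover']
```

Terms that occur in every document ("web", "pages") get zero weight, so the signature favors words that distinguish the page. Re-discovering the page would then mean submitting these terms as a search query, a step not shown here.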
WSDM
2010
ACM
Learning URL patterns for webpage de-duplication
The presence of duplicate documents on the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...