The Web is a very large social network. It is important and interesting to understand the “ecology” of the Web: the general relations of Web pages to their environment. The un...
This paper proposes PlusDBG to improve both precision and pseudo-recall by extending the conventional Web community extraction scheme. Precision is defined as the percentage of re...
The rapid development of network technologies has made the web a huge information source with its own characteristics. In most cases, traditional database-based technologies are no...
The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be...
David Gibson, Jon M. Kleinberg, Prabhakar Raghavan
Syntactically different URLs could represent the same web page on the World Wide Web, and duplicate representation for web pages causes web applications to handle a large amount of...