The AncestorRank algorithm calculates an authority score by using just one characteristic of the web graph—the number of ancestors per node. For scalability, we estimate the num...
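As a rough illustration of the quantity this abstract refers to, the sketch below counts each node's ancestors exactly on a small directed graph by walking links backwards. It is not the AncestorRank algorithm itself: the abstract says the count is estimated for scalability, whereas this version computes it exhaustively. The function name `ancestor_counts`, the edge-list input format, and the choice not to count a node as its own ancestor are all assumptions made for the example.

```python
from collections import deque


def ancestor_counts(edges):
    """Return {node: number of distinct ancestors} for a directed graph.

    `edges` is an iterable of (source, target) pairs. An ancestor of a node
    is any other node from which it can be reached by following links.
    Exact and exhaustive: illustrative only, not a scalable estimator.
    """
    # Reverse adjacency: for each node, the set of nodes linking directly to it.
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
        parents.setdefault(src, set())

    counts = {}
    for node in parents:
        # Breadth-first search over reversed links, starting from `node`.
        seen = {node}
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for parent in parents[current]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        # Assumption: a node is not counted as its own ancestor.
        counts[node] = len(seen) - 1
    return counts


if __name__ == "__main__":
    links = [("a", "c"), ("b", "c"), ("c", "d")]
    print(ancestor_counts(links))
```

Each node triggers its own backward traversal, so the cost grows roughly with the number of nodes times the number of edges. That cost is one plausible reason to estimate the count on a web-scale graph rather than compute it exactly.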
Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm that can combat web spam. However, TrustRank is vulnerable in the s...
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches...
Both human users and crawlers face the problem of finding good start pages to explore some topic. We show how to assist in qualifying pages as start nodes by link-based ranking al...
We propose a number of techniques for learning a global ranking from data that may be incomplete and imbalanced -- characteristics that are almost universal to modern dat...