The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neig...
A lexical signature (LS) is a small set of terms derived from a document that capture the "aboutness" of that document. An LS generated from a web page can be used to disc...
This paper studies structured data extraction from Web pages, e.g., online product description pages. Existing approaches to data extraction include wrapper induction and automatic...
In this note we consider a simple reformulation of the traditional power iteration algorithm for computing the stationary distribution of a Markov chain. Rather than communicate t...
A 4KB page size has been used for virtual memory since the 1960s, and it remains the most common page size today. Choosing a page size means finding the middle ground betwe...