Finding a set of web pages relevant to a user’s information goal is difficult due to the enormous size of the Internet. Search engines are able to find a set of pages that mat...
Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages wh...
The World Wide Web (WWW) is rapidly becoming important for society as a medium for sharing data, information and services, and there is a growing interest in tools for understandi...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aim to so...
—“Big Data” in map-reduce (M-R) clusters is often fundamentally temporal in nature, as are many analytics tasks over such data. For instance, display advertising uses Behavio...
Badrish Chandramouli, Jonathan Goldstein, Songyun ...