PADS is a declarative language used to describe the syntax and semantic properties of ad hoc data sources such as financial transactions, server logs and scientific data ...
Qian Xi, Kathleen Fisher, David Walker, Kenny Qili...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. Deduplicating URLs is an extremely important problem for search engines, since all the principal...
Web search engines present search results in a rank ordered list. This works when what a user wants is near the top, but sometimes the information that the user really wants is lo...
The database tier of dynamic content servers at large Internet sites is typically hosted on centralized and expensive hardware. Recently, research prototypes have proposed using d...
We show how to efficiently evaluate generic map-filter-product queries, generalizations of select-project-join (SPJ) queries in relational algebra, based on a combination of two...