Sciweavers

223 search results - page 23 / 45
» Generalized Descriptor Compression for Storage and Matching
Sort
View
SIGMOD
2010
ACM
362views Database» more  SIGMOD 2010»
14 years 4 months ago
Data warehousing and analytics infrastructure at facebook
Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and nonengineering. Apart from ad hoc analysis of data and ...
Ashish Thusoo, Zheng Shao, Suresh Anthony, Dhruba ...
WWW
2008
ACM
15 years 10 months ago
Recrawl scheduling based on information longevity
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Christopher Olston, Sandeep Pandey
108
Voted
WWW
2005
ACM
15 years 10 months ago
Thresher: automating the unwrapping of semantic content from the World Wide Web
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
Andrew Hogue, David R. Karger
WWW
2005
ACM
15 years 10 months ago
The volume and evolution of web page templates
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...
David Gibson, Kunal Punera, Andrew Tomkins
69
Voted
SIGIR
2004
ACM
15 years 3 months ago
Evaluation of filtering current news search results
We describe an evaluation of result set filtering techniques for providing ultra-high precision in the task of presenting related news for general web queries. In this task, the n...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...