Sciweavers

Search results: Random web crawls
PVLDB
2010
Annotating and Searching Web Tables Using Entities, Types and Relationships
Tables are a universal idiom to present relational data. Billions of tables on Web pages express entity references, attributes and relationships. This representation of relational...
Girija Limaye, Sunita Sarawagi, Soumen Chakrabarti
WWW
2011
ACM
Prophiler: a fast filter for the large-scale detection of malicious web pages
Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In...
Davide Canali, Marco Cova, Giovanni Vigna, Christo...
CIDR
2009
Extracting and Querying a Comprehensive Web Database
Recent research in domain-independent information extraction holds the promise of an automatically-constructed structured database derived from the Web. A query system based on th...
Michael J. Cafarella
SIGIR
2008
ACM
SpotSigs: robust and efficient near duplicate detection in large web collections
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...
WWW
2004
ACM
Combining link and content analysis to estimate semantic similarity
Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
Filippo Menczer