Sciweavers

1276 search results - page 158 / 256
» Vetting the links of the web
WWW
2003
ACM
15 years 11 months ago
DOM-based content extraction of HTML documents
Web pages often contain clutter (such as pop-up ads, unnecessary images and extraneous links) around the body of an article that distracts a user from actual content. Extraction o...
Suhit Gupta, Gail E. Kaiser, David Neistadt, Peter...
KDD
2009
ACM
15 years 10 months ago
On compressing social networks
Motivated by structural properties of the Web graph that support efficient data structures for in memory adjacency queries, we study the extent to which a large network can be com...
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, ...
SIGIR
2009
ACM
15 years 4 months ago
Building enriched document representations using aggregated anchor text
It is well known that anchor text plays a critical role in a variety of search tasks performed over hypertextual domains, including enterprise search, wiki search, and web search....
Donald Metzler, Jasmine Novak, Hang Cui, Srihari R...
VLDB
2004
ACM
15 years 3 months ago
Accurate and Efficient Crawling for Relevant Websites
Focused web crawlers have recently emerged as an alternative to the well-established web search engines. While the well-known focused crawlers retrieve relevant webpages, there ar...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber...
WECWIS
1999
IEEE
15 years 2 months ago
A Quantitative Analysis of the User Behavior of a Large E-Broker
The Internet and the World Wide Web provide a global virtual marketplace. However, there is little information about the behavior of e-commerce users worldwide. The goal of the pa...
Virgilio Almeida, Wagner Meira Jr., Victor F. Ribe...