Sciweavers

» How-To Web Pages
NSDI
2010
15 years 4 months ago
The Architecture and Implementation of an Extensible Web Crawler
Many Web services operate their own Web crawlers to discover data of interest, despite the fact that large-scale, timely crawling is complex, operationally intensive, and expensive...
Jonathan M. Hsieh, Steven D. Gribble, Henry M. Lev...
WWW
2008
ACM
16 years 4 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs), groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev
WWW
2006
ACM
16 years 4 months ago
Topical TrustRank: using topicality to combat web spam
Web spam is behavior that attempts to deceive search engine ranking algorithms. TrustRank is a recent algorithm that can combat web spam. However, TrustRank is vulnerable in the s...
Baoning Wu, Vinay Goel, Brian D. Davison
CICLING
2009
Springer
15 years 7 months ago
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
WWW
2006
ACM
16 years 4 months ago
Examining the content and privacy of web browsing incidental information
This research examines the privacy comfort levels of participants if others can view traces of their web browsing activity. During a week-long field study, participants used an el...
Kirstie Hawkey, Kori M. Inkpen