WWW 2007, ACM


A large-scale study of robots.txt

Search engines rely largely on Web robots to collect information from the Web. Because the Web is open-access and unregulated, robot activities are extremely diverse. Such crawling activities can be regulated from the server side by deploying the Robots Exclusion Protocol in a file called robots.txt. Although the protocol is not an enforced standard, ethical robots (and many commercial ones) follow the rules specified in robots.txt. With our focused crawler, we investigate 7,593 websites from the education, government, news, and business domains. Five crawls were conducted in succession to study temporal changes. Through statistical analysis of the data, we present a survey of the usage of Web robot rules at Web scale. The results also show that the usage of robots.txt has increased over time.

General Terms: Experimentation, Measurement.

Keywords: crawler, robots exclusion protocol, robots.txt, search engine.
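
As an illustration of how a cooperating robot might consult robots.txt before fetching a page, here is a minimal sketch using Python's standard urllib.robotparser module. It is not the authors' crawler: the robots.txt content, the robot names GoodBot and BadBot, the example.com URLs, and the helper names build_parser and is_allowed are all hypothetical, chosen only to show the rule-matching idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A robot with no dedicated record falls back to the "*" rules.
print(is_allowed(parser, "GoodBot", "http://example.com/index.html"))
print(is_allowed(parser, "GoodBot", "http://example.com/private/data.html"))

# A robot named in its own record is governed by that record.
print(is_allowed(parser, "BadBot", "http://example.com/index.html"))
```

The check is purely advisory on the robot's side, which matches the abstract's point that the protocol is not enforced: a robot that never calls such a check is not stopped by the file itself.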

Yang Sun, Ziming Zhuang, C. Lee Giles

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2007
Where	WWW
Authors	Yang Sun, Ziming Zhuang, C. Lee Giles
