

Determining Bias to Search Engines from Robots.txt

Search engines largely rely on robots (i.e., crawlers or spiders) to collect information from the Web. Such crawling activities can be regulated from the server side by deploying the Robots Exclusion Protocol in a file called robots.txt; ethical robots follow the rules specified there. Websites can explicitly specify an access preference for each robot by name, and preferences that favor particular robots introduce bias. Such biases may lead to a “rich get richer” situation, in which a few popular search engines ultimately dominate the Web because they have preferred access to resources that are inaccessible to others. This issue is seldom addressed, although the robots.txt convention has become a de facto standard for robot regulation and search engines have become an indispensable tool for information access. We propose a metric to evaluate the degree of bias to which specific robots are subjected. We investigated 7,593 websites covering education, government, news, and business domains, and collected 2,925 distinct r...
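
To make the mechanism concrete, here is a minimal sketch (not the authors' implementation) of how per-robot rules in robots.txt can grant one named crawler broader access than others. It uses Python's standard urllib.robotparser; the robots.txt content, URL, and user-agent names below are hypothetical.

from urllib import robotparser

# Hypothetical robots.txt that fully allows a named robot ("Googlebot")
# while restricting every other robot ("*").
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /private/
Disallow: /archive/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/archive/page.html"
for agent in ("Googlebot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, url)
    print(f"{agent}: {'allowed' if allowed else 'disallowed'} for {url}")

# Googlebot is allowed while the generic robot is not -- the kind of
# named-robot preference whose prevalence across sites the paper's
# bias metric is designed to quantify.
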
Added: 09 Jun 2010
Updated: 09 Jun 2010
Type: Conference
Year: 2007
Where: WEBI (Springer)
Authors: Yang Sun, Ziming Zhuang, Isaac G. Councill, C. Lee Giles