Genealogical trees on the web: a search engine user perspective

This paper presents an extensive study of the evolution of textual content on the Web, showing that some new pages are created from scratch while others are created from already existing content. We show that a significant fraction of the Web is a byproduct of the latter case. We introduce the concept of a Web genealogical tree, in which every page in a Web snapshot is classified into a component. We study these components in detail, characterizing the copies and identifying the relation between a source of content and a search engine by comparing page relevance measures, documents returned by real queries performed in the past, and click-through data. We observe that sources of copies are returned by queries more frequently and clicked more often than other documents.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms: Experimentation

Keywords: Web, text, content evolution, search engine, Web mining

Ricardo A. Baeza-Yates, Álvaro R. Pereira Jr., Nivio Ziviani

Content Evolution | Internet Technology | Page Relevance Measures | Web Genealogical Tree | WWW 2008

» What's new on the web? The evolution of the web from a search engine perspective

» How are we searching the World Wide Web? A comparison of nine search engine transaction log...

» User-Oriented Evaluation Methods for Interactive Web Search Interfaces

» The Effects of Query Bursts on Web Search

» Optimising Performance of Competing Search Engines in Heterogeneous Web Environments

» A Case-Based Perspective on Social Web Search

» Distributed Web Search as a Stochastic Game

» User-centric content freshness metrics for search engines

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2008
Where	WWW
Authors	Ricardo A. Baeza-Yates, Álvaro R. Pereira Jr., Nivio Ziviani

Sciweavers