Pagerank based clustering of hypertext document collections

Clustering hypertext document collections is an important task in Information Retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank based clustering (PRC) algorithm which uses the hypertext structure. The PRC algorithm produces a graph partitioning with high modularity and coverage. A comparison of the PRC algorithm with two content based clustering algorithms shows a good match between PRC clustering and content based clustering.

Categories and Subject Descriptors: H.3 [Information Search and Retrieval]: Miscellaneous
General Terms: Algorithms, Experiments
Keywords: PageRank based Clustering, Directed Graphs
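The abstract names modularity and coverage as the quality measures of a partition but does not define them. As a rough illustration only, the sketch below computes both for a directed link graph using commonly used definitions: coverage as the fraction of links that stay inside a cluster, and a directed variant of modularity that compares that fraction with what the out-degrees and in-degrees alone would predict. These definitions, the function names, and the toy graph are assumptions for illustration; they are not taken from the paper, and this is not the PRC algorithm itself.

```python
from collections import defaultdict


def coverage(edges, cluster_of):
    """Fraction of directed links whose two endpoints share a cluster."""
    intra = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    return intra / len(edges)


def directed_modularity(edges, cluster_of):
    """Intra-cluster link fraction minus its degree-based expectation.

    Assumed definition (not from the paper):
    Q = sum over clusters c of  e_c / m  -  (out_c * in_c) / m^2
    where e_c counts links inside c, out_c and in_c are the total
    out-degree and in-degree of the nodes in c, and m is the link count.
    """
    m = len(edges)
    out_c = defaultdict(int)  # total out-degree per cluster
    in_c = defaultdict(int)   # total in-degree per cluster
    intra = 0
    for u, v in edges:
        out_c[cluster_of[u]] += 1
        in_c[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            intra += 1
    clusters = set(out_c) | set(in_c)
    expected = sum(out_c[c] * in_c[c] for c in clusters) / (m * m)
    return intra / m - expected


# Hypothetical toy collection: two 3-page link cycles joined by one link.
toy_edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("d", "e"), ("e", "f"), ("f", "d"),
    ("c", "d"),
]
toy_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

toy_coverage = coverage(toy_edges, toy_clusters)
toy_modularity = directed_modularity(toy_edges, toy_clusters)
```

On this toy graph six of the seven links are internal, so coverage is 6/7, while the modularity under the assumed definition is 18/49. Putting every page in a single cluster would give coverage 1 but modularity 0, which is why the two measures are usually reported together.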

Konstantin Avrachenkov, Vladimir Dobrynin, Danil Nemirovsky, Son Kim Pham, Elena Smirnova

Content Based Clustering | Information Technology | Most Clustering Methods | PRC Algorithm | SIGIR 2008

» Generative semantic clustering in spatial hypertext

» As we may perceive inferring logical documents from hypertext

» Term Ranking for Clustering Web Search Results

» HyPursuit A Hierarchical Network Search Engine that Exploits ContentLink Hypertext Cluster...

» The Missing Link A Probabilistic Model of Document Content and Hypertext Connectivity

» Adaptive ranking of web pages

» Instructional information in adaptive spatial hypertext

» A random walk on the red carpet rating movies with user reviews and pagerank

Post Info

Added	28 Dec 2010
Updated	28 Dec 2010
Type	Journal
Year	2008
Where	SIGIR
Authors	Konstantin Avrachenkov, Vladimir Dobrynin, Danil Nemirovsky, Son Kim Pham, Elena Smirnova

Sciweavers
