Sciweavers

IPM
2008

Towards a unified approach to document similarity search using manifold-ranking of blocks

Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document from a text corpus or the Web. Most existing approaches to similarity search first compute a pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. cosine), and then rank the documents by those scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. blocks of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach makes use of the intrinsic global manifold structure of the document blocks by propagating ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are employed to segment text documents and web pages, respectively, into blocks. Then, each block is assigned with a ranking sc...
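The score propagation described in the abstract can be illustrated with a minimal sketch. It assumes the standard manifold-ranking iteration f = alpha * S * f + (1 - alpha) * y over a symmetrically normalized affinity matrix, with cosine similarity as the edge weight. The function names, the alpha value, the iteration count, and the toy blocks are all hypothetical and are not taken from the paper. Segmentation (TextTiling, VIPS) and the step that turns block scores into document scores are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def manifold_rank(blocks, query_idx, alpha=0.85, iters=100):
    """Propagate ranking scores over a weighted graph of blocks.

    blocks:    list of term-weight dicts, one per block (assumed input format)
    query_idx: set of indices of blocks that belong to the query document
    alpha:     propagation weight in (0, 1); 0.85 is an arbitrary choice
    """
    n = len(blocks)
    # Affinity matrix W with a zero diagonal (no self-loops).
    W = [[cosine(blocks[i], blocks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)]
         for i in range(n)]
    # Prior vector y: 1 for query blocks, 0 for all other blocks.
    y = [1.0 if i in query_idx else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Toy example: block 0 is the query block, blocks 1-3 are candidates.
blocks = [
    {"graph": 1.0, "ranking": 1.0},
    {"graph": 1.0, "ranking": 1.0, "manifold": 1.0},
    {"manifold": 1.0, "blocks": 1.0},
    {"cooking": 1.0, "recipe": 1.0},
]
scores = manifold_rank(blocks, query_idx={0})
```

In this toy graph, block 2 shares no terms with the query block, so a direct cosine comparison would score it zero. It still receives a positive score through its neighbour, block 1. Block 3 is disconnected from the rest of the graph and stays at zero.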
Xiaojun Wan, Jianwu Yang, Jianguo Xiao
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2008
Where IPM