WSDM
2009
ACM

Finding text reuse on the web

With the overwhelming number of reports on similar events originating from different sources on the web, it is often hard, using existing web search paradigms, to find the original source of “facts”, statements, rumors, and opinions, and to track their development. Several techniques have previously been proposed for detecting such text reuse between different sources; however, these techniques have been tested only against relatively small and homogeneous TREC collections. In this work, we test the feasibility of text reuse detection techniques in the setting of web search. In addition to text reuse detection, we develop a novel technique that addresses the unique challenges of finding original sources on the web, such as defining a timeline. We also explore the use of link analysis for identifying reliable and relevant reports. Our experimental results show that the proposed techniques can operate on the scale of the web, are significantly more accurate than standard web search ...
Michael Bendersky, W. Bruce Croft
Added 19 May 2010
Updated 19 May 2010
Type Conference
Year 2009
Where WSDM
Authors Michael Bendersky, W. Bruce Croft