Detecting the origin of text segments efficiently


Download www2009.eprints.org

In the origin detection problem, an algorithm is given a set S of documents, ordered by creation time, and a query document D. For every consecutive sequence of k alphanumeric terms in D, it must output the earliest document in S in which that sequence appeared (if such a document exists). Algorithms for the origin detection problem can, for example, be used to detect the "origin" of text segments in D and thus to detect novel content in D. They can also find the document from which the author of D has copied the most (or show that D is mostly original). We propose novel algorithms for this problem and evaluate them together with a large number of previously published algorithms. Our results show that (1) detecting the origin of text segments efficiently can be done with very high accuracy even when the space used is less than 1% of the size of the documents in S, (2) the precision degrades smoothly with the amount of available space, (3) various estimation techniques can b...
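To make the problem statement concrete, here is a minimal exact baseline in Python. It is only a sketch of the problem definition, not the space-efficient algorithms the paper proposes: it stores every k-term sequence in memory. The function names (`shingles`, `build_index`, `detect_origins`, `top_source`) and the choice to lowercase terms are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int) -> List[Shingle]:
    """Return every consecutive sequence of k alphanumeric terms in text.

    Lowercasing is an assumption of this sketch, not part of the abstract.
    """
    terms = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(terms[i:i + k]) for i in range(len(terms) - k + 1)]


def build_index(docs: List[str], k: int) -> Dict[Shingle, int]:
    """Map each k-term sequence to the earliest document that contains it.

    docs must be ordered by creation time, oldest first, so the first
    document seen for a sequence is its origin.
    """
    index: Dict[Shingle, int] = {}
    for doc_id, text in enumerate(docs):
        for s in shingles(text, k):
            index.setdefault(s, doc_id)  # keep the earliest doc_id only
    return index


def detect_origins(index: Dict[Shingle, int], query: str, k: int) -> List[Optional[int]]:
    """For each k-term sequence in the query, return its origin or None if novel."""
    return [index.get(s) for s in shingles(query, k)]


def top_source(origins: List[Optional[int]]) -> Optional[int]:
    """Return the document that is the origin of the most sequences, if any."""
    counts = Counter(o for o in origins if o is not None)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical toy corpus, ordered oldest first.
docs = [
    "the quick brown fox jumps",
    "quick brown fox jumps over the lazy dog",
    "a lazy dog sleeps",
]
index = build_index(docs, k=3)
origins = detect_origins(index, "the quick brown fox jumps over a lazy dog sleeps", k=3)
print(origins)
print(top_source(origins))
```

Entries of `None` in the result mark sequences that appear in no document of S, that is, novel content in D. This baseline uses space proportional to the total number of distinct sequences, which is exactly the cost that the abstract says can be reduced to under 1% of the size of S.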

Ossama Abdel Hamid, Behshad Behzadi, Stefan Christoph, Monika Rauch Henzinger


General Terms Algorithms | Internet Technology | Keywords Document Overlap | Origin Detection Problem | WWW 2009 |


» An Efficient Edge Based Technique for Text Detection in Video Frames

» Segmentation Based Recovery of Arbitrarily Warped Document Images

» Video text detection based on filters and edge features

» Text Line Segmentation Based on Morphology and Histogram Projection

» An Efficient Word Segmentation Technique for Historical and Degraded Machine-Printed Docume...

» A New Method for Handwritten Scene Text Detection in Video

» Robust Extraction of Text in Video

» Evaluating SEE A Benchmarking System for Document Page Segmentation

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2009
Where	WWW
Authors	Ossama Abdel Hamid, Behshad Behzadi, Stefan Christoph, Monika Rauch Henzinger
