ICDAR
1995
IEEE

Ground-truthing and benchmarking document page segmentation

We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described as a set of regions together with their types, output order, etc., is matched against a pre-stored set of ground-truth regions. The system detects errors such as misclassification, splitting, and merging of regions. Each error is weighted individually for a particular application, and a global estimate of segmentation quality is derived. The system can be customized to benchmark specific aspects of segmentation (e.g., headline detection) and to reflect the type of error correction that might follow (e.g., re-typing). Segmentation ground-truth files are quickly and easily generated and edited using GroundsKeeper, an X Window-based tool that allows one to view a document, manually draw regions (arbitrary polygons) on it, and specify information about each region (e.g., type, parent).
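
The region-based matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes axis-aligned rectangles instead of arbitrary polygons, and the `Region` class, the `min_overlap` threshold, the error names, the weights, and the scoring formula are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical region record: a name, a type, and a bounding box."""
    name: str
    kind: str  # e.g. "text", "headline", "image"
    x0: int
    y0: int
    x1: int
    y1: int


def area(r):
    return (r.x1 - r.x0) * (r.y1 - r.y0)


def overlap_area(a, b):
    # Width and height of the intersection; negative means no overlap.
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(w, 0) * max(h, 0)


def significant(a, b, min_overlap):
    # Assumed rule: the intersection must be non-empty and cover at least
    # min_overlap of the smaller of the two regions.
    inter = overlap_area(a, b)
    return inter > 0 and inter >= min_overlap * min(area(a), area(b))


def find_errors(truth, output, min_overlap=0.1):
    """Return a list of (error_type, region_name) pairs."""
    errors = []
    for t in truth:
        hits = [o for o in output if significant(t, o, min_overlap)]
        if not hits:
            errors.append(("missed", t.name))
        elif len(hits) > 1:
            errors.append(("split", t.name))          # one truth region, many outputs
        elif hits[0].kind != t.kind:
            errors.append(("misclassified", t.name))  # matched, but wrong type
    for o in output:
        hits = [t for t in truth if significant(t, o, min_overlap)]
        if len(hits) > 1:
            errors.append(("merged", o.name))         # one output covers many truths
    return errors


def quality(errors, weights, n_truth):
    # Assumed scoring: weighted penalty per ground-truth region, clipped at 0.
    penalty = sum(weights.get(kind, 1.0) for kind, _ in errors)
    return max(0.0, 1.0 - penalty / n_truth)


truth = [
    Region("t1", "headline", 0, 0, 100, 20),
    Region("t2", "text", 0, 30, 100, 60),
    Region("t3", "text", 0, 70, 100, 100),
]
output = [
    Region("o1", "text", 0, 0, 100, 20),    # headline labeled as text
    Region("o2", "text", 0, 30, 100, 100),  # two paragraphs merged into one
]

errors = find_errors(truth, output)
score = quality(errors, {"misclassified": 0.5, "merged": 1.0}, len(truth))
print(errors)
print(score)
```

Changing the weights dictionary is how this sketch mimics application-specific benchmarking: for instance, a headline-detection benchmark might raise the weight of `"misclassified"` and lower the others.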
Berrin A. Yanikoglu, Luc Vincent
Added 26 Aug 2010
Updated 26 Aug 2010
Type Conference
Year 1995
Where ICDAR
Authors Berrin A. Yanikoglu, Luc Vincent