DOCENG
2005
ACM

Structuring documents according to their table of contents

In this paper, we present a method for structuring a document according to the information present in its Table of Contents (ToC). The detection of the ToC, as well as the determination of the parts it refers to in the document body, relies on a series of generic properties characterizing any ToC, while its hierarchization is achieved using clustering techniques. We also report on the robustness and performance of the method before discussing it in light of related work.
Categories and Subject Descriptors: I.7.2 [Computing Methodologies]: Document and Text Processing - Document Preparation: Markup languages; I.7.4 [Computing Methodologies]: Document and Text Processing - Electronic Publishing; I.7.5 [Computing Methodologies]: Document and Text Processing - Document Capture: Document analysis
General Terms: Algorithms, Documentation, Experimentation
Keywords: Document Structuring, Table of Contents recognition
Hervé Déjean, Jean-Luc Meunier
Added 14 Oct 2010
Updated 14 Oct 2010
Type Conference
Year 2005
Where DOCENG
Authors Hervé Déjean, Jean-Luc Meunier