
COLING
2000

A Formalism for Universal Segmentation of Text

Sumo is a formalism for universal segmentation of text. Its purpose is to provide a framework for building segmentation applications. It is called universal because the formalism is independent both of the language of the documents to be processed and of the levels of segmentation (e.g. words, sentences, paragraphs, morphemes) considered by the target application. The framework relies on a layered structure that represents the possible segmentations of a document. This structure and the tools to manipulate it are described, followed by detailed examples highlighting some features of Sumo.
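
The abstract does not specify how the layered structure is encoded, so the following is only an illustrative sketch of the general idea, not Sumo's actual data model or syntax. It assumes a hypothetical `Item` that spans a range of offsets in the layer below and a hypothetical `Layer` that may hold overlapping items, so that one layer can carry several competing segmentations at once. The helper `segmentations` (also an assumption) enumerates every gap-free way of covering the lower layer.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Item:
    """Hypothetical unit of an upper layer, covering [start, end) of the layer below."""
    start: int
    end: int


@dataclass
class Layer:
    """Hypothetical layer: a named bag of items, possibly overlapping (ambiguous)."""
    name: str
    items: List[Item] = field(default_factory=list)


def segmentations(layer: Layer, length: int) -> List[List[Item]]:
    """Enumerate item sequences that cover [0, length) with no gaps or overlaps.

    Assumes every item is non-empty (end > start), so the recursion terminates.
    """
    # Index items by the offset at which they begin.
    by_start: Dict[int, List[Item]] = {}
    for item in layer.items:
        by_start.setdefault(item.start, []).append(item)

    def walk(pos: int) -> Iterator[List[Item]]:
        if pos == length:
            yield []
            return
        for item in by_start.get(pos, []):
            for rest in walk(item.end):
                yield [item] + rest

    return list(walk(0))


# Lowest layer: the characters of the text. Upper layer: candidate words.
text = "icecream"
words = Layer("words", [Item(0, 3), Item(3, 8), Item(0, 8)])

# Each reading is one possible segmentation of the text into words.
readings = [[text[i.start:i.end] for i in path]
            for path in segmentations(words, len(text))]
print(readings)
```

Nothing in this sketch depends on the language of the text or on what the two layers stand for: the same shape could relate sentences to words or words to morphemes, which is the kind of independence the abstract describes.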
Julien Quint
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2000
Where COLING
Authors Julien Quint