Sciweavers

DOCENG
2005
ACM

Injecting information into atomic units of text

This paper presents a new approach to text processing, based on textemes. These are atomic text units that generalise the concepts of character and glyph by merging them in a common data structure, together with an arbitrary number of user-defined properties. In the first part, we survey the notions of character and glyph and their relation to Natural Language Processing models, some visual text representation issues, and the strategies adopted by file formats (SVG, PDF, DVI) and software (Uniscribe, Pango). In the second part, we show applications of textemes to various text processing issues: ligatures, variant glyphs and other OpenType-related properties, hyphenation, color and other presentation attributes, Arabic form and morphology, CJK spacing, metadata, etc. Finally, we describe how the Omega typesetting system implements texteme processing as an example of a generalised approach to input character stream parsing, internal representation of text, and modular typographic t...
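To make the idea of a texteme more concrete, here is a minimal sketch of a data structure that holds a character, a glyph, and arbitrary user-defined properties in one unit. It is an illustration only and not the Omega implementation: the class name `Texteme`, the field names, the glyph name `f_i`, the property key `source_characters`, and the helper functions are all hypothetical, as is the choice to keep the underlying characters of a ligature in a property.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Texteme:
    """Hypothetical atomic text unit: character, glyph, and free-form properties."""
    character: Optional[str] = None   # a Unicode character, if the unit has one
    glyph: Optional[str] = None       # an illustrative glyph name, if one is chosen
    properties: Dict[str, Any] = field(default_factory=dict)


def from_string(text: str) -> List[Texteme]:
    """Turn a plain string into one texteme per character, with no glyph yet."""
    return [Texteme(character=ch) for ch in text]


def apply_fi_ligature(textemes: List[Texteme]) -> List[Texteme]:
    """Replace each 'f' followed by 'i' with a single ligature texteme.

    Assumption for this sketch: the ligature texteme has no single character,
    so the original characters are kept in the 'source_characters' property.
    """
    result: List[Texteme] = []
    i = 0
    while i < len(textemes):
        current = textemes[i]
        following = textemes[i + 1] if i + 1 < len(textemes) else None
        if (following is not None
                and current.character == "f"
                and following.character == "i"):
            result.append(Texteme(glyph="f_i",
                                  properties={"source_characters": "fi"}))
            i += 2
        else:
            result.append(current)
            i += 1
    return result


def to_plain_text(textemes: List[Texteme]) -> str:
    """Recover the underlying character string from a texteme list."""
    return "".join(
        t.properties.get("source_characters", t.character or "")
        for t in textemes
    )


if __name__ == "__main__":
    units = apply_fi_ligature(from_string("office"))
    units[0].properties["color"] = "red"   # a presentation attribute as a property
    for unit in units:
        print(unit)
    print(to_plain_text(units))
```

Because the ligature unit still carries its source characters in this sketch, the plain text can be recovered after the glyph substitution, and presentation attributes such as color sit alongside the character and glyph rather than in a separate layer.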
Yannis Haralambous, Gábor Bella
Added 14 Oct 2010
Updated 14 Oct 2010
Type Conference
Year 2005
Where DOCENG