In this paper, we propose a method of text retrieval from document images using a similarity measure based on an N-Gram algorithm. We directly extract image features instead of us...
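To illustrate the general idea of an N-gram similarity measure mentioned in the abstract above, here is a minimal sketch. It is not the authors' method: the abstract does not specify the measure, so the Dice coefficient over N-gram multisets is used as an assumed stand-in. The function names `ngrams` and `ngram_similarity` are hypothetical, and the inputs may be any sequences of symbols (characters, or codes derived from image features).

```python
from collections import Counter


def ngrams(seq, n=2):
    """Return a multiset (Counter) of the contiguous n-grams in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def ngram_similarity(a, b, n=2):
    """Dice coefficient over n-gram multisets, a value in [0, 1].

    Assumption: Dice is one common choice; the abstract does not say
    which N-gram-based measure is actually used.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Both sequences are shorter than n, so there is nothing to compare.
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection
    return 2.0 * shared / total
```

Because the function only requires indexable sequences of hashable items, the same code applies to strings and to lists of integer feature codes.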
Existing HTML markup is used only to indicate the structure and layout of documents, but not the document semantics. As a result, web documents are difficult to be semantically p...
In this paper, we present a prototype that helps visualize the relative importance of sentences extracted from medical texts using Embodied Conversational Agents (ECA). We propo...
Using language technology for text analysis and light-weight ontologies as a content-mediating level, we acquire indexing patterns from vast amounts of indexing data for Englishla...
This paper describes DTC (Documents, Transformations and Components), our approach to the XML-based development of content-intensive applications. According to this approach, the ...