We present a document understanding system in which the arrangement of lines of text and block separators within a document is modeled by stochastic context-free grammars. A gram...
John C. Handley, Anoop M. Namboodiri, Richard Zani...
Performance evaluation for document image analysis and understanding is a recurring problem. Many groundtruthed document image databases are now used to evaluate general algorithm...
We argue that the quality of a summary can be evaluated based on how many concepts in the original document(s) are preserved after summarization. Here, a concept refers to an abst...
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data...
We present a model for complex documents possibly consisting of a hierarchically structured set of images or texts. Documents are represented both at the form level (as s...
Carlo Meghini, Fabrizio Sebastiani, Umberto Stracc...