Machine-generated documents containing semi-structured text are rapidly forming the bulk of data being stored in an organisation. Given a feature-based representation of such data,...
Since 1995, a few statistical parsing algorithms have demonstrated a breakthrough in parsing accuracy, as measured against the UPenn TREEBANK as a gold standard. In this paper we ...
Scott Miller, Heidi Fox, Lance A. Ramshaw, Ralph M...
Text-Mining is a growing area of interest within the field of Data Mining and Knowledge Discovery. Given a collection of text documents, most approaches to Text Mining perform kno...
In this paper, we present a new approach to extracting the target text line from a document image captured by a pen scanner. Given the binary image, a set of possible text lines a...