More and more documents on the World Wide Web are based on templates. On a technical level this causes those documents to have a quite similar source code and DOM tree structure. G...
Online training and support objects can be difficult and timeconsuming to create. Written words are often ignored by users who want someone to show them a procedure. Viewlets are ...
In this paper, we show how we can learn to select good words for a document title. We view the problem of selecting good title words for a document as a variant of an Information ...
Abstract. In this paper, we describe a new unsupervised sentence boundary detection system and present a comparative study evaluating its performance against different systems foun...
Jan Strunk, Carlos Nascimento Silla Jr., Celso A. ...
A methodology using decoder banks is proposed for high-accuracy, fully automatic recognition of machine printed text across a wide range of challenging image qualities, without re...