This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...
One of the criticisms library users often make of catalogs is that they rarely include information below the bibliographic level. It is generally impossible to search a catalog fo...
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For exampl...
In this paper I describe how a significant class of cases that would involve (possibly complex) structural transfer in nmchine translation can be handled avoiding transfer. This i...
A prerequisite for all higher level information extraction tasks is the identication of unknown names in text. Today, when large corpora can consist of billions of words, it is of...