Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this pape...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
Segmentation of document images remains a challenging vision problem. Although document images have a structured layout, capturing enough of it for segmentation can be difficult....
In this paper, we study the problem of automatically segmenting written text into paragraphs. This is inherently a sequence labeling problem, however, previous approaches ignore t...
Qinfeng Shi, Yasemin Altun, Alex J. Smola, S. V. N...
In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sour...
Jeroen Morang, Roeland Ordelman, Franciska de Jong...
TAGH is a system for automatic recognition of German word forms. It is based on a stem lexicon with allomorphs and a concatenative mechanism for inflection and word formation. Wei...