: An OCR free word spotting method is developed and evaluated under a strong experimental protocol. Different feature sets are evaluated under the same experimental conditions. In ...
Israel Rios, Alceu de Souza Britto Jr., Alessandro...
: In this paper we introduce eXVisXML, a visual tool to explore documents annotated with the mark-up language XML, in order to easily perform over them tasks as knowledge extractio...
Document keywords are associated to documents as summarized versions of the documents' content. Considering that the number of documents is quickly growing every day, the ava...
The latent topic model plays an important role in the unsupervised learning from a corpus, which provides a probabilistic interpretation of the corpus in terms of the latent topic...
In this paper, we propose a preference framework for information retrieval in which the user and the system administrator are enabled to express preference annotations on search ke...
Intuitively, any `bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies ...
Eduard Hoenkamp, Peter Bruza, Dawei Song, Qiang Hu...
This paper presents a tool and a novel Fast Invariant Transform (FIT) algorithm for language independent e-documents access. The tool enables a person to access an e-document thro...
Qiong Liu, Hironori Yano, Don Kimber, Chunyuan Lia...
Abstract. Standard Support Vector Machines (SVM) text classification relies on bag-of-words kernel to express the similarity between documents. We show that a document lattice can ...
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifie...
In this paper we propose the multirelational topic model (MRTM) for multiple types of link modeling such as citation and coauthor links in document networks. In the citation networ...
Jia Zeng, William K. Cheung, Chun-hung Li, Jiming ...