Abstract. Topic models are a discrete analogue to principal component analysis and independent component analysis that model topic at the word level within a document. They have ma...
Federated text search provides a unified search interface for multiple search engines of distributed text information sources. Resource selection is an important component for fed...
We present a semi-Markov model for recognizing scene text that integrates character and word segmentation with recognition. Using wavelet features, it requires only approximate lo...
Allen R. Hanson, Erik G. Learned-Miller, Jerod J. ...
Extracting sentences that contain important information from a document is a form of text summarization. The technique is the key to the automatic generation of summaries similar ...
In this paper, we present a technique for visual analysis of documents based on the semantic representation of text in the form of a directed graph, referred to as a semantic graph....
Delia Rusu, Blaz Fortuna, Dunja Mladenic, Marko Gr...