Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to...
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...
Abstract. This paper describes an investigation of data fusion techniques for spoken document retrieval. The effectiveness of retrievals solely based on the outputs from automatic...
Legal-RDF.org1 publishes a practical ontology that models both the layout and content of a document and metadata about the document; these have been built using data models implici...
Although detecting text lines in machine printed documents is typically considered a solved problem, it is still a challenge to segment handwritten text lines in the general sense...