In previous papers, we have presented a logic-based framework based on fusion rules for merging structured news reports [Hun00, Hun02b, Hun02a, HS03, HS04]. Structured news report...
Due to the great variation of biological names in biomedical text, appropriate tokenization is an important preprocessing step for biomedical information retrieval. Despite its im...
Recently direct optimization of information retrieval (IR) measures becomes a new trend in learning to rank. Several methods have been proposed and the effectiveness of them has ...
Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For exampl...
This paper presents a quantitative performance analysis of two different approaches to the lemmatization of the Czech text data. The first one is based on manually prepared diction...