This paper presents two methods which automatically produce annotated corpora for text summarisation on the basis of human abstracts. Both methods identify a set of sentences from ...
This paper presents a language identification technique that detects Latin-based languages in imaged documents without OCR. The proposed technique detects languages through the wo...
The method of lexical chains is introduced for the first time to generate summaries from Chinese texts. The algorithm which computes lexical chains based on the HowNet knowl...
We propose methods to classify lines of military chat, or posts, which contain items of interest. We evaluated several current text categorization and feature selection methodologi...
Digitizing ancient books, especially those related to the humanities, is practiced in many countries. The number of full-text databases in the humanities is increasing. Studies hav...