While complete understanding of arbitrary input text remains in the future, it is currently possible to construct natural language processing systems that provide a partial unders...
Peggy M. Andersen, Philip J. Hayes, Steven P. Wein...
In this paper, we propose a practical approach for extracting the most relevant paragraphs from the original document to form a summary for Thai text. The idea of our approach is ...
PhD students or researchers starting a new research project or initiating work in an unfamiliar research direction often undertake a scientific literature search in order to info...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristics start from an initial set of basic content elements an...