We investigate the problem of evaluating the performance of text processing algorithms on inputs that contain errors as a result of optical character recognition. A new hierarchic...
Text classification systems on biomedical literature aim to select relevant articles to a specific issue from large corpora. Most systems with an acceptable accuracy are based o...
This paper describes a method for definition question answering based on the use of surface text patterns. The method is specially suited to answer questions about person's po...
We study dimensionality reduction or feature selection in text document categorization problem. We focus on the first step in building text categorization systems, that is the cho...
The structure of a document has an important influence on the perception of its content. Considering scientific publications, we can affirm that by making use of the ordinary line...
Tudor Groza, Alexander Schutz, Siegfried Handschuh