Background: The rapid growth of biomedical literature presents challenges for automatic text processing, and one of the challenges is abbreviation identification. The presence of ...
Sunghwan Sohn, Donald C. Comeau, Won Kim, W. John ...
In many contexts, one is confronted with the problem of extracting information from large amounts of different types of soft data (e.g., text) and hard data (e.g., from physics-based...
Thanuka Wickramarathne, Kamal Premaratne, Manohar ...
This paper describes a new, versatile algorithm for correcting nonlinear distortions, such as curvature of book pages, in camera-based document processing. We introduce the idea of...
Spam e-mail with advertisement text embedded in images presents a great challenge to anti-spam filters. In this paper, we present a fast method to detect image-based spam e-mail. U...
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recogniti...