Automated extraction of bibliographic information from journal articles is key to the affordable creation and maintenance of citation databases, such as MEDLINE
Xiaoli Zhang, Jie Zou, Daniel X. Le, George R. Tho...
Many visual analysis tools operate on a fixed set of data. However, professional information analysts follow issues over a period of time and need to be able to easily add new doc...
Elizabeth G. Hetzler, Vernon L. Crow, Deborah A. P...
We are developing a recognition system, named `Infty', for scientific documents including those with mathematical formulae. In this paper, we propose a new system that can re...
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
A new hybrid page layout analysis algorithm is proposed, which uses bottom-up methods to form an initial data-type hypothesis and locate the tab-stops that were used when the page...