Sciweavers

ICDAR
2011
IEEE

Math Spotting: Retrieving Math in Technical Documents Using Handwritten Query Images

A method is presented for locating mathematical expressions in document images without the use of optical character recognition. An index of document regions is built from the recursive X-Y trees computed for each page in the corpus. Queries are provided as images of handwritten expressions, for which an X-Y tree is computed. During retrieval, the query is looked up in the document region index using features of its X-Y tree, producing a set of candidate regions. Candidate regions are ranked by the similarity of vertical pixel projections in their upper and lower halves with those of the query image, as computed using Dynamic Time Warping of the image columns. In an experiment, ten participants each wrote twenty queries from a 200-page corpus. On average, the top-10 retrieval candidates included a candidate covering 43.3% of the test query image (σ = 14.0), with the correct page being returned between 30.0% and 85.0% of the time across participants (µ = 63.2%, σ = 14.9%). When test...
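The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "vertical pixel projections" means per-column ink counts, that images are binary (1 = ink), that the local DTW cost is the absolute difference between column counts, and that the distances for the upper and lower halves are simply summed. The function names (half_projections, dtw_distance, region_distance, rank_candidates) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a binary image is a list of rows, each a list of 0/1 pixels (1 = ink).
Image = List[List[int]]


def half_projections(img: Image) -> Tuple[List[int], List[int]]:
    """Return per-column ink counts for the upper and lower halves of an image."""
    mid = len(img) // 2
    width = len(img[0])
    upper = [sum(row[x] for row in img[:mid]) for x in range(width)]
    lower = [sum(row[x] for row in img[mid:]) for x in range(width)]
    return upper, lower


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping with an absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def region_distance(query: Image, region: Image) -> float:
    """Assumed score: sum of the DTW distances of the upper and lower halves."""
    q_up, q_low = half_projections(query)
    r_up, r_low = half_projections(region)
    return dtw_distance(q_up, r_up) + dtw_distance(q_low, r_low)


def rank_candidates(query: Image, candidates: List[Image]) -> List[int]:
    """Return candidate indices ordered from most to least similar."""
    return sorted(range(len(candidates)),
                  key=lambda i: region_distance(query, candidates[i]))


if __name__ == "__main__":
    query = [[1, 0, 1],
             [0, 1, 0]]
    stretched = [[1, 1, 0, 1],   # same shape, first column repeated
                 [0, 0, 1, 0]]
    blank = [[0, 0, 0],
             [0, 0, 0]]
    print(rank_candidates(query, [blank, stretched]))
```

Because DTW allows one column to align with several columns of the other sequence, a horizontally stretched copy of the query can still score a distance of zero. That tolerance suits the comparison of handwritten queries against typeset regions of different widths.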
Richard Zanibbi, Li Yu
Added 24 Dec 2011
Updated 24 Dec 2011
Type Journal
Year 2011
Where ICDAR
Authors Richard Zanibbi, Li Yu