Distinguishing Mathematics Notation from English Text using Computational Geometry

13 years 10 months ago


A trainable method is described for distinguishing mathematics notation from natural language (here, English) in images of textlines, using only computational geometry methods, with no assistance from symbol recognition. The input to our method is a “neighbor graph” extracted from a bilevel image of an isolated textline by the method of Kise [8]: this is a pruned form of the Delaunay triangulation of the set of locations of black connected components. Our method first attempts to classify each vertex and, separately, each edge of the neighbor graph as belonging to math or English; these results are then combined to yield a classification of the entire textline. All three classifiers are automatically trainable. Features for the vertex and edge classifiers were selected semi-manually from a large pool of candidates in a process driven by training data; this stage is potentially fully automatable. In experiments on images scanned from books and images generated synthetically, this method...
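The final combination step can be illustrated with a minimal sketch. The abstract does not say how the vertex and edge results are combined, so the weighted average below is only an assumption, and the names `NeighborGraph`, `classify_textline`, `vertex_weight`, and `threshold` are hypothetical. The per-vertex and per-edge scores are assumed to come from already-trained classifiers and to lie in [0, 1], with higher values meaning "math".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NeighborGraph:
    """Hypothetical container for one textline's classifier outputs."""
    vertex_scores: List[float]  # one "math" score per connected component
    edge_scores: List[float]    # one "math" score per neighbor-graph edge


def classify_textline(
    graph: NeighborGraph,
    vertex_weight: float = 0.5,  # assumed mixing weight, not from the paper
    threshold: float = 0.5,      # assumed decision threshold
) -> Tuple[str, float]:
    """Combine vertex and edge scores into a single textline label."""
    if not graph.vertex_scores:
        raise ValueError("textline has no connected components")

    vertex_mean = sum(graph.vertex_scores) / len(graph.vertex_scores)

    if graph.edge_scores:
        edge_mean = sum(graph.edge_scores) / len(graph.edge_scores)
        score = vertex_weight * vertex_mean + (1.0 - vertex_weight) * edge_mean
    else:
        # A single-component textline has no edges; use vertex evidence alone.
        score = vertex_mean

    label = "math" if score >= threshold else "english"
    return label, score


if __name__ == "__main__":
    line = NeighborGraph(vertex_scores=[0.9, 0.8, 0.7], edge_scores=[0.6, 0.9])
    label, _ = classify_textline(line)
    print(label)
```

In a trainable setting, the mixing weight and threshold would be fitted on labeled textlines instead of being fixed by hand.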

Derek M. Drake, Henry S. Baird
