Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Abstract—While measures such as raw compute performance and system capacity continue to be important factors for evaluating cluster performance, such issues as system reliability...
William M. Jones, John T. Daly, Nathan DeBardelebe...
Metonymic location names refer to other, related entities and possess a meaning different from the literal, geographic sense. Metonymic names are to be treated differently to im...
In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a t...
Thispaper presents a text word extraction algorithm that takes a set of bounding boxes of glyphs and their associated text lines of a given document andpartitions the glyphs into ...