We present an elegant and extensible model that can provide semantic interpretations for an unusually wide range of textual tables in documents. Unlike the few existin...
The Online Database of Interlinear Text (ODIN) is a database of interlinear text "snippets", harvested mostly from scholarly documents posted to the Web. Although large...
In this paper, we present ObjectRunner, a system for extracting, integrating, and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
In the most abstract sense, we build web pages so that computers can read them. The software that people use to access web pages is what "reads" the document. How the page is ren...
Originally XML was used as a standard protocol for data exchange in computing. The evolution of information technology has opened up new situations in which XML can be used to aut...