The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
We connect two scenarios in structured learning: adapting a parser trained on one corpus to another annotation style, and projecting syntactic annotations from one language to ano...
Abstract. The Jetpack framework is Mozilla’s newly-introduced extension development technology. Motivated primarily by the need to improve how scriptable extensions (also called ...
DescribeX is a visual, interactive tool for exploring the underlying structure of an XML collection. DescribeX implements a framework for creating XML summaries described using axi...
Mir Sadek Ali, Mariano P. Consens, Shahan Khatchad...
This paper discusses generating document structure from annotated media repositories in a domain-independent manner. This approaches the vision of a universal RDF browser. We star...
Lloyd Rutledge, Jacco van Ossenbruggen, Lynda Hard...