To formulate a meaningful query on semistructured data, such as on the Web, that matches some of the source’s structure, we need first to discover something about how the infor...
A rich family of generic Information Extraction (IE) techniques has been developed by researchers. This paper proposes WebKER, a system for automatically extract...
Despite continuing advances in machine translation technology, users who lack familiarity with particular foreign languages have no good way to find information in those languages...
Boris Katz, Gary C. Borchardt, Sue Felshin, Yuan K...
The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet ...
Mariano P. Consens, Flavio Rizzolo, Alejandro A. V...
A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because eac...