We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining cluste...
Bhavana Bharat Dalvi, William W. Cohen, Jamie Call...
The use of spreadsheets to capture information is widespread in industry. Spreadsheets can thus be a wealthy source of domain information. We propose to automatically ext...
Felienne Hermans, Martin Pinzger, Arie van Deursen
Background: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such a description must be done in terms of thei...
Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting informatio...
In the Japanese language, as a predicate is placed at the end of a sentence, the content of a sentence cannot be inferred until reaching the end. However, when the content is comp...