There exist two types of wrappers: the string based wrapper such as the LR wrapper, and the tree based wrapper. A tree based wrapper designates extraction regions by nodes on the ...
In a categorized information space, predicting users' information needs at the category level can facilitate personalization, caching and other topic-oriented services. This ...
On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus ...
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, suc...
Raymond Kosala, Hendrik Blockeel, Maurice Bruynoog...
A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because eac...