We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
Digitizing legacy documents and marking them up with XML is important for many scientific domains. However, creating comprehensive semantic markup of high quality is challenging. ...
We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential ...
We have developed a program called fiwalk which produces detailed XML describing all of the partitions and files on a hard drive or disk image, as well as any extractable metadat...
XML is becoming the standard representation format for metadata. Metadata for multimedia documents, as for instance MPEG-7, require approximate match search functionalities to be s...