Deduplication, a key operation in integrating data from multiple sources, is a time-consuming, labor-intensive and domain-specific operation. We present our design of alias that us...
We identify crucial design issues in building a distributed inverted index for a large collection of web pages. We introduce a novel pipelining technique for structuring the core ...
Ontologies are set to play a key role in the "Semantic Web", extending syntactic interoperability to semantic interoperability by providing a source of shared and precise...
Many emerging applications such as wide-area network management need to query large, structured, highly distributed datasets. Seaweed is a distributed scalable infrastructure for ...
Richard Mortier, Dushyanth Narayanan, Austin Donne...
Rational drug design is an example where integrated access to heterogeneous scientific data is urgently needed, as it becomes rapidly available due to new experimental and computa...