Near-duplicate detection is not only an important pre and post processing task in Information Retrieval but also an effective spam-detection technique. Among different approache...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
As the Internet is a global network, there is a demand on accessing closely related data without browsing through di erent Web documents. A signi cant amount of these data are pre...
The mediator/wrapper approach is used to integrate data from different databases and other data sources by introducing a middleware virtual database that provides high level abstr...
In this paper we present tools that provide an easy way to edit XML content directly on the web, with the usual benefit of valid XML content. These tools make it possible to crea...