A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
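As an illustrative sketch of how such rules work (the hostname and the rules themselves are hypothetical examples, not from the source), Python's standard-library `urllib.robotparser` can evaluate robots.txt directives on a crawler's behalf:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: block every crawler from /private/.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)

# A compliant crawler checks each URL against the rules before fetching it.
print(rp.can_fetch("*", "http://example.com/private/page"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))    # True
```

The protocol is advisory: it is the crawler's responsibility to consult these rules; the site itself does not enforce them.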
A wealth of information is available on the Web. But often, such data are hidden behind form interfaces that allow only a restricted set of queries over the underlying databases...
Abstract: Recently, a growing demand has arisen for methods for the development of small- and medium-scale Web Information Systems (WIS). Web applications are being built in a rapidly...
The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identi...
Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas...
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...
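One common building block for this matching step (an illustrative sketch, not the method of the cited work) is approximate string comparison between identifiers, with a similarity threshold; the threshold value here is a hypothetical choice:

```python
from difflib import SequenceMatcher

def same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    """Guess whether two identifier strings refer to the same real-world
    entity, using normalized sequence similarity (threshold is illustrative)."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold

# Near-identical spellings match; unrelated names do not.
print(same_entity("John Smith", "Jon Smith"))   # True
print(same_entity("John Smith", "Alice Wong"))  # False
```

Real record-linkage systems typically combine several such similarity signals (names, addresses, dates) rather than relying on a single string comparison.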