XML is fast becoming the standard format to store, exchange and publish over the web, and is getting embedded in applications. Two challenges in handling XML are its size (the XML...
Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini...
A collaborative crawler is a group of crawling nodes, in which each crawling node is responsible for a specific portion of the web. We study the problem of collecting geographical...
Through a variety of means, including a range of browser cache methods and inspecting the color of a visited hyperlink, client-side browser state can be exploited to track users a...
Collin Jackson, Andrew Bortz, Dan Boneh, John C. M...
While much of the data on the web is unstructured in nature, there is also a significant amount of embedded structured data, such as product information on e-commerce sites or sto...
The ccTLD (country code Top Level Domain) in a URL does not necessarily point to the geographic location of the server concerned. The authors have surveyed sample servers belongin...
Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mika...