As user demands become increasingly sophisticated, search engines today are competing in more than just returning document results from the Web. One area of competition is providi...
Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aims to so...
We consider the problem of building a P2P-based search engine for massive document collections. We describe a prototype system called ODISSEA (Open DIStributed Search Engine Archi...
Knowledge constantly grows in scientific discourse and is revised over time by domain experts. The body of knowledge will get structured and refined as the Communities of Practice...
XML is fast becoming the standard format to store, exchange and publish over the Web, and is getting embedded in applications. Two challenges in handling XML are its size (the XML...
Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini...