A wealth of information is available only in web pages, patents, publications, etc. Extracting information from such sources is challenging, both due to the typically complex langu...
We present an intelligent agent crawler designed to collect user-generated content in Second Life and related virtual worlds. The agents navigate autonomously through the world ...
Previous anti-spamming algorithms based on link structure suffer from either the weakness of the page value metric or the vagueness of the seed selection. In this paper, we propos...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
We propose a novel collaborative recommendation approach to take advantage of the information available in user-created lists. Our approach assumes associations among any two item...