With an increasing number of people who read, write, and comment on blogs, the blogosphere has established itself as an essential medium of communication. A fundamental characteri...
In this paper, we consider the problem of extracting structured data from web pages, taking into account both the content of individual attributes and the structure of pages...
The WWW is one of the most popular services on the Internet, and a huge number of WWW information sources are available. Conventionally, we access WWW information sources one by one by using a...
Although we see the positive results of information retrieval research embodied throughout the Internet, on our computer desktops, and in many other aspects of daily life, at the ...
We consider the problem of building a P2P-based search engine for massive document collections. We describe a prototype system called ODISSEA (Open DIStributed Search Engine Archi...