In addition to the actual content Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
In this short note we present a recommendation system for automatic retrieval of broken Web links using an approach based on contextual information. We extract information from th...
Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However,...
Large archives of Ottoman documents are challenging to many historians all over the world. However, these archives remain inaccessible since manual transcription of such a huge vo...
In the database community, work on information extraction (IE) has centered on two themes: how to effectively manage IE tasks, and how to manage the uncertainties that arise in th...
Daisy Zhe Wang, Michael J. Franklin, Minos N. Garo...