Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, it is expected that XML w...
The Web is becoming a universal information dissemination medium, due to a number of factors including its support for content dynamicity. A growing number of Web information prov...
Sandeep Pandey, Kedar Dhamdhere, Christopher Olsto...
In this paper we propose a new knowledge management task which aims to map Web pages to their corresponding records in a structured database. For example, the DBLP database contai...
Tim Weninger, Fabio Fumarola, Jiawei Han, Donato M...
In addition to the actual content Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...