In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the...
We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we d...
Enterprise mashup scenarios often involve feeds derived from data created primarily for eye consumption, such as email, news, calendars, blogs, and web feeds. These data sources c...
David E. Simmen, Frederick Reiss, Yunyao Li, Sures...
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
The feature selection and weighting are two important parts of automatic text classification. In this paper we give a new method based on concept attributes. We use the DEF Terms o...