Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
The Internet and corporate intranets provide far more information than anybody can absorb. People use search engines to find the information they require. However, these systems t...
This paper describes the structure of, and the ideas behind, a self-applicable specializer of programs, as well as the principles of operation of a compiler generator that has been...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...
We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is a...