We present the problem of categorizing web services according to a shallow ontology for presentation on a specialist portal, using their WSDL and associated textual documents foun...
A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web si...
Craig A. Knoblock, Kristina Lerman, Steven Minton,...
Extracting titles from a PDF's full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating...
In this paper, we present a robust feature extraction framework based on information-theoretic learning. Its formulated objective aims at simultaneously maximizing the Renyi's...
Customization to specific domains of discourse and/or user requirements is one of the greatest challenges for today’s Information Extraction (IE) systems. While demonstrably eff...