A substantial subset of web data follows some kind of underlying structure. Nevertheless, HTML does not contain any schema or semantic information about the data it represents...
Business graphics are an important class of digital imagery. Such images are computer-generated, and comprise synthetic elements such as solid fills, line art, and color sweeps. O...
Salil Prabhakar, Hui Cheng, Raja Bala, John C. Han...
Named Entity Recognition, as a task of providing important semantic information, is a critical first step in Information Extraction and Question Answering systems. This paper propos...
As the result of interactions between visitors and a web site, an HTTP log file contains very rich knowledge about users' on-site behaviors, which, if fully exploited, can better c...
Information available on the Internet is frequently supplied simply as plain ASCII text, structured according to orthographic and semantic conventions. Traditional document classi...