Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages wh...
Although many algorithms have been developed to harvest lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a roo...
Graphs, diagrams, and pictures are inaccessible to people who use text-based web browsers. Yet such diagrams are quite prominent in documents commonly found on the web. In this...
Kathleen F. McCoy, Sandra Carberry, Tom Roper, Nan...
To facilitate queries over semi-structured data, various structural summaries have been proposed. Structural summaries are derived directly from the data and serve as indices for ...