Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, ...
Classifying and mining noise-free web pages will improve the accuracy of search results as well as search speed, and may benefit webpage organization applications (e.g., keyword-bas...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
We describe a method for improving the precision of metasearch results based upon scoring the visual features of documents' surrogate representations. These surrogate scores ...
Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, ...
In this paper, a novel framework is developed to support personalized news video recommendation. First, multi-modal information sources for news videos are seamlessly integrated an...
Hangzai Luo, Jianping Fan, Daniel A. Keim, Shin'ic...