Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. H...
Searching and extracting meaningful information out of highly heterogeneous datasets is a hot topic that has received a lot of attention. However, the existing solutions are based on e...
Federation of Abstracting and Information Services presentation (“The Thomson Transformation: Remaking a Global 500 Company,” http://www.nfais.org/TurnerNFAIS06.ppt). Now conten...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Abstract. Thanks to the recent explosive growth of the WWW (World Wide Web), we can easily access a large number of images from the WWW. There are, however, no established methods to make...