A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
Main approaches to corpus-based semantic class mining include distributional similarity (DS) and pattern-based (PB). In this paper, we perform an empirical comparison of them, bas...
This paper discusses how to augment the WWW with a Dexter-based hypermedia service that provides anchors, links and composites as objects stored external to the Web pages. The hyp...
In this paper, we propose to mine query hierarchies from clickthrough data, which is within the larger area of automatic acquisition of knowledge from the Web. When a user submits...
Dou Shen, Min Qin, Weizhu Chen, Qiang Yang, Zheng ...
In this paper, we propose a novel approach for automatic generation of visualizations from domain-specific data available on the web. We describe a general system pipeline that co...