An unsupervised clustering of the webpages on a website is a primary requirement for most wrapper induction and automated data extraction methods. Since page content can vary dras...
We propose a novel method, called heterogeneous clustering ensemble (HCE), to generate robust clustering results that combine multiple partitions (clusters) derived from various cl...
Hye-Sung Yoon, Sang-Ho Lee, Sung-Bum Cho, Ju Han K...
Recent advances in technology have made a tremendous amount of multimedia information available to the general population. An efficient way of dealing with this new development is t...
We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining cluste...
Bhavana Bharat Dalvi, William W. Cohen, Jamie Call...
Data-intensive parallel applications on clouds need to deploy large data sets from the cloud's storage facility to all compute nodes as fast as possible. Many multicast algori...
Tatsuhiro Chiba, Mathijs den Burger, Thilo Kielman...