Frequent Itemset Mining for Clustering Near Duplicate Web Documents

A large number of documents on the Web have duplicates, which makes it challenging to develop efficient methods for computing clusters of similar documents. In this paper we use an approach that treats (closed) sets of attributes with large support (large extent) as clusters of similar documents. The method is tested in a series of computer experiments on large public collections of web documents and compared with other established methods and software, such as biclustering, on the same datasets. The practical efficiency of different algorithms for computing frequent closed sets of attributes is also compared.
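The abstract's core idea, closed sets of attributes with large support read as clusters, can be illustrated with a small sketch. It assumes, as the abstract's wording suggests, that documents play the role of attributes and that each object is a fingerprint element of a document (here a toy "shingle" string). A closed set of documents is then a group that shares many fingerprint elements, and its support is the number of elements they all share. The function `closed_frequent_sets`, its parameters `min_support` and `min_docs`, and the toy data are hypothetical and are not the authors' implementation. The closure step uses naive pairwise intersection, not any of the algorithms the paper benchmarks.

```python
from itertools import combinations


def closed_frequent_sets(doc_shingles, min_support, min_docs=2):
    """Hypothetical sketch: return {closed document set: support} for closed
    sets of documents sharing at least `min_support` fingerprint elements."""
    # Transpose the data: for each shingle (object), the set of documents
    # (attributes) that contain it, i.e. the object's intent.
    rows = {}
    for doc, shingles in doc_shingles.items():
        for s in shingles:
            rows.setdefault(s, set()).add(doc)

    # Every closed attribute set with a nonempty extent is an intersection of
    # object intents, so close the family of rows under pairwise intersection.
    closed = {frozenset(r) for r in rows.values()}
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closed), 2):
            c = a & b
            if c and c not in closed:
                closed.add(c)
                changed = True

    # Support of a document set = number of shingles shared by all its documents.
    result = {}
    for docs in closed:
        if len(docs) < min_docs:
            continue  # a single document is not a cluster of duplicates
        support = sum(1 for r in rows.values() if docs <= r)
        if support >= min_support:
            result[docs] = support
    return result


# Toy data (hypothetical): d1..d3 are near duplicates, d4 and d5 less so.
docs = {
    "d1": {"a", "b", "c", "d"},
    "d2": {"a", "b", "c", "e"},
    "d3": {"a", "b", "c", "d"},
    "d4": {"x", "y", "z"},
    "d5": {"x", "y", "w"},
}
print(closed_frequent_sets(docs, min_support=3))
```

With `min_support=3` only the groups {d1, d2, d3} (3 shared shingles) and {d1, d3} (4 shared shingles) survive, while {d4, d5} shares just 2 and is dropped. The pairwise-intersection loop can grow exponentially, so at web scale a dedicated frequent closed itemset miner, of the kind whose efficiency the paper compares, would be needed instead.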

Dmitry I. Ignatov, Sergei O. Kuznetsov

Applied Computing | Frequent Closed Sets | ICCS 2009 | Large Public Collections | Similar Documents

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	ICCS
Authors	Dmitry I. Ignatov, Sergei O. Kuznetsov

Sciweavers
