Multi-component similarity method for web product duplicate detection

Due to the growing number of Web shops, aggregating product data from the Web is growing in importance. One of the problems encountered in product aggregation is duplicate detection. In this paper, we extend and significantly improve an existing state-of-the-art product duplicate detection method. Our approach employs a novel method for combining the titles’ and the attributes’ similarities into a final product similarity. We use q-grams to handle partial matching of words, such as abbreviations. Where existing methods cluster products of only two Web shops, we propose a hierarchical clustering method to handle multiple Web shops. Applying our new method to a dataset of TVs from four Web shops reveals that it significantly outperforms the Hybrid Similarity Method, the Title Model Words Method, and the well-known TF-IDF method, with an F1 score of 0.475 compared to 0.287, 0.298, and 0.335, respectively.
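To illustrate how q-grams allow partial matching of words, here is a minimal sketch in Python. It is not the paper's formula: the abstract does not specify the similarity measure, so Jaccard overlap of q-gram sets is assumed here, and the function names `qgrams` and `qgram_similarity` and the default `q = 3` are hypothetical choices made for illustration.

```python
def qgrams(text: str, q: int = 3) -> set[str]:
    """Return the set of overlapping character q-grams of a lowercased string."""
    s = text.lower()
    if len(s) < q:
        # Strings shorter than q are kept whole so they can still match exactly.
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of q-gram sets (an assumed measure, not the paper's)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two empty strings are treated as identical
    return len(ga & gb) / len(ga | gb)
```

Because the comparison works on character fragments rather than whole tokens, a shortened word such as "refr" still shares q-grams with "refresh" and receives a non-zero score, whereas an exact token comparison would score the pair as zero.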

Ronald van Bezu, Sjoerd Borst, Rick Rijkse, Jim Verhagen, Damir Vandic, Flavius Frasincar

Applied Computing | SAC 2015 |

» Large-Scale Duplicate Detection for Web Image Search

» Efficient partial-duplicate detection based on sequence matching

» An Evaluation of Text Retrieval Methods for Similarity Search of Multidimensional NMRSpect...

» Essential deduplication functions for transactional databases in law firms

» Web data integration using approximate string join

» An Investigation of Cloning in Web Applications

» ARISTA Image Search to Annotation on Billions of Web Photos

» FASH A web application for nucleotides sequence search

Post Info

Added	17 Apr 2016
Updated	17 Apr 2016
Type	Journal
Year	2015
Where	SAC
Authors	Ronald van Bezu, Sjoerd Borst, Rick Rijkse, Jim Verhagen, Damir Vandic, Flavius Frasincar
