Sciweavers

KDD
2002
ACM

Web site mining: a new way to spot competitors, customers and suppliers in the world wide web

14 years 4 months ago
Web site mining: a new way to spot competitors, customers and suppliers in the world wide web
When automatically extracting information from the world wide web, most established methods focus on spotting single HTMLdocuments. However, the problem of spotting complete web sites is not handled adequately yet, in spite of its importance for various applications. Therefore, this paper discusses the classification of complete web sites. First, we point out the main differences to page classification by discussing a very intuitive approach and its weaknesses. This approach treats a web site as one large HTML-document and applies the well-known methods for page classification. Next, we show how accuracy can be improved by employing a preprocessing step which assigns an occurring web page to its most likely topic. The determined topics now represent the information the web site contains and can be used to classify it more accurately. We accomplish this by following two directions. First, we apply well established classification algorithms to a feature space of occurring topics. The se...
Martin Ester, Hans-Peter Kriegel, Matthias Schuber
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2002
Where KDD
Authors Martin Ester, Hans-Peter Kriegel, Matthias Schubert
Comments (0)