Given only the URL of a web page, can we identify its topic? This is the question that we examine in this paper. Usually, web pages are classified using their content [7], but a U...
We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process which affects only a particu...
The proliferation of malware has presented a serious threat to the security of computer systems. Traditional signature-based antivirus systems fail to detect polymorphic and new, ...
KDD is a complex and demanding task. While a large number of methods has been established for numerous problems, many challenges remain to be solved. New tasks emerge requiring th...
Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Mar...
Addressed in this paper is the issue of `email data cleaning' for text mining. Many text mining applications need take emails as input. Email data is usually noisy and thus i...