Learning to detect malicious executables in the wild

In this paper, we describe the development of a fielded application for detecting malicious executables in the wild. We gathered 1971 benign and 1651 malicious executables and encoded each as a training example using n-grams of byte codes as features. Such processing resulted in more than 255 million distinct n-grams. After selecting the most relevant n-grams for prediction, we evaluated a variety of inductive methods, including naive Bayes, decision trees, support vector machines, and boosting. Ultimately, boosted decision trees outperformed other methods with an area under the ROC curve of 0.996. Results also suggest that our methodology will scale to larger collections of executables. To the best of our knowledge, ours is the only fielded application for this task developed using techniques from machine learning and data mining.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications-Data Mining; I.2.6 [Artificial Intelligence]: Learning-Concept Learni...
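As a rough illustration of the feature pipeline the abstract describes (byte n-grams as features, followed by selection of the most relevant n-grams), here is a minimal sketch. The abstract does not state the n-gram length or the relevance measure, so the choice of n = 4, the use of information gain for ranking, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
from math import log2


def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the distinct overlapping byte n-grams in data (n=4 is an assumption)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with pos positive and neg negative labels."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(examples, ngram: bytes) -> float:
    """Information gain of the feature 'ngram is present' over the class label.

    examples is a list of (ngram_set, label) pairs; label is True for malicious.
    Information gain is an assumed relevance measure, not taken from the abstract.
    """
    def label_entropy(labels):
        return entropy(sum(labels), len(labels) - sum(labels))

    present = [label for grams, label in examples if ngram in grams]
    absent = [label for grams, label in examples if ngram not in grams]
    everything = [label for _, label in examples]
    n = len(examples)
    return (label_entropy(everything)
            - (len(present) / n) * label_entropy(present)
            - (len(absent) / n) * label_entropy(absent))


def select_top_ngrams(examples, k: int) -> list[bytes]:
    """Rank every observed n-gram by information gain and keep the top k."""
    vocabulary = set().union(*(grams for grams, _ in examples))
    # Ties are broken by byte order so the result is deterministic.
    ranked = sorted(vocabulary, key=lambda g: (-information_gain(examples, g), g))
    return ranked[:k]


def encode(grams: set[bytes], selected: list[bytes]) -> list[bool]:
    """Encode one executable as a Boolean vector: is each selected n-gram present?"""
    return [g in grams for g in selected]
```

The resulting Boolean vectors could then be passed to any of the learners the abstract lists. Computing information gain for every n-gram in pure Python, as done here, would not be practical for hundreds of millions of distinct n-grams; the sketch only shows the shape of the computation.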
Jeremy Z. Kolter, Marcus A. Maloof
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2004
Where KDD