Sciweavers

IEAAIE
2010
Springer

A Study of Detecting Computer Viruses in Real-Infected Files in the n-Gram Representation with Machine Learning Methods

Abstract. Machine learning methods have been successfully applied in recent years to detect new and unseen computer viruses. The viruses were, however, detected in small virus loader files and not in real infected executable files. We created data sets of benign files, virus loader files and real infected executable files and represented the data as collections of n-grams. Histograms of the relative frequency of the n-gram collections indicate that detecting viruses in real infected executable files with machine learning methods is nearly impossible in the n-gram representation. This claim is supported theoretically, by examining the n-gram representation from an information-theoretic perspective, and empirically, by classification experiments with machine learning methods.
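To make the n-gram representation mentioned in the abstract concrete, the following is a minimal Python sketch of how byte n-grams and their relative frequencies might be computed for a single file, along with the Shannon entropy of the resulting distribution. It is an illustration only, not the author's code: the function names, the choice of n = 2, and the sample bytes are all assumptions made for this example, and the paper's actual preprocessing may differ.

```python
from collections import Counter
import math


def byte_ngrams(data: bytes, n: int = 2):
    """Yield every overlapping n-gram (as a bytes object) in data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def relative_frequencies(data: bytes, n: int = 2) -> dict:
    """Map each n-gram to its count divided by the total number of n-grams."""
    counts = Counter(byte_ngrams(data, n))
    total = sum(counts.values())
    if total == 0:
        return {}  # input shorter than n bytes: no n-grams at all
    return {gram: count / total for gram, count in counts.items()}


def shannon_entropy(freqs: dict) -> float:
    """Entropy in bits of a relative-frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Hypothetical byte string standing in for file contents (not a real sample).
sample = b"\x90\x90\x90\xeb\xfe\x90\x90"
freqs = relative_frequencies(sample, n=2)
entropy_bits = shannon_entropy(freqs)
```

Comparing such frequency histograms across benign, loader, and infected files is the kind of analysis the abstract describes. In a real infected executable the viral bytes are a small fraction of the file, so its histogram would be expected to stay close to that of the benign host.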
Thomas Stibor
Added 13 Feb 2011
Updated 13 Feb 2011
Type Journal
Year 2010
Where IEAAIE
Authors Thomas Stibor