HICSS
2003
IEEE

Content Based File Type Detection Algorithms

Identifying the true type of a computer file can be a difficult problem. Previous methods of file type recognition include fixed file extensions, fixed “magic numbers” stored with the files, and proprietary descriptive file wrappers. All of these methods have significant limitations. This paper proposes algorithms for automatically generating “fingerprints” of file types based on a set of known input files, then using the fingerprints to recognize the true type of unknown files based on their content, rather than metadata associated with them. Recognition is performed by three different algorithms based on: byte frequency analysis, byte frequency cross-correlation analysis, and file header/trailer analysis. Tests were run to measure the accuracy of these algorithms. The accuracy varied from 23% to 96% depending upon which algorithm was used. These algorithms could be used by virus scanning packages, firewalls, intrusion detection systems, forensic analyses of computer hard dri...
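
As a rough illustration of the byte frequency idea mentioned in the abstract, the sketch below builds a per-type "fingerprint" from known sample files and scores an unknown file against it. The abstract does not give the paper's exact formulas, so the normalization (scaling by the most common byte), the averaging step, the similarity score, and all function names here are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def byte_frequency(data: bytes) -> list[float]:
    """Return 256 byte-value frequencies, scaled so the most common byte is 1.0.

    The scaling choice is an assumption; other normalizations are possible.
    """
    counts = Counter(data)
    peak = max(counts.values(), default=0)
    if peak == 0:
        return [0.0] * 256
    return [counts.get(value, 0) / peak for value in range(256)]


def build_fingerprint(samples: list[bytes]) -> list[float]:
    """Average the byte-frequency profiles of known files of one type."""
    if not samples:
        raise ValueError("at least one sample is required")
    profiles = [byte_frequency(sample) for sample in samples]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def similarity(fingerprint: list[float], data: bytes) -> float:
    """Score in [0, 1]: 1.0 means the profile matches the fingerprint exactly."""
    profile = byte_frequency(data)
    mean_diff = sum(abs(f - p) for f, p in zip(fingerprint, profile)) / 256
    return 1.0 - mean_diff


def classify(fingerprints: dict[str, list[float]], data: bytes) -> str:
    """Return the name of the fingerprint that best matches the data."""
    return max(fingerprints, key=lambda name: similarity(fingerprints[name], data))


# Hypothetical in-memory "files" standing in for real samples on disk.
fingerprints = {
    "text": build_fingerprint([b"hello world hello", b"the quick brown fox"]),
    "binary": build_fingerprint([bytes(range(256)), bytes(range(256)) * 2]),
}
guess = classify(fingerprints, b"hello there fox")
```

In practice the samples would be read from disk with `open(path, "rb").read()`, and a fingerprint would be built from many files per type. This sketch covers only the plain byte frequency approach; the cross-correlation and header/trailer analyses named in the abstract are not shown.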
Mason McDaniel, Mohammad Hossain Heydari
Type Conference