This project takes a hybrid, ensemble-based classification approach to malware detection for information security. In the first stage, each of five machine learning algorithms is ensembled five times on its own, producing five homogeneous ensembles. In the final stage, these five ensembled models are stacked together to form a meta-learner that performs the final classification. For baseline comparison, the same algorithms are also evaluated individually: K-Nearest Neighbors, Support Vector Machine (SVM), Logistic Regression, Naive Bayes, and Decision Tree.
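The two-stage design can be sketched with scikit-learn as follows. This is a minimal illustration rather than the project's actual code. It makes several assumptions: "ensembled 5 times" is read as bagging with five estimators, the meta-learner is a logistic regression, the features are standardized, and a synthetic dataset from `make_classification` stands in for the real data. The names `base_learners`, `bagged`, and `stack` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a table of header features (assumption: not the real dataset).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The five base algorithms named in the text.
base_learners = {
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "logreg": LogisticRegression(max_iter=1000),
    "nb": GaussianNB(),
    "tree": DecisionTreeClassifier(random_state=0),
}


def bagged(model):
    """Stage one: a homogeneous ensemble of five copies of one algorithm.

    Bagging is an assumed interpretation of "ensembled 5 times".
    """
    return make_pipeline(
        StandardScaler(),
        BaggingClassifier(model, n_estimators=5, random_state=0),
    )


stage_one = [(name, bagged(model)) for name, model in base_learners.items()]

# Stage two: stack the five ensembles under a meta-learner
# (logistic regression here is an assumption).
stack = StackingClassifier(
    estimators=stage_one,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Fitting each entry of `base_learners` on its own against the same split would give the individual baselines that the stacked model is compared with.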
Malware classification is performed on the ClaMP (Classification of Malware with PE headers) dataset, a malware classification dataset built from the header field values of Portable Executable (PE) files. The PE file format is a data structure that tells the Windows OS loader what it needs to manage the wrapped executable code. This includes dynamic library references for linking, API export and import tables, resource management data, and thread-local storage (TLS) data.
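To make "header field values" concrete, the sketch below reads the COFF file header of a PE image using only the Python standard library. It is an illustration of the file layout, not the extraction code used to build the dataset, and the fields shown are not claimed to match the dataset's columns. The helper names `read_coff_header` and `build_sample` are hypothetical, and the sample bytes are a hand-built header fragment, not a real executable.

```python
import struct

# Layout of the 20-byte COFF file header that follows the "PE\0\0" signature.
COFF_FORMAT = "<HHIIIHH"
COFF_FIELDS = (
    "Machine",
    "NumberOfSections",
    "TimeDateStamp",
    "PointerToSymbolTable",
    "NumberOfSymbols",
    "SizeOfOptionalHeader",
    "Characteristics",
)


def read_coff_header(data: bytes) -> dict:
    """Return the COFF header fields of a PE image as a dict (hypothetical helper)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature in DOS header")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    values = struct.unpack_from(COFF_FORMAT, data, pe_offset + 4)
    return dict(zip(COFF_FIELDS, values))


def build_sample() -> bytes:
    """Build a synthetic header fragment for demonstration (not a real executable)."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # PE signature directly after the DOS header
    coff = struct.pack(COFF_FORMAT, 0x8664, 3, 0, 0, 0, 240, 0x0022)
    return bytes(dos) + b"PE\x00\x00" + coff


header = read_coff_header(build_sample())
for name, value in header.items():
    print(f"{name}: {value:#x}")
```

Running a reader like this over many executables, one row per file, produces a feature table of the kind that a header-based classifier can be trained on.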