Article Type

Article

Abstract

The use of Android smartphones has risen quickly as Internet-based services have grown and become more popular. The enormous popularity of the Android operating system has made these devices a target for malware attacks. Malware variants that modify their behavior to evade detection are difficult to detect effectively with machine learning (ML) techniques. Effective feature selection plays a crucial role in identifying malware characteristics: it reduces the dimensionality of a large dataset by removing irrelevant features and keeping the relevant ones, which increases classification accuracy and detection rate. This helps to address the problems associated with malware feature detection. In this paper, a malware detection model is proposed that consists of three major stages: data preprocessing, feature selection, and classification. The model was tested and evaluated on the CICAndMal2017 dataset, using 500,000 samples. Several preprocessing steps were applied to prepare the dataset, including SMOTE to balance the multi-class data. The Chi-square method was applied in the feature selection stage, and the Random Forest algorithm was applied in the classification stage. The results showed that the features differed in importance. Feature selection had a positive effect on performance: accuracy was 89.93% when all features were used and 93.30% when the features were selected with the Chi-square method.
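The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data is synthetic (generated with scikit-learn rather than loaded from the dataset), the helper smote_like_oversample is a hypothetical, simplified stand-in for SMOTE, and the feature count, tree count, and split ratio are arbitrary assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical, simplified SMOTE-style oversampler (illustration only).

    Each synthetic sample is an interpolation between a minority sample and
    one of its k nearest neighbors of the same class. Every class is grown
    to the size of the largest class. Assumes each class has >= 2 samples.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0:
            continue
        X_cls = X[y == cls]
        n_neighbors = min(k, len(X_cls) - 1)
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X_cls)
        # The first neighbor is normally the sample itself, so drop it.
        neighbors = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, len(X_cls), size=n_new)
        pick = neighbors[base, rng.integers(0, n_neighbors, size=n_new)]
        gap = rng.random((n_new, 1))
        X_parts.append(X_cls[base] + gap * (X_cls[pick] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


def run_pipeline(k_features=15, seed=0):
    # Synthetic, imbalanced multi-class data standing in for the real dataset.
    X, y = make_classification(
        n_samples=3000, n_features=40, n_informative=10, n_classes=4,
        weights=[0.7, 0.15, 0.1, 0.05], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Stage 1: preprocessing. Scale to [0, 1] (chi2 needs non-negative
    # inputs), then balance the training split only.
    scaler = MinMaxScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    X_bal, y_bal = smote_like_oversample(X_train, y_train, seed=seed)

    # Stage 2: Chi-square feature selection, fitted on training data.
    selector = SelectKBest(chi2, k=k_features).fit(X_bal, y_bal)

    # Stage 3: Random Forest, trained with all features and with the subset.
    rf_all = RandomForestClassifier(n_estimators=50, random_state=seed)
    rf_all.fit(X_bal, y_bal)
    rf_sel = RandomForestClassifier(n_estimators=50, random_state=seed)
    rf_sel.fit(selector.transform(X_bal), y_bal)

    return {
        "all_features": accuracy_score(y_test, rf_all.predict(X_test)),
        "chi2_selected": accuracy_score(
            y_test, rf_sel.predict(selector.transform(X_test))
        ),
        "n_selected": int(selector.get_support().sum()),
        "chi2_scores": selector.scores_,
    }


if __name__ == "__main__":
    results = run_pipeline()
    print("Accuracy, all features: ", results["all_features"])
    print("Accuracy, chi2 features:", results["chi2_selected"])
```

Oversampling is applied only to the training split so that the test set keeps its original class distribution. Whether the selected subset outperforms the full feature set depends on the data; the sketch makes no claim about reproducing the reported accuracies.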

Keywords

Android operating system, feature selection, Chi-square, Random Forest Metrics
