Heterogeneous Ensemble Feature Selection and Multilevel Ensemble Approach to Machine Learning Phishing Attack Detection

  • Gabriel O. Ogunleye
  • B. M. Olukoya
  • A. T. Olusesi
  • Patrick Olabisi
  • Queen B. Sodipo
  • Osobukola Adekunle

Abstract

Over the past decade, technology has given people easier ways of accomplishing complex tasks, especially in communication. Malicious links are deliberately crafted to resemble legitimate ones and sent by email to millions of users at once at low cost. Since the emergence of phishing and related attacks, the solutions proposed to mitigate them have proven unsuccessful because of the attacks' dynamic nature. Meanwhile, machine learning (ML) has been adopted as a remedy for phishing detection, with its performance depending on several steps, especially feature selection. Most studies in this problem domain concentrate on model optimization rather than on a reliable feature selection scheme, and fail to integrate feature selection with the classification model. As a result, the systems are fed low-quality data that hampers model performance. Recognizing the contribution of feature selection to the performance of machine learning models, the authors developed a novel Heterogeneous Ensemble Feature Selection (HEFS) framework for multilevel ensemble machine learning-based phishing detection. In HEFS, three filter-based statistical techniques were used to produce primary subsets of phishing features, and the variables selected by each technique were automatically aggregated to produce the baseline features. The techniques were chosen so that each offsets the others' limitations, since their ranking principles differ. The experiment revealed that the multilevel (stacked) ensemble trained on the baseline features achieved an accuracy of 98.8% and outperformed the alternatives, including the multilevel model trained on the features from each individual filter-based method.
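The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three filter techniques or the aggregation rule, so everything specific here is an assumption made for illustration: the three scorers (absolute Pearson correlation, a scaled class-mean gap, and information gain after a mean split), the union of each filter's top-k features as the aggregation rule, and the toy feature names and values. It is not the authors' implementation, and the stacked classifier is not shown.

```python
import math
from collections import Counter


def pearson_score(col, y):
    """Absolute Pearson correlation between a feature column and the label."""
    n = len(col)
    mx, my = sum(col) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in col))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def mean_gap_score(col, y):
    """Gap between the two class means, scaled by the feature's range."""
    pos = [a for a, b in zip(col, y) if b == 1]
    neg = [a for a, b in zip(col, y) if b == 0]
    spread = max(col) - min(col)
    if not pos or not neg or not spread:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / spread


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_score(col, y):
    """Information gain of the label after splitting the feature at its mean."""
    cut = sum(col) / len(col)
    left = [b for a, b in zip(col, y) if a <= cut]
    right = [b for a, b in zip(col, y) if a > cut]
    if not left or not right:
        return 0.0
    n = len(y)
    return entropy(y) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def top_k(features, y, scorer, k):
    """Names of the k features ranked highest by one filter."""
    scores = {name: scorer(col, y) for name, col in features.items()}
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def baseline_features(features, y, k):
    """Union of the top-k subsets from all three filters (assumed rule)."""
    scorers = (pearson_score, mean_gap_score, info_gain_score)
    return sorted(set().union(*(top_k(features, y, s, k) for s in scorers)))


# Toy data with hypothetical feature names; label 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "url_length": [90, 85, 70, 95, 20, 30, 25, 35],
    "num_dots": [5, 4, 6, 5, 1, 2, 1, 2],
    "has_ip": [1, 1, 0, 1, 0, 0, 0, 0],
    "noise": [3, 7, 3, 7, 3, 7, 3, 7],
}

selected = baseline_features(features, labels, k=2)
print(selected)
```

On this toy data the filters disagree on their second-ranked feature, so the union keeps `has_ip`, `num_dots`, and `url_length` and drops `noise`, which is the kind of complementarity the abstract attributes to combining filters with different ranking principles.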

Published
2023-12-31