Development of a Medical Condition Prediction Model Using Natural Language Processing with K-Nearest Neighbour

  • Bolaji A. Omodunbi
  • Afeez A. Soladoye, Federal University Oye-Ekiti
  • Nnamdi S. Okomba
  • Mutiu O Ayinla
  • Charity S. Odeyemi


Capturing the effects of drugs as reported by patients, and using those reviews to predict the medical condition the patients face, is a promising approach to medical condition prediction. Many researchers use clinical and demographic data (risk factors) to predict diseases. The limitations of that approach are that not all instances have the right clinical results, values are often missing, prediction accuracy is low, datasets are inadequately pre-processed, feature selection is not considered, and alternative values of K are not tried when using K-nearest neighbour. Drug reviews help here because the effects and symptoms that users report in them capture the relevant information. This study employed an open-access drug review dataset to predict the medical condition. The dataset came as separate training and testing splits, which were merged and then re-split 80-20 with stratification. The dataset went through natural language processing techniques such as lemmatization, stemming, stop-word removal, tokenization, and vectorization, among others. A forward-backward feature selection technique was employed, with the review comments having a significant effect on the prediction of the condition. K-nearest neighbour was then employed to predict the medical condition, using the drug review as the feature and the condition as the target variable. Different numbers of nearest neighbours were used to train the model, with k=1 giving the best average predictive accuracy of 89% and a weighted average precision of 90%. The model gave the same average accuracy of 84% when k was set to 3, 4, 5 and 6. Moreover, the model obtained better results than existing systems. Therefore, with the use of artificial intelligence, medical doctors and patients can use drug reviews to predict certain medical conditions through a clinical predictive decision support system.
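The pipeline described in the abstract (stratified 80-20 split, text pre-processing, vectorization, and K-nearest neighbour tried with several values of k) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The reviews in `REVIEWS` are invented for illustration, the stop-word list is a small placeholder, cosine similarity is an assumed distance measure (the abstract does not name one), and lemmatization, stemming, and forward-backward feature selection are omitted for brevity.

```python
import math
import random
import re
from collections import Counter, defaultdict

# Hypothetical toy data: (review text, condition). Not taken from the study's dataset.
REVIEWS = [
    ("This pill eased my headache and migraine pain quickly", "migraine"),
    ("Migraine attacks are less frequent and headache pain is milder", "migraine"),
    ("Headache gone within an hour, fewer migraine days", "migraine"),
    ("Strong relief for throbbing migraine headache", "migraine"),
    ("Migraine aura and headache stopped after the second dose", "migraine"),
    ("My blood sugar dropped and glucose is stable", "diabetes"),
    ("Glucose readings improved, blood sugar under control", "diabetes"),
    ("Blood sugar spikes are rare now, glucose looks good", "diabetes"),
    ("Helped lower my glucose and blood sugar after meals", "diabetes"),
    ("Stable blood sugar and better glucose numbers", "diabetes"),
    ("I fall asleep faster and sleep through the night", "insomnia"),
    ("Sleep improved, I no longer lie awake at night", "insomnia"),
    ("Finally a full night of sleep without waking", "insomnia"),
    ("Helps me sleep but groggy the next night", "insomnia"),
    ("Better sleep every night since starting", "insomnia"),
]

# Placeholder stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "my", "i", "is", "are", "for", "of", "this", "me"}


def preprocess(review):
    """Lowercase, tokenize into alphabetic words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(review):
    """Bag-of-words term counts stored as a sparse vector."""
    return Counter(preprocess(review))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split per condition so each class keeps about the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def knn_predict(train, review, k):
    """Majority vote among the k training reviews most similar to the query."""
    query = vectorize(review)
    ranked = sorted(train, key=lambda s: cosine(vectorize(s[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test reviews whose predicted condition matches the label."""
    hits = sum(knn_predict(train, review, k) == label for review, label in test)
    return hits / len(test)


train, test = stratified_split(REVIEWS)
for k in (1, 3, 5):
    print(f"k={k}: accuracy={accuracy(train, test, k):.2f}")
```

Looping over k mirrors the study's comparison of different neighbour counts. On a toy set this small the printed accuracies say nothing about the reported 89% and 84% figures.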

Author Biographies

Mutiu O Ayinla
Kwara State College of Education, Ilorin
Charity S. Odeyemi
Department of Computer Engineering, Federal University Akure

