Early Detection of Stroke on Imbalanced Data Using SMOTE and Random Forest
DOI: https://doi.org/10.70309/ticom.v13i3.156
Keywords: Random Forest, Classification, Ensemble Method, SMOTE, Imbalanced Data
Abstract
A stroke, also known as a brain attack, occurs when blood circulation to the brain is lost; blood clots are the leading cause. According to a WHO report, stroke was the leading cause of death in Indonesia in 2024, with a death rate of 131.8 per 100,000 population. This study builds a classification model for early stroke detection, following the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology and using the Random Forest algorithm. The data are publicly available from www.kaggle.com and comprise 4981 records with 11 attributes. The classes are imbalanced: 4733 records (95%) are stroke-negative and 248 (5%) are stroke-positive. The imbalance was handled with the Synthetic Minority Oversampling Technique (SMOTE), which raised the total to 5981 records, 4733 negative and 1248 positive. After several models were explored, the best was Random Forest with SMOTE, which achieved an accuracy of 80.14%, an AUC of 0.836, a recall of 63.33%, and a precision of 11.42%.
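The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the tooling used, so the function `smote`, the neighbor count `k=5`, and the two-feature toy data below are all assumptions made for illustration. The real dataset has 11 attributes, some of which would need encoding before any distance could be computed. The sketch only shows the core idea of SMOTE, which is to create each synthetic minority record by interpolating between a minority record and one of its nearest minority neighbors, and it reproduces the class counts reported above.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbors (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy stand-in data (assumption): two numeric features per record,
# with the class sizes taken from the abstract.
data_rng = random.Random(42)
negatives = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(4733)]
positives = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(248)]

# Add 1000 synthetic positives, matching the 248 -> 1248 increase reported.
synthetic = smote(positives, n_new=1000)
positives_after = positives + synthetic

print(len(negatives), len(positives_after), len(negatives) + len(positives_after))
# prints: 4733 1248 5981
```

Because every synthetic point lies on a segment between two existing minority records, oversampling of this kind adds no new information about positive cases; it only changes the class ratio the classifier sees. In practice it is usually applied to the training split only, so that evaluation metrics are computed on untouched records.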
License
Copyright (c) 2025 Muhammad Iqbal Aryabima, Rusdah, Ririt Roeswidiah, Ahmad Pudoli

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.