Early Detection of Stroke on Imbalanced Data Using SMOTE and Random Forest (Deteksi Dini Penyakit Stroke pada Data Tidak Seimbang Menggunakan SMOTE dan Random Forest)

Muhammad Iqbal Aryabima; Rusdah Rusdah; Ririt Roeswidiah; Ahmad Pudoli

doi:10.70309/ticom.v13i3.156

Authors

Muhammad Iqbal Aryabima, Universitas Budi Luhur
Rusdah Rusdah, Universitas Budi Luhur
Ririt Roeswidiah, Universitas Budi Luhur
Ahmad Pudoli, Universitas Budi Luhur

Keywords:

Random Forest, Classification, Ensemble Method, SMOTE, Imbalanced Data

Abstract

A stroke, also known as a brain attack, occurs when blood circulation to the brain is lost; blood clots are the leading cause of stroke. According to a WHO report, stroke was the leading cause of death in Indonesia in 2024, with a death rate of 131.8 per 100,000 population. This study builds a classification model for the early detection of stroke, following the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology and using the Random Forest algorithm. The data are publicly available from www.kaggle.com and comprise 4981 records with 11 attributes. The classes are imbalanced: 4733 records (95%) are stroke-negative and 248 (5%) are stroke-positive. The imbalance was handled with the Synthetic Minority Oversampling Technique (SMOTE), which raised the total to 5981 records: 4733 negative and 1248 positive. After several models were explored, the best result was obtained with Random Forest combined with SMOTE, which achieved an accuracy of 80.14%, an AUC of 0.836, a recall of 63.33%, and a precision of 11.42%.
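The abstract does not state which software was used for oversampling, so the following is only a minimal sketch of the SMOTE idea, not the authors' implementation. It uses the Python standard library alone. The function name `smote`, the parameter `k=5`, the fixed seeds, and the two-dimensional toy points are all assumptions made for illustration; only the class counts (4733, 248, 1248) come from the abstract. As a simplification, each synthetic point starts from a randomly chosen minority sample rather than cycling through every sample in turn.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy 2-D minority class (hypothetical values, NOT the stroke dataset).
toy_rng = random.Random(42)
minority = [(toy_rng.uniform(0, 1), toy_rng.uniform(0, 1)) for _ in range(248)]

# Class counts reported in the abstract.
n_negative, n_positive, n_positive_after = 4733, 248, 1248

# Number of synthetic positives needed to go from 248 to 1248.
new_points = smote(minority, n_positive_after - n_positive)

total_after = n_negative + n_positive + len(new_points)
share_before = n_positive / (n_negative + n_positive)
share_after = (n_positive + len(new_points)) / total_after

print(len(new_points), total_after)
print(f"positive share: {share_before:.1%} before, {share_after:.1%} after")
```

With the reported counts, 1000 synthetic positive records are needed, and the positive class grows from about 5% to about 21% of the data. The oversampled set is therefore still imbalanced, only less severely than the original.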
