TY - JOUR
T1 - Machine learning for early diagnosis of Kawasaki disease in acute febrile children
T2 - retrospective cross-sectional study in China
AU - Zheng, Wei
AU - Zhu, Shiben
AU - Wang, Xuelian
AU - Chen, Cuixuan
AU - Zhen, Zifeng
AU - Xu, Yi
AU - Mo, Xiaolan
AU - Tse, Gary
AU - Li, Xufang
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Early diagnosis of Kawasaki disease (KD) allows timely initiation of treatment, thereby preventing coronary artery aneurysms in children. However, early diagnosis is challenging because the diagnostic criteria are subjective. This study aims to develop a machine learning prediction model that uses routine blood tests to distinguish KD from other febrile illnesses in Chinese children within the first five days of fever onset. Retrospective cross-sectional data were collected from the records of Guangzhou Women and Children’s Medical Center for the period January 1, 2020, to April 30, 2024. A retrospective analysis of this dataset was performed using three machine learning models and five ensemble models. This study included 1,089 children with KD (mean age 32.8 ± 27.0 months; 34.5% female) and a control group of 81,697 children without KD (mean age 45.3 ± 33.6 months; 42.8% female). The supervised method eXtreme Gradient Boosting (XGBoost) was applied. Tested without feature selection, it achieved an area under the ROC curve (AUC) of 0.9999, sensitivity of 0.9982, specificity of 0.9975, F1 score of 0.9979, accuracy of 0.9979, positive predictive value (PPV) of 0.9975, and negative predictive value (NPV) of 0.9982. The SHapley Additive exPlanations (SHAP) summary plot identified the top five significant features: percentage of eosinophils (EO%), hematocrit (HCT), plateletcrit (PCT), gender, and absolute basophil count (BA#). This study demonstrates that the XGBoost machine learning model, applied to routine blood test results, can predict KD.
AB - Early diagnosis of Kawasaki disease (KD) allows timely initiation of treatment, thereby preventing coronary artery aneurysms in children. However, early diagnosis is challenging because the diagnostic criteria are subjective. This study aims to develop a machine learning prediction model that uses routine blood tests to distinguish KD from other febrile illnesses in Chinese children within the first five days of fever onset. Retrospective cross-sectional data were collected from the records of Guangzhou Women and Children’s Medical Center for the period January 1, 2020, to April 30, 2024. A retrospective analysis of this dataset was performed using three machine learning models and five ensemble models. This study included 1,089 children with KD (mean age 32.8 ± 27.0 months; 34.5% female) and a control group of 81,697 children without KD (mean age 45.3 ± 33.6 months; 42.8% female). The supervised method eXtreme Gradient Boosting (XGBoost) was applied. Tested without feature selection, it achieved an area under the ROC curve (AUC) of 0.9999, sensitivity of 0.9982, specificity of 0.9975, F1 score of 0.9979, accuracy of 0.9979, positive predictive value (PPV) of 0.9975, and negative predictive value (NPV) of 0.9982. The SHapley Additive exPlanations (SHAP) summary plot identified the top five significant features: percentage of eosinophils (EO%), hematocrit (HCT), plateletcrit (PCT), gender, and absolute basophil count (BA#). This study demonstrates that the XGBoost machine learning model, applied to routine blood test results, can predict KD.
KW - Coronary aneurysm
KW - Kawasaki disease
KW - Machine learning
KW - Pediatrics
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85218911908&partnerID=8YFLogxK
U2 - 10.1038/s41598-025-90919-y
DO - 10.1038/s41598-025-90919-y
M3 - Article
C2 - 40000757
AN - SCOPUS:85218911908
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 6799
ER -