Identifying Blood Biomarkers for Dementia Using Machine Learning Methods in the Framingham Heart Study

Blood biomarkers for dementia have the potential to identify preclinical disease and improve participant selection for clinical trials. Machine learning is an efficient analytical strategy to simultaneously identify multiple candidate biomarkers for dementia. We aimed to identify important candidate blood biomarkers for dementia using three machine learning models. We included 1642 (mean age 69 ± 6 yr, 53% women) dementia-free Framingham Offspring Cohort participants attending examination 7 who had available blood biomarker data. We developed three machine learning models, support vector machine (SVM), eXtreme gradient boosting of decision trees (XGB), and artificial neural network (ANN), to identify candidate biomarkers for incident dementia. Over a mean 12 ± 5 yr follow-up, 243 (14.8%) participants developed dementia. In multivariable models including all 38 available biomarkers, the XGB model demonstrated the strongest predictive accuracy for incident dementia (AUC 0.74 ± 0.01), followed by ANN (AUC 0.72 ± 0.01) and SVM (AUC 0.69 ± 0.01). Stepwise feature elimination by random sampling identified a subset of the nine most highly informative biomarkers. Machine learning models confined to these nine biomarkers showed improved predictive accuracy for dementia (XGB, AUC 0.76 ± 0.01; ANN, AUC 0.75 ± 0.004; SVM, AUC 0.73 ± 0.01). A parsimonious panel of nine candidate biomarkers was identified that showed moderately good predictive accuracy for incident dementia, although our results require external validation.
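
For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from predicted risk scores. It uses the standard pairwise (Mann-Whitney) formulation: the AUC is the fraction of case/non-case pairs in which the case receives the higher score, with ties counted as one half. The function name `auc` and the toy labels and scores are hypothetical and purely illustrative; they are not the study's data or analysis code.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    labels: 1 for participants who developed the outcome, 0 otherwise.
    scores: predicted risk from any model (higher means higher risk).
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    # Each case/non-case pair scores 1 if the case ranks higher, 0.5 on a tie.
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c, n in product(cases, controls)
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (not from the study): 2 cases, 3 non-cases.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc(labels, scores)  # 5 of 6 pairs ranked correctly
```

On this reading, an AUC of 0.76 means that in roughly three of four randomly drawn case/non-case pairs the model assigns the higher risk to the participant who later developed dementia; 0.5 corresponds to chance-level ranking.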