D3MI: an efficient and powerful federated imputation method for bias reduction in the analysis of distributed incomplete data by accounting for within-site correlation and between-site heterogeneity.

Electronic health records (EHRs) collected from diverse healthcare institutions offer a rich and representative data source for clinical research. Federated learning enables analysis of these distributed data without sharing sensitive patient-level information. However, missing data remain a major challenge and can introduce substantial bias if not properly addressed. Few distributed imputation methods currently exist, and those that do fail to account for two critical aspects of EHR data: correlation within sites and heterogeneity between sites. We aim to fill this methodological gap.