Omnibus and Robust Deconvolution Scheme for Bulk RNA Sequencing Data Integrating Multiple Single-Cell Reference Sets and Prior Biological Knowledge

MOTIVATION: Cell-type deconvolution of bulk tissue RNA sequencing (RNA-seq) data is an important step towards understanding how cell-type composition varies across disease conditions. Owing to recent advances in single-cell RNA sequencing (scRNA-seq) and the availability of large amounts of bulk RNA-seq data from disease-relevant tissues, various deconvolution methods have been developed. However, the performance of existing methods relies heavily on the quality of the information provided by external data sources, such as the scRNA-seq dataset chosen as a reference and the prior biological information supplied.
RESULTS: We present the Integrated and Robust Deconvolution (InteRD) algorithm to infer cell-type proportions from target bulk RNA-seq data. Owing to the innovative use of penalized regression with a new evaluation criterion for deconvolution, InteRD has three primary advantages. First, it effectively integrates deconvolution results from multiple scRNA-seq datasets. Second, it calibrates estimates from reference-based deconvolution by incorporating additional biological information as priors. Third, it is robust to inaccurate external information supplied to the deconvolution system. Extensive numerical evaluations and real data applications demonstrate that InteRD yields more accurate and robust cell-type proportion estimates that agree well with known biology.
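As a rough illustration of the first advantage, the sketch below combines per-reference proportion estimates into a single estimate using a convex weighting. It is a toy example under stated assumptions, not the InteRD algorithm: the function name `combine_estimates` is hypothetical, and the weights are supplied by hand, whereas InteRD determines how to integrate the references through penalized regression and its evaluation criterion.

```python
import numpy as np


def combine_estimates(estimates, weights):
    """Convex combination of cell-type proportion estimates (hypothetical helper).

    estimates: array-like of shape (n_refs, n_samples, n_cell_types), one
               proportion matrix per single-cell reference.
    weights:   array-like of shape (n_refs,), non-negative. These are given
               by hand here (an assumption); they are not learned.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # rescale so the weights sum to 1
    # Weighted sum over the reference axis -> (n_samples, n_cell_types).
    combined = np.tensordot(weights, estimates, axes=1)
    # Renormalize each sample so its proportions sum to 1.
    return combined / combined.sum(axis=1, keepdims=True)


# Two references, one bulk sample, two cell types (made-up numbers).
ref_a = [[0.2, 0.8]]
ref_b = [[0.4, 0.6]]
combined = combine_estimates([ref_a, ref_b], weights=[1.0, 1.0])
```

With equal weights, the combined estimate is simply the average of the two reference-based estimates; unequal weights would let a more trustworthy reference dominate.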
AVAILABILITY AND IMPLEMENTATION: The proposed InteRD framework is implemented in R and the package is available at https://cran.r-project.org/web/packages/InteRD/index.html.
SUPPLEMENTARY INFORMATION: Supplementary materials, including algorithm pseudocode, additional simulation results, and further discussion, are available at Bioinformatics online.