TY - JOUR
T1 - COOBoostR
T2 - An Extreme Gradient Boosting-Based Tool for Robust Tissue or Cell-of-Origin Prediction of Tumors
AU - Yang, Sungmin
AU - Ha, Kyungsik
AU - Song, Woojeung
AU - Fujita, Masashi
AU - Kübler, Kirsten
AU - Polak, Paz
AU - Hiyama, Eiso
AU - Nakagawa, Hidewaki
AU - Kim, Hong Gee
AU - Lee, Hwajin
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2023/1
Y1 - 2023/1
N2 - We present COOBoostR, a computational method for the putative prediction of the tissue- or cell-of-origin of various cancer types. COOBoostR applies regional somatic mutation density information and chromatin mark features to an extreme gradient boosting-based machine-learning algorithm. It ranks chromatin marks from various tissue and cell types by how well they explain the somatic mutation density landscape of any sample of interest. The tissue or cell type matching the chromatin mark feature with the highest explanatory power is designated as the potential tissue- or cell-of-origin. By integrating ChIP-seq-based chromatin data with regional somatic mutation density data derived from normal cells/tissue, precancerous lesions, and cancer types, we show that COOBoostR outperforms existing random forest-based methods in prediction speed, with comparable or better tissue- or cell-of-origin prediction performance (prediction accuracy: normal cells/tissue, 76.99%; precancerous lesions, 95.65%; cancer cells, 89.39%). In addition, our results suggest a dynamic somatic mutation accumulation at the normal tissue or cell stage, which could be intertwined with changes in open chromatin marks and enhancer sites. These results further suggest that chromatin marks shape the somatic mutation landscape at the early stage of mutation accumulation, possibly even before the initiation of precancerous lesions or neoplasia.
AB - We present COOBoostR, a computational method for the putative prediction of the tissue- or cell-of-origin of various cancer types. COOBoostR applies regional somatic mutation density information and chromatin mark features to an extreme gradient boosting-based machine-learning algorithm. It ranks chromatin marks from various tissue and cell types by how well they explain the somatic mutation density landscape of any sample of interest. The tissue or cell type matching the chromatin mark feature with the highest explanatory power is designated as the potential tissue- or cell-of-origin. By integrating ChIP-seq-based chromatin data with regional somatic mutation density data derived from normal cells/tissue, precancerous lesions, and cancer types, we show that COOBoostR outperforms existing random forest-based methods in prediction speed, with comparable or better tissue- or cell-of-origin prediction performance (prediction accuracy: normal cells/tissue, 76.99%; precancerous lesions, 95.65%; cancer cells, 89.39%). In addition, our results suggest a dynamic somatic mutation accumulation at the normal tissue or cell stage, which could be intertwined with changes in open chromatin marks and enhancer sites. These results further suggest that chromatin marks shape the somatic mutation landscape at the early stage of mutation accumulation, possibly even before the initiation of precancerous lesions or neoplasia.
KW - biocomputational method
KW - bioinformatics-based prediction of cell-of-origin
KW - epigenomics
KW - genomics
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85146771419&partnerID=8YFLogxK
U2 - 10.3390/life13010071
DO - 10.3390/life13010071
M3 - Article
AN - SCOPUS:85146771419
SN - 2075-1729
VL - 13
JO - Life
JF - Life
IS - 1
M1 - 71
ER -