A new coronary artery disease (CAD) “digital marker” can pinpoint gradients of risk on a spectrum, potentially improving CAD diagnosis and management, as well as clinical trial outcomes, a new analysis suggests.
“The study was motivated by the fact that CAD is a spectrum disease, as shown in earlier studies, where differences in the amount of plaque result in distinct gradations of risk for atherosclerosis and survival,” principal author Ron Do, PhD, Icahn School of Medicine at Mount Sinai, New York City, told theheart.org | Medscape Cardiology.
“This study is a follow-up to our previous study, where we used a similar modeling strategy — machine learning and electronic health records (EHRs) — to predict CAD within a year in a case-control fashion.” That study, like other CAD research, used a conventional binary framework, which simply predicted whether or not CAD would occur.
The current report is unique, Do said, in that the model produces a score that measures CAD on a spectrum, identifying gradients of risk.
The new model predicted CAD with high sensitivity and specificity using two sets of biobank EHR data; increasing risk scores mirrored increased coronary artery stenosis from angiography data, including risks for multivessel and obstructive disease, as well as prevalence of all-cause death and recurrent myocardial infarction.
“We believe this proof-of-concept pilot for CAD as a spectrum of disease is generalizable, and our method could be applied to a variety of diseases,” lead author Iain S. Forrest, PhD, also from Mount Sinai, told theheart.org | Medscape Cardiology.
The study, for which the researchers analyzed close to 100,000 EHRs, was published online December 20 in The Lancet.
Risk on a Spectrum
For the study, the researchers developed and validated the EHR-based, CAD-predictive machine learning model; translated the resulting probabilities into in-silico scores for CAD (ISCAD), ranging from 0 (lowest probability) to 1 (highest probability); and measured the association of ISCAD with clinical outcomes, including coronary artery stenosis, obstructive CAD, multivessel CAD, all-cause death, and CAD sequelae.
They trained and validated the model using 20,497 EHRs from the BioMe Biobank, tested the model on a holdout set (a random sample not used in the model fitting process) of 15,252 EHRs from BioMe, and externally tested the model on 60,186 EHRs from the UK Biobank. They then assessed the association of ISCAD with CAD clinical outcomes from both biobanks.
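As a rough illustration of what a holdout set is, the sketch below randomly sets aside records that are never used for model fitting. The record IDs, the seed, and the `holdout_split` helper are hypothetical; only the two BioMe set sizes come from the study, and the researchers' actual sampling procedure is not described here.

```python
import random


def holdout_split(record_ids, n_holdout, seed=0):
    """Randomly set aside n_holdout records; the rest are used for model fitting."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(record_ids)
    rng.shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Set sizes reported for BioMe; the integer IDs are placeholders, not real records.
N_FIT, N_HOLDOUT = 20_497, 15_252
fit_ids, holdout_ids = holdout_split(range(N_FIT + N_HOLDOUT), N_HOLDOUT)

print(len(fit_ids), len(holdout_ids))
```

Because the two groups share no records, performance on the holdout set reflects how the model behaves on data it has not seen.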
The median age of the BioMe Biobank participants was 61, 41% were men, and 14% had a CAD diagnosis. Similarly, UK Biobank participants had a median age of 62, 42% were men, and 14% were diagnosed with CAD.
The model predicted CAD with an area under the receiver operating characteristic curve (AUC) of 0.95, sensitivity of 0.94, and specificity of 0.82 in the BioMe validation set, and an AUC of 0.93, sensitivity of 0.90, and specificity of 0.88 in the BioMe holdout set.
In the UK Biobank external test set, the AUC was 0.91, sensitivity was 0.84, and specificity was 0.83.
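For readers less familiar with these metrics, the following minimal Python sketch shows how sensitivity, specificity, and AUC can be computed from predicted probabilities. The labels, scores, 0.5 threshold, and function names are invented for illustration and are not taken from the study.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # cases flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # cases missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # controls cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # controls flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the chance that a random case outscores a random control (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = CAD diagnosis, 0 = no diagnosis.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.92, 0.81, 0.40, 0.10, 0.35, 0.55, 0.05, 0.77]

sens, spec = sensitivity_specificity(labels, scores)
model_auc = auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={model_auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.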
The ISCAD captured CAD risk from known risk factors, pooled cohort equations, and polygenic risk scores. Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles, by 12 percentage points per quartile, as did the risk for obstructive CAD, multivessel CAD, and major coronary artery stenosis.
Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles: decile 1 (HR, 1.0; prevalence, 0.2%), decile 6 (HR, 11; prevalence, 3.1%), and decile 10 (HR, 56; prevalence, 11%). A similar trend was seen for recurrent myocardial infarction.
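To make the quartile and decile comparisons concrete, the sketch below ranks individuals by a 0-to-1 score, splits them into equal-sized groups, and reports how common an outcome is in each group. The scores, outcomes, and the `prevalence_by_quantile` helper are hypothetical, and the example covers prevalence only; hazard ratios would require a survival model, which is not shown.

```python
def prevalence_by_quantile(scores, events, n_bins=10):
    """Rank by score, split into n_bins equal-sized groups, return event prevalence per group.

    Assumes at least n_bins individuals so that no group is empty.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    totals = [0] * n_bins
    counts = [0] * n_bins
    for rank, i in enumerate(order):
        b = rank * n_bins // n      # 0-based group index, lowest scores first
        totals[b] += 1
        counts[b] += events[i]
    return [c / t for c, t in zip(counts, totals)]


# Hypothetical toy data: 20 individuals, score in [0, 1], event 1 = outcome occurred.
scores = [0.02, 0.95, 0.10, 0.55, 0.33, 0.71, 0.18, 0.88, 0.47, 0.64,
          0.05, 0.99, 0.26, 0.59, 0.39, 0.76, 0.14, 0.83, 0.51, 0.68]
events = [0, 1, 0, 0, 0, 1, 0, 1, 0, 0,
          0, 1, 0, 1, 0, 0, 0, 1, 0, 1]

deciles = prevalence_by_quantile(scores, events, n_bins=10)
quartiles = prevalence_by_quantile(scores, events, n_bins=4)
print(deciles)
print(quartiles)
```

A stepwise rise in prevalence from the lowest to the highest group is the kind of gradient the study reports.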
Among undiagnosed individuals with high ISCAD scores (≥ 0.9), 12 (46%) had clinical evidence of CAD, as defined by the 2014 American College of Cardiology/American Heart Association Task Force guidelines.
Study limitations include the use of diagnostic codes to determine CAD status; small sample sizes, which could affect machine learning outcomes and reduce generalizability; and the retrospective nature of the study.
The authors conclude, “Our study shows a reconceptualization of coronary artery disease — including atherosclerosis, death, and sequelae — as a spectrum of disease that is quantifiable with artificial intelligence trained on clinical data.”
Forrest said the team will be scaling up their approach and testing it in other health systems, as well as developing models to be tested in diverse populations. Once the findings are replicated in other systems, a large-scale prospective study following up on individuals with CAD might be feasible.
“Placing individuals on a spectrum of coronary artery disease that accounts for a multitude of factors, as opposed to current scoring systems such as SYNTAX that focus solely on coronary anatomy, could enable tailored interventions that would be better aligned with coronary artery disease risk,” write Puneet Batra, PhD, and Amit V. Khera, MD, MSc, both from the Broad Institute of MIT and Harvard in Cambridge, Massachusetts, in a related editorial.
“An important consideration…is whether increased predictive performance provides an adequate improvement over the many risk models already available,” they write. “This is particularly true for coronary artery disease, for which previous studies have noted comparable risk performance using laboratory-based and non-laboratory-based risk assessment, stratification available from the time of birth based on a polygenic score to quantify inherited susceptibility, and considerable utility of a non-invasive CT scan to measure coronary artery calcification.”
The study was funded by the National Institutes of Health. No relevant financial relationships were declared.
Follow Marilynn Larkin on Twitter: @MarilynnL