Video-based artificial intelligence provided a more accurate and consistent reading of echocardiograms than did experienced sonographers in a blinded trial, a result suggesting that this technology is no longer experimental.
“We are planning to deploy this at Cedars, so this is essentially ready for use,” said David Ouyang, MD, who is affiliated with the Cedars-Sinai Medical Center and is an instructor of cardiology at the University of California, both in Los Angeles.
The primary outcome of this trial, called EchoNet-RCT, was the proportion of cases in which cardiologists, blinded to the origin of the reports, changed the left ventricular ejection fraction (LVEF) reading by more than 5%.
This endpoint was reached in 27.2% of reports generated by sonographers but just 16.8% of reports generated by AI, a mean difference of 10.5% (P < .001).
The AI tested in the trial is called EchoNet-Dynamic. It employs a video-based deep learning algorithm that permits beat-by-beat evaluation of ejection fraction. The specifics of this system were described in a study published 2 years ago in Nature. In that evaluation of the model training set, the absolute error rate was 6% in the more than 10,000 annotated echocardiogram videos.
EchoNet-RCT Is First Blinded AI Echo Trial
Although AI is already being employed for image evaluation in many areas of medicine, the EchoNet-RCT study “is the first blinded trial of AI in cardiology,” Ouyang said. Indeed, he noted that no prior study has even been randomized.
After a run-in period, 3,495 echocardiograms were randomly assigned to be read by AI or by a sonographer. The reports generated by these two approaches were then evaluated by the blinded cardiologists. The sonographers and the cardiologists participating in this study had a mean of 14.1 years and 12.7 years of experience, respectively.
Each reading by both sonographers and AI was based on a single beat, a constraint that presumably handicapped the AI, which is capable of evaluating ejection fraction across multiple cardiac cycles. Evaluating multiple cycles has previously been shown to improve accuracy, but it is tedious and not commonly performed in routine practice, according to Ouyang.
AI Favored for All Major Endpoints
The superiority of AI was calculated after noninferiority was demonstrated. AI also showed superiority for the secondary safety outcome, which involved a test-retest evaluation in which historical AI and sonographer echocardiogram reports were again blindly assessed. Retest variability was lower with AI than with sonographers (6.29% vs. 7.23%), a difference that was highly significant in favor of AI (P < .001).
The efficiency of AI relative to sonographer assessment was also tested and showed meaningful reductions in work time. AI eliminated the sonographer's reading time entirely (0 vs. a median of 119 seconds, P < .001), and it was also associated with a highly significant reduction in median cardiologist time spent on echo evaluation (54 vs. 64 seconds, P < .001).
Assuming that AI is integrated into the routine workflow of a busy center, AI “could be very effective at not only improving the quality of echo reading output but also increasing efficiencies in time and effort spent by sonographers and cardiologists by simplifying otherwise tedious but important tasks,” Ouyang said.
The trial enrolled a relatively typical population. The median age was 66 years, 57% were male, and comorbidities such as diabetes and chronic kidney disease were common. When AI was compared with sonographer evaluation in groups stratified by these variables as well as by race, image quality, and location of the evaluation (inpatient vs. outpatient), the advantage of AI was consistent.
Cardiologists Could Not Detect AI-Read Echoes
Identifying potential limitations of this study, James D. Thomas, MD, professor of medicine, Northwestern University, Chicago, pointed out that it was a single-center trial, and he raised the possibility of bias if cardiologists could accurately guess which of the reports they were evaluating had been generated by AI.
Ouyang acknowledged that this study was limited to patients at UCLA, but he pointed out that the training model was developed at Stanford (Calif.) University, so two independent patient populations were involved in testing the machine learning algorithm. He also noted that the trial was exceptionally large, providing a robust dataset.
As for the potential bias, this was evaluated as a predefined endpoint.
“We asked the cardiologists to tell us [whether] they knew which reports were generated by AI,” Ouyang said. In 43% of cases, they reported they were not sure. However, when they did express confidence that the report was generated by AI, they were correct in only 32% of the cases and incorrect in 24%. Ouyang suggested these numbers argue against a substantial role for a bias affecting the trial results.
Thomas, who has an interest in the role of AI for cardiology, cautioned that there are “technical, privacy, commercial, maintenance, and regulatory barriers” that must be circumvented before AI is widely incorporated into clinical practice, but he praised this blinded trial for advancing the field. Even accounting for any limitations, he clearly shared Ouyang’s enthusiasm about the future of AI for EF assessment.
Ouyang reports financial relationships with EchoIQ, Ultromics, and InVision. Thomas reports financial relationships with Abbott, GE, egnite, EchoIQ, and Caption Health.
This article originally appeared on MDedge.com, part of the Medscape Professional Network.