Machine learning fails to outperform traditional models in predicting long-term allograft outcomes


A study presented at this year’s American Transplant Congress (ATC; 4–8 June, Boston, USA) found that machine learning (ML) technologies did not outperform traditional statistical models in predicting long-term kidney allograft outcomes, with the two approaches attaining similar prediction performance overall.

These findings were presented by Agathe Truchot (Paris Transplant Group, Paris, France) on behalf of researchers who concluded that, in spite of the “increased use and hype around ML”, their study supports the use of more traditional statistical approaches for prognostication in organ transplantation.

To assess the performance of traditional versus ML-based prognostic models, the researchers developed several ML algorithms and compared their predictive capability to that achieved by the iBox prognostication system (Cordis). They used a validated derivation cohort of 4,000 consecutive kidney recipients prospectively recruited at four centres in France, and three validation cohorts from Europe (n=2,214 patients), North America (n=1,537) and South America (n=671).

A total of 24 parameters, including the time of risk evaluation, and clinical, histopathological, immunological and functional characteristics, were used to develop six ML models ranging from tree-based models to survival support vector machines and gradient boosting techniques. Their respective prediction performances were assessed with discrimination (C-index), calibration and Brier scores, and then compared to those of the iBox system.
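For readers unfamiliar with the discrimination metric mentioned above, the following is a minimal sketch of Harrell's concordance index (C-index) for right-censored survival data. It is an illustration only and not the researchers' code: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied event times are all assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient who failed earlier was given the higher predicted risk.

    times       -- follow-up time per patient
    events      -- True if graft loss was observed, False if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): ignore tied times
        # 'a' is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: pair not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, two observed graft losses.
toy_times = [1.0, 3.0, 5.0, 7.0]
toy_events = [True, False, True, False]
toy_risks = [0.8, 0.6, 0.1, 0.2]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
print(toy_c)  # 0.75 (3 of 4 comparable pairs are ranked correctly)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-index scores reported in this study should be read.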

Among the 8,422 kidney recipients included, 12.84% (n=1,081) lost their graft over a post-transplant follow-up time of 6.25 years. The median time from transplant to risk evaluation was 0.98 years.
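The pooled figures can be checked against the cohort sizes reported for the study. The short sketch below simply redoes that arithmetic; the variable names are illustrative, and the numbers are those quoted in this article.

```python
# Cohort sizes as reported in the article
cohort_sizes = {
    "France (derivation)": 4000,
    "Europe": 2214,
    "North America": 1537,
    "South America": 671,
}
graft_losses = 1081

total_recipients = sum(cohort_sizes.values())
loss_pct = 100 * graft_losses / total_recipients

print(total_recipients)     # 8422
print(f"{loss_pct:.2f}%")   # 12.84%
```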

The derivation cohort was split into a training set and a test set, and was used to develop the ML models, while the validation cohorts were used for external validation of the models.

The C-index scores at seven years post-risk evaluation in the derivation cohort ranged from 0.527 to 0.788 for the various ML model types, compared to 0.808 for the iBox. In the external validation cohorts, the best-performing ML models achieved a C-index of 0.814, 0.857 and 0.884 in Europe, North America and South America, respectively, and the iBox achieved similar scores in each.

The researchers noted that the ML models achieved satisfactory calibration in both the derivation and validation cohorts of their study. Overall, however, the traditional statistical models performed better within the derivation cohort.

