Among all neurological diseases, the incidence of Parkinson’s disease (PD) has increased the most. PD is usually diagnosed on the basis of motor symptoms, such as resting tremor, rigidity, and slowness of movement. However, detecting non-motor symptoms, such as constipation, apathy, loss of smell, and sleep disturbances, can bring the diagnosis of PD forward by several years to decades.
In a recent ACS Central Science study, scientists from the University of New South Wales (UNSW) discuss a machine learning (ML)-based tool that can detect PD years before symptoms first appear.
Study: Interpretable machine learning on metabolomics data reveals biomarkers of Parkinson’s disease.
Currently, the overall accuracy of diagnosing PD from motor symptoms is 80%. This accuracy could be increased if PD were diagnosed on the basis of biomarkers rather than primarily on physical symptoms.
Many diseases are detected using biomarkers associated with metabolic processes. Metabolites in blood plasma or serum samples are evaluated using analytical tools such as mass spectrometry (MS).
Non-invasive diagnostic methods based on skin sebum and breath have been gaining popularity recently. Previous studies have shown that MS can reveal different metabolite profiles in pre-PD candidates and healthy individuals.
This difference in metabolite profiles was observed up to 15 years before the clinical diagnosis of PD. Thus, metabolic biomarkers could be used to detect PD much earlier than currently used methods.
ML methods are widely used to develop accurate prediction models for disease diagnosis from large metabolomics datasets. However, developing prediction models on complete metabolomics datasets has several drawbacks, including overfitting, which can reduce diagnostic performance. Most models are therefore developed using a smaller subset of features, which is preselected with traditional statistical methods.
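As a rough illustration of the preselection step described above, the following sketch ranks features by a Welch t-statistic between two groups and keeps the top few. The data are synthetic, and the function names (`welch_t`, `select_top_features`) and the choice of a t-test are assumptions made for this example; the study's own preprocessing is not described here.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def select_top_features(cases, controls, k):
    """Return indices of the k features with the largest |t| between groups.

    cases / controls: lists of samples, each sample a list of feature values.
    """
    n_features = len(cases[0])
    scores = []
    for j in range(n_features):
        t = welch_t([s[j] for s in cases], [s[j] for s in controls])
        scores.append((abs(t), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

# Synthetic toy data (NOT from the study): 20 features, only 3 and 7 differ.
rng = random.Random(0)

def sample(shifted):
    row = [rng.gauss(0.0, 1.0) for _ in range(20)]
    if shifted:
        row[3] += 3.0
        row[7] -= 3.0
    return row

cases = [sample(True) for _ in range(39)]
controls = [sample(False) for _ in range(39)]
selected = select_top_features(cases, controls, k=2)
print(selected)
```

A univariate filter like this is cheap, but it scores each feature in isolation, so it can discard features that only matter in combination with others.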
Some ML approaches, such as the linear support vector machine (SVM) and partial least squares discriminant analysis (PLS-DA), may fail to account for key features in metabolomics datasets. This limitation is addressed by more advanced ML methods, such as neural networks (NNs), which are well suited to processing large datasets.
NNs can be used to develop models that capture nonlinear effects. The main drawback of NN-based predictive models is that they provide little mechanistic information and are not readily interpretable.
Shapley Additive Explanations (SHAP) was recently developed to interpret ML models. However, this technique has not yet been used to analyze metabolomics datasets.
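SHAP builds on Shapley values, which split a model's prediction among its input features. The sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library, and `toy_model`, the input, and the all-zero baseline are assumptions chosen purely for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline.

    Features not yet "switched on" keep their baseline value. The cost
    grows factorially with the number of features, so this only suits
    tiny toy models.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]          # switch feature j on
            new = predict(current)
            phi[j] += new - prev       # marginal contribution of j
            prev = new
    return [p / len(orders) for p in phi]

# Hypothetical nonlinear model: features 0 and 1 interact, feature 2 is ignored.
def toy_model(v):
    return 2.0 * v[0] + v[0] * v[1]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # → [2.5, 0.5, 0.0]
```

The attributions sum to the difference between the prediction at `x` and at the baseline (3.0 here), and the unused feature receives zero, which is what makes this kind of attribution useful for ranking features.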
In the current study, the researchers evaluated blood samples obtained from the Spanish European Prospective Investigation into Cancer and Nutrition (EPIC) study using different analytical tools, such as gas chromatography-MS (GC-MS), capillary electrophoresis-MS (CE-MS), and liquid chromatography-MS (LC-MS).
The EPIC study provided metabolite data from blood plasma samples of both healthy participants and those who subsequently developed PD up to 15 years after the sample was originally collected.
Diane Zhang, a researcher at UNSW, has developed an ML tool called Classification and Ranking Analysis Using Neural Networks that Generate Knowledge from MS (CRANK-MS). The tool provides an interpretable NN-based framework for analyzing the metabolomics datasets generated by these analytical tools.
CRANK-MS has several features, including built-in model parameters that allow high-dimensional metabolomics datasets to be analyzed without the need to preselect chemical features.
CRANK-MS also includes SHAP for retrospectively exploring and identifying the key chemical features that contribute to accurate model predictions. Furthermore, the tool enables benchmark testing against five well-known ML methods to compare diagnostic performance and validate chemical features.
Metabolomics data from 39 participants who developed PD up to 15 years after sampling were examined with the newly developed ML-based tool. The metabolite profiles of these 39 pre-PD patients were compared with those of 39 matched controls, yielding a unique combination of metabolites that could serve as an early warning marker of the onset of PD. Notably, this ML approach showed high accuracy in predicting PD ahead of clinical diagnosis.
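For a balanced case-control comparison like the one above, accuracy is usually reported alongside sensitivity and specificity. The sketch below shows how these are derived from true and predicted labels. The labels are made up for illustration (10 cases, 10 controls) and are not results from the study, and `diagnostic_metrics` is a hypothetical helper name.

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of cases correctly flagged
        "specificity": tn / (tn + fp),   # share of controls correctly cleared
    }

# Made-up labels: 10 cases followed by 10 matched controls.
y_true = [1] * 10 + [0] * 10
# Hypothetical predictions: 8 of 10 cases and 9 of 10 controls are correct.
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

With equal numbers of cases and controls, accuracy is simply the average of sensitivity and specificity, which is one reason matched designs are convenient for comparing models.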
Five metabolite classes ranked consistently high in all six ML models, indicating their potential utility for predicting the future development of PD. These classes were polyfluoroalkyl substances (PFAS), triterpenoids, diacylglycerols, steroids, and cholestane steroids.
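One simple way to find features that rank highly in every model is to intersect each model's top-k list. The sketch below does this for made-up importance scores. The model and feature names are placeholders, and intersecting top-k lists is an assumption for illustration, not necessarily the procedure used in the study.

```python
def consensus_top_features(importances_by_model, top_k):
    """Features that appear in the top_k of every model's importance ranking."""
    top_sets = []
    for scores in importances_by_model.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_sets.append(set(ranked[:top_k]))
    return sorted(set.intersection(*top_sets))

# Made-up importance scores for three hypothetical models.
importances = {
    "model_1": {"f1": 0.9, "f2": 0.7, "f3": 0.1, "f4": 0.3},
    "model_2": {"f1": 0.6, "f2": 0.8, "f3": 0.5, "f4": 0.2},
    "model_3": {"f1": 0.4, "f2": 0.9, "f3": 0.2, "f4": 0.3},
}

consensus = consensus_top_features(importances, top_k=2)
print(consensus)  # → ['f1', 'f2']
```

Requiring agreement across models is conservative: a feature that a single model happens to favor is dropped, which lowers the chance of reporting an artifact of one algorithm.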
The detected diacylglycerol metabolite, 1,2-diacylglycerol (34:2), is found in certain vegetable oils such as olive oil, which is frequently consumed in the Mediterranean diet. PFAS are environmental neurotoxins that can alter neuronal processing, signaling, and function. Thus, dietary and environmental factors may contribute to the development of Parkinson’s disease.
CRANK-MS is publicly available to all researchers interested in diagnosing diseases using an ML approach based on metabolomics data.
The application of CRANK-MS to detect Parkinson’s disease is just one example of how AI can improve the way diseases are diagnosed and monitored. What is exciting is that CRANK-MS can easily be applied to other diseases to identify new biomarkers of interest. The researchers also state that the tool is easy to use and can produce results “in less than 10 minutes on a traditional laptop”.
- Zhang, D. J., Xue, C., Kolachalama, V. B., & Donald, W. A. (2023). Interpretable machine learning on metabolomics data reveals biomarkers of Parkinson’s disease. ACS Central Science. doi: 10.1021/acscentsci.2c01468