A comparison of published in 2018-2024 general-purpose models for predicting gas chromatographic retention indices

Keywords: gas chromatography, retention index, neural networks, machine learning

Abstract

Retention indices are widely used in gas chromatography and chromatography-mass spectrometry as an additional criterion, alongside the mass spectrum, for tentative identification. Reference retention indices are available for only a limited number of molecules; in other cases, retention indices predicted by mathematical models can be used. Models for predicting retention indices developed before 2018 mostly have either very low accuracy or a very narrow domain of applicability. Since 2018, however, the situation has begun to change: deep neural networks and large training sets (mainly different versions of the NIST database) have made it possible to build models for predicting gas chromatographic retention indices that are both accurate and general-purpose, and their accuracy has increased over time. In recent years, at least seven deep learning-based models for predicting gas chromatographic retention indices have been made publicly available. The authors always declare that their model is more accurate than previous models; however, in no case has the accuracy been measured independently. This work aimed to compare retention index prediction models and the corresponding software objectively and critically, using a single retention data set that was guaranteed not to overlap with the training sets used by the authors of the models. Seven models and the corresponding software were considered, including MetExpert (2018), DeepReI (2021), SVEKLA (2021), and AIRI (2024). For the non-polar stationary phase (ZB-5MS), the accuracy of the newest models is quite high and gradually approaches the accuracy of the reference libraries. The newer models are indeed more accurate than the older ones. For the polar stationary phase (SH-Stabilwax), however, the accuracy on the independent data set is very low and significantly lower than that stated in the original papers describing the predictive models.
For users with limited experience, the process of compiling and running software can be challenging, particularly when attempting to do so several years after publication. This is often due to incompatibility issues between model files and newer versions of the frameworks. It is not uncommon for software authors to discontinue any support of the software after an article has been published in a journal.
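
The kind of comparison described above can be illustrated with a minimal sketch. It is not the procedure used in this work; it only assumes that reference and predicted retention indices are available as dictionaries keyed by a compound identifier. The function names, the choice of metrics, and the numbers in the demo are hypothetical placeholders introduced for illustration.

```python
from statistics import mean, median


def drop_training_overlap(reference, training_ids):
    """Keep only compounds that are absent from a model's training set.

    `training_ids` is a hypothetical set of compound identifiers; how the
    overlap is actually established is outside the scope of this sketch.
    """
    return {k: v for k, v in reference.items() if k not in training_ids}


def accuracy_metrics(reference, predicted):
    """Compare predicted and reference retention indices.

    Returns the number of compared compounds, the mean and median absolute
    errors (in retention index units), and the mean absolute percentage error.
    Only compounds present in both dictionaries are compared.
    """
    common = sorted(set(reference) & set(predicted))
    if not common:
        raise ValueError("no compounds in common")
    abs_err = [abs(predicted[k] - reference[k]) for k in common]
    pct_err = [100.0 * e / reference[k] for e, k in zip(abs_err, common)]
    return {
        "n": len(common),
        "mae": mean(abs_err),
        "mdae": median(abs_err),
        "mape": mean(pct_err),
    }


if __name__ == "__main__":
    # Placeholder values, not measured or published data.
    reference = {"compound_a": 1000.0, "compound_b": 1200.0, "compound_c": 800.0}
    predictions = {
        "model_1": {"compound_a": 1010.0, "compound_b": 1170.0, "compound_c": 800.0},
        "model_2": {"compound_a": 1050.0, "compound_b": 1100.0, "compound_c": 820.0},
    }
    # Every model is evaluated on the same independent subset.
    independent = drop_training_overlap(reference, training_ids={"compound_c"})
    for name, predicted in predictions.items():
        print(name, accuracy_metrics(independent, predicted))
```

Evaluating every model on the same filtered dictionary keeps the reported errors directly comparable, which is the point of using a single independent data set.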

Author Biographies

Anastasia Yu. Sholokhova, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry of Russian Academy of Sciences, Moscow, Russian Federation

researcher, laboratory of physicochemical principles of chromatography and chromatography – mass spectrometry; A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, RAS, Moscow, Russian Federation, email: dm.matiushin@mail.ru

Dmitriy D. Matyushin, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry of Russian Academy of Sciences, Moscow, Russian Federation

leading researcher, laboratory of "smart" methods of chemical analysis; A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, RAS, Moscow, Russian Federation

References

Zhang J., Koo I., Wang B., Gao Q. W., Zheng C. H., Zhang X., A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures, Journal of Chromatography A. 2012; 1251: 188-193. https://doi.org/10.1016/j.chroma.2012.06.036

Matyushin D.D., Sholokhova A.Yu., Buryak A.K., A deep convolutional neural network for the estimation of gas chromatographic retention indices, Journal of Chromatography A. 2019; 1607: 460395. https://doi.org/10.1016/j.chroma.2019.460395

Héberger K., Quantitative structure–(chromatographic) retention relationships, Journal of Chromatography A. 2007; 1158(1-2): 273-305. https://doi.org/10.1016/j.chroma.2007.03.108

Stein S. E., Babushok V. I., Brown R. L., Linstrom P. J., Estimation of Kováts Retention Indices Using Group Contributions, Journal of Chemical Information and Modeling. 2007; 47 (3): 975-980. https://doi.org/10.1021/ci600548y

Geer L. Y., Stein S. E., Mallard W. G., Slotta D. J., AIRI: Predicting Retention Indices and Their Uncertainties Using Artificial Intelligence, Journal of Chemical Information and Modeling. 2024; 64(3): 690-696. https://doi.org/10.1021/acs.jcim.3c01758

Qiu F., Lei Z., Sumner L.W., MetExpert: An expert system to enhance gas chromatography‒mass spectrometry-based metabolite identifications, Analytica Chimica Acta. 2018; 1037: 316-326. https://doi.org/10.1016/j.aca.2018.03.052

Qu C., Schneider B. I., Kearsley A. J., Keyrouz W., Allison T. C., Predicting Kováts Retention Indices Using Graph Neural Networks, Journal of Chromatography A. 2021; 1646: 462100. https://doi.org/10.1016/j.chroma.2021.462100

Anjum A., Liigand J., Milford R., Gautam V., Wishart D. S., Accurate prediction of isothermal gas chromatographic Kováts retention indices, Journal of Chromatography A. 2023; 1705: 464176. https://doi.org/10.1016/j.chroma.2023.464176

Matyushin D.D., Buryak A.K., Gas Chromatographic Retention Index Prediction Using Multimodal Machine Learning, IEEE Access. 2020; 8: 223140-223155. https://doi.org/10.1109/ACCESS.2020.3045047

Matyushin D.D., Sholokhova A.Yu., Buryak A.K., Deep Learning Based Prediction of Gas Chromatographic Retention Indices for a Wide Variety of Polar and Mid-Polar Liquid Stationary Phases, International Journal of Molecular Sciences. 2021; 22 (17): 9194. https://doi.org/10.3390/ijms22179194

Vrzal T., Malečková M., Olšovská J., DeepReI: Deep learning-based gas chromatographic retention index predictor, Analytica Chimica Acta. 2021; 1147: 64-71. https://doi.org/10.1016/j.aca.2020.12.043

Matyushin D.D., Sholokhova A.Yu., Buryak A.K., Gradient boosting for the prediction of gas chromatographic retention indices, Sorbtsionnye I khromatograficheskie protsessy. 2019; 19(6): 630-635. https://doi.org/10.17308/sorpchrom.2019.19/2223

de Cripan S. M., Cereto-Massagué A., Herrero P., Barcaru A., Canela N., Domingo-Almenara X., Machine Learning-Based Retention Time Prediction of Trimethylsilyl Derivatives of Metabolites, Biomedicines. 2022; 10(4): 879. https://doi.org/10.3390/biomedicines10040879

Matyushin D.D., Buryak A.K., Application of regression learning for gas chromatographic analysis and prediction of toxicity of organic molecules, Russian Chemical Bulletin. 2023; 72(2): 482-492. https://doi.org/10.1007/s11172-023-3811-2

Su Q. Z., Vera P., Nerín C., Lin Q. B., Zhong H. N. Safety concerns of recycling postconsumer polyolefins for food contact uses: Regarding (semi-)volatile migrants untargetedly screened, Resources, Conservation and Recycling. 2021; 167: 105365. https://doi.org/10.1016/j.resconrec.2020.105365

Sholokhova A. Yu., Matyushin D. D., Grinevich O. I., Borovikova S. A., Buryak A. K., Intelligent Workflow and Software for Non-Target Analysis of Complex Samples Using a Mixture of Toxic Transformation Products of Unsymmetrical Dimethylhydrazine as an Example, Molecules. 2023; 28(8): 3409. https://doi.org/10.3390/molecules28083409

Weininger D., SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, Journal of Chemical Information and Computer Sciences. 1988; 28(1): 31-36. https://doi.org/10.1021/ci00057a005

Zhokhov A.K., Loskutov A.Yu., Rybal’chenko I.V., Methodological Approaches to the Calculation and Prediction of Retention Indices in Capillary Gas Chromatography, Journal of Analytical Chemistry. 2018; 73(3): 207-220. https://doi.org/10.1134/S1061934818030127

Matyushin D.; Sholokhova A.Yu. (2024). A data set of retention indices and retention times for 200+ molecules and two stationary phases (gas chromatography). figshare. Dataset. https://doi.org/10.6084/m9.figshare.26119558.v2

https://sourceforge.net/projects/metexpert/ (accessed: 24.08.2024)

Matyushin D. (2020). Supplementary data and code for the article "Gas chromatographic retention index prediction using multimodal machine learning". figshare. Software. https://doi.org/10.6084/m9.figshare.12651680.v2 (accessed: 24.08.2024)

https://github.com/mtshn/svekla (accessed: 24.08.2024)

Matyushin D. (2021). Supplementary materials for the article “Deep learning based prediction of gas chromatographic retention indices for a wide variety of polar and mid-polar liquid stationary phases”: source code of software and parameters of pre-trained models. figshare. Software. https://doi.org/10.6084/m9.figshare.14602317.v1 (accessed: 24.08.2024)

https://gcms-id.ca (accessed: 24.08.2024)

https://github.com/usnistgov/masskit_ai/ (accessed: 24.08.2024)

https://pages.nist.gov/masskit_ai/ (accessed: 24.08.2024)

Khrisanfov M.D., Matyushin D.D., Samokhin A.S. A general procedure for finding potentially erroneous entries in the database of retention indices. Analytica Chimica Acta. 2024; 1297: 342375. https://doi.org/10.1016/j.aca.2024.342375

https://github.com/mtshn/molsimwax (accessed: 24.08.2024)

Baker M. 1,500 scientists lift the lid on reproducibility, Nature. 2016; 533(7604): 452-454. https://doi.org/10.1038/533452a

Fanelli D., Is science really facing a reproducibility crisis, and do we need it to? Proceedings of the National Academy of Sciences. 2018; 115(11): 2628-2631. https://doi.org/10.1073/pnas.1708272114

Published
2024-12-08
How to Cite
Sholokhova, A. Y., & Matyushin, D. D. (2024). A comparison of published in 2018-2024 general-purpose models for predicting gas chromatographic retention indices. Sorbtsionnye I Khromatograficheskie Protsessy, 24(5), 711-722. https://doi.org/10.17308/sorpchrom.2024.24/12510