Empirical equations for the prediction of gas chromatographic retention indices for the DB-35MS stationary phase
Abstract
At present, most studies on predicting retention indices from molecular structure focus on standard stationary phases: polydimethylsiloxane, 5%-phenyl-methylpolysiloxane, and polyethylene glycol. The NIST database contains retention index data for these stationary phases, so a large training data set is available and deep learning can be applied. This allows accurate and versatile retention index prediction models to be created. However, other stationary phases are also actively used in research to identify components of complex mixtures by chromatography-mass spectrometry, and retention index prediction algorithms for these stationary phases could also be of great importance. In this paper, we consider the problem of predicting retention indices for the DB-35MS stationary phase (35%-phenyl-methylpolysiloxane). A data set of retention indices on this stationary phase for 52 volatile organic compounds found in lilac buds is considered. Empirical equations are proposed that incorporate the retention index for the DB-5 stationary phase (5%-phenyl-methylpolysiloxane) predicted by deep learning and a number of molecular descriptors calculated using the RDKit framework. Complex topological molecular descriptors and features calculated using quantum chemistry are shown not to provide a significant increase in accuracy compared with the simplest integer molecular descriptors, such as the number of bonds subject to internal rotation. At the same time, using the retention index for the DB-5 stationary phase predicted by deep learning as a molecular descriptor strongly decreases the prediction error compared with using only conventional molecular descriptors. When the retention indices predicted for the DB-624 stationary phase are used instead of those predicted for the DB-5 stationary phase, a relatively high prediction accuracy can also be achieved.
Linear equations are presented that can be used in practice to calculate the retention indices of volatile plant compounds containing carbon, hydrogen, and oxygen for the DB-35MS stationary phase and similar stationary phases. A less accurate but more versatile equation is also presented that contains only the retention index predicted by deep learning for the DB-5 stationary phase as a molecular descriptor. The achieved values of the root-mean-square prediction error, the mean absolute prediction error, and the median absolute prediction error were 28.9, 19.3, and 11.8 units, respectively.
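The single-descriptor equation and the three error measures mentioned above can be illustrated with a minimal sketch. It assumes the simplest form consistent with the abstract, RI(DB-35MS) = a · RI(DB-5, predicted) + b, fitted by ordinary least squares. The function names `fit_line` and `error_metrics` are hypothetical, and the numbers in the usage part are synthetic placeholders, not values from the paper or from any measurement. The coefficients actually reported by the authors are not reproduced here.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with one descriptor x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def error_metrics(observed, predicted):
    """Return (RMSE, mean absolute error, median absolute error)."""
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    rmse = math.sqrt(statistics.fmean([e * e for e in abs_err]))
    mae = statistics.fmean(abs_err)
    mdae = statistics.median(abs_err)
    return rmse, mae, mdae


if __name__ == "__main__":
    # Synthetic placeholder values (NOT real retention indices):
    # x = RI predicted for DB-5, y = RI "observed" on DB-35MS.
    ri_db5_pred = [1000.0, 1100.0, 1200.0, 1300.0]
    ri_db35_obs = [1105.0, 1205.0, 1325.0, 1425.0]

    a, b = fit_line(ri_db5_pred, ri_db35_obs)
    ri_db35_fit = [a * x + b for x in ri_db5_pred]
    rmse, mae, mdae = error_metrics(ri_db35_obs, ri_db35_fit)
    print(f"a = {a:.4f}, b = {b:.2f}")
    print(f"RMSE = {rmse:.2f}, MAE = {mae:.2f}, MdAE = {mdae:.2f}")
```

Reporting all three measures together is informative because the median absolute error is insensitive to a few poorly predicted compounds, whereas the root-mean-square error is dominated by them. This matches the ordering of the values quoted in the abstract (28.9 > 19.3 > 11.8).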