Gradient boosting for the prediction of gas chromatographic retention indices
Abstract
Estimating gas chromatographic retention indices from compound structures is an important problem: predicted retention indices can be used in a mass spectral library search to identify unknowns. Various machine learning methods are used for this task, but methods based on decision trees, in particular gradient boosting, are not widely used. The aim of this work is to examine the applicability of gradient boosting to retention index prediction. Each molecule is represented by 177 molecular descriptors computed with the Chemistry Development Kit. Random subsets of the whole NIST 17 database are used as training, test, and validation sets. The gradient boosting model uses 8000 trees with 6 leaves each. A neural network with one hidden layer (90 hidden nodes) is used for comparison; both models use the same data sets and the same set of descriptors. The gradient boosting model outperforms the neural network on subsets of NIST 17 and on the set of essential oils.
The performance of this model is comparable to or better than that of other modern retention prediction models. For subsets of NIST 17, the average relative deviation is ~3.0% and the median relative deviation is ~1.7%. The median absolute deviation is ~34 retention index units. Only non-polar liquid stationary phases (such as polydimethylsiloxane, 5% phenyl 95% polydimethylsiloxane, and squalane) are considered. Errors obtained with different machine learning algorithms using the same representation of the molecule strongly correlate with each other.
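The three accuracy measures quoted above can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes that the relative deviation is |predicted - observed| / observed expressed in percent, and that the "median absolute deviation" is the median of the absolute prediction errors. The function name `deviation_metrics` and the sample retention indices are hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def deviation_metrics(observed, predicted):
    """Summarize prediction errors for retention indices.

    Assumed definitions (not confirmed by the abstract):
      absolute deviation = |predicted - observed|
      relative deviation = 100 * |predicted - observed| / observed
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    abs_dev = [abs(p - o) for o, p in zip(observed, predicted)]
    rel_dev = [100.0 * d / o for d, o in zip(abs_dev, observed)]

    return {
        "average_relative_pct": mean(rel_dev),
        "median_relative_pct": median(rel_dev),
        "median_absolute": median(abs_dev),
    }


# Hypothetical retention indices, chosen only to make the arithmetic easy to follow.
observed = [1000.0, 1200.0, 1500.0, 2000.0]
predicted = [1010.0, 1176.0, 1545.0, 2000.0]

metrics = deviation_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The average is sensitive to a few badly predicted compounds, while the median is not, which is consistent with the average relative deviation reported above being noticeably larger than the median.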