Gradient boosting for the prediction of gas chromatographic retention indices

Authors

  • Dmitriy D. Matyushin, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow
  • Anastasia Yu. Sholokhova, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow
  • Aleksey K. Buryak, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow

DOI:

https://doi.org/10.17308/sorpchrom.2019.19/2223

Keywords:

gas chromatography, retention index, machine learning, gradient boosting.

Abstract

The estimation of gas chromatographic retention indices from compound structures is an important problem. Predicted retention indices can be used in a mass spectral library search to identify unknowns. Various machine learning methods are used for this task, but methods based on decision trees, in particular gradient boosting, are not widely used. The aim of this work is to examine the usability of gradient boosting for retention index prediction. A set of 177 molecular descriptors computed with the Chemistry Development Kit is used as the input representation of a molecule. Random subsets of the whole NIST 17 database are used as training, test and validation sets. The gradient boosting model uses 8000 trees with 6 leaves each. A neural network with one hidden layer (90 hidden nodes) is used for comparison. The same data sets and the same set of descriptors are used for the neural network and for gradient boosting. The model based on gradient boosting outperforms the neural network with one hidden layer for subsets of NIST 17 and for the set of essential oils. The performance of this model is comparable to or better than that of other modern retention prediction models. For subsets of NIST 17, the average relative deviation is ~3.0% and the median relative deviation is ~1.7%. The median absolute deviation is ~34 retention index units. Only non-polar liquid stationary phases (such as polydimethylsiloxane, 5% phenyl 95% polydimethylsiloxane, squalane) are considered. Errors obtained with different machine learning algorithms and the same representation of the molecule strongly correlate with each other.
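
The deviation measures quoted in the abstract can be illustrated with a short sketch. It is not the authors' code. It assumes that "relative deviation" means |predicted - observed| / observed expressed in percent, and that "median absolute deviation" means the median of |predicted - observed| in retention index units. The function name `deviation_metrics` and all numbers below are hypothetical.

```python
from statistics import mean, median


def deviation_metrics(observed, predicted):
    """Summarize prediction errors for retention indices.

    Assumed definitions (not confirmed by the source):
      relative deviation = 100 * |predicted - observed| / observed, in percent
      absolute deviation = |predicted - observed|, in retention index units
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    abs_dev = [abs(p - o) for o, p in zip(observed, predicted)]
    rel_dev = [100.0 * d / o for d, o in zip(abs_dev, observed)]

    return {
        "mean_relative_pct": mean(rel_dev),
        "median_relative_pct": median(rel_dev),
        "median_absolute": median(abs_dev),
    }


# Made-up retention indices, for illustration only (not data from the paper).
observed = [1000.0, 1200.0, 1500.0, 2000.0]
predicted = [1010.0, 1176.0, 1530.0, 2100.0]

metrics = deviation_metrics(observed, predicted)
print(metrics)
```

The mean is pulled upward by a few large errors while the median is not, so a mean relative deviation noticeably above the median, as in the figures reported above, is what a skewed error distribution would produce.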

Author Biographies

  • Dmitriy D. Matyushin, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow

    junior researcher, laboratory of physicochemical principles of chromatography and chromatography-mass spectrometry; Institute of Physical Chemistry and Electrochemistry, Moscow, e-mail: dm.matiushin@mail.ru

  • Anastasia Yu. Sholokhova, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow

    junior researcher, laboratory of physicochemical principles of chromatography and chromatography-mass spectrometry; Institute of Physical Chemistry and Electrochemistry, Moscow, e-mail: shonastya@yandex.ru

  • Aleksey K. Buryak, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow

    prof., grand PhD (chemistry), laboratory of physicochemical principles of chromatography and chromatography-mass spectrometry; Institute of Physical Chemistry and Electrochemistry, Moscow, e-mail: akburyak@mail.ru

Published

2019-12-05

How to Cite

Gradient boosting for the prediction of gas chromatographic retention indices. (2019). Sorbtsionnye I Khromatograficheskie Protsessy, 19(6), 630-635. https://doi.org/10.17308/sorpchrom.2019.19/2223
