An information criterion for gradient boosted trees


Gradient boosting has been highly successful in machine-learning competitions on structured (tabular) data since the introduction of XGBoost in 2014. It can be viewed as functional gradient descent applied to the supervised learning problem. As a consequence, in gradient tree boosting the functional form of the model ensemble changes continually during training. To let users choose the appropriate functional complexity, the leading implementations expose a large number of regularization hyperparameters for manual tuning. This tuning typically requires computationally costly cross-validation over a grid of hyperparameters, combined with some expert knowledge. To address this, we propose an information criterion for gradient boosted trees that is applicable both to learning the topology of individual trees and as a stopping criterion for the boosting algorithm. This makes the algorithm adaptive to the dataset at hand: it is fully automatic, with minimal risk of overfitting. Moreover, because the algorithm only has to run once, the computational cost is drastically reduced.
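The view of boosting as functional gradient descent can be illustrated with a minimal sketch, assuming squared-error loss, a single feature, and depth-1 trees (stumps). The function names `fit_stump`, `predict_stump` and `boost`, the learning rate, and the synthetic data are all hypothetical choices made for illustration. The sketch does not implement the proposed information criterion; it only shows that the training loss keeps decreasing with every boosting round, which is why the training loss alone cannot tell the algorithm when to stop.

```python
import random


def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = order[k - 1], order[k]
        if x[lo] == x[hi]:
            continue  # cannot split between identical feature values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((r - mean_left) ** 2 for r in left)
               + sum((r - mean_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (x[lo] + x[hi]) / 2, mean_left, mean_right)
    _, threshold, mean_left, mean_right = best
    return threshold, mean_left, mean_right


def predict_stump(stump, xi):
    threshold, mean_left, mean_right = stump
    return mean_left if xi <= threshold else mean_right


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss; returns training MSE per round."""
    pred = [sum(y) / len(y)] * len(y)  # start from the constant model
    losses = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - f)^2 with respect to f is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        # Take a small step in function space along the fitted stump.
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
        losses.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return losses


# Synthetic data (an assumption for illustration): noisy quadratic signal.
random.seed(0)
x = [random.uniform(0, 1) for _ in range(60)]
y = [xi ** 2 + random.gauss(0, 0.1) for xi in x]

losses = boost(x, y)
print(f"training MSE after round 1:  {losses[0]:.4f}")
print(f"training MSE after round 50: {losses[-1]:.4f}")
```

Because each stump is a least-squares fit to the current residuals, every round can only lower the training error, so the number of rounds (and the depth of each tree) must be chosen by some criterion other than the training loss.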

University of Stavanger, Norway
Berent Å. S. Lunde
Senior Consultant | Adjunct Associate Professor