ogrisel, (edited) to random

I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets that mix numerical and categorical features (e.g. the Adult Census dataset).

Let's start with the GBRT model. It's now possible to reproduce the SOTA number on this dataset in a few lines of code and about 2 s of compute (CV included) on my laptop.
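
A minimal sketch of what those few lines could look like (the post doesn't include the code itself, so the dataset fetch, the categorical_features="from_dtype" option, and the CV setup below are assumptions based on the dev branch at the time):

from sklearn.datasets import fetch_openml
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Adult Census as a pandas DataFrame with mixed numerical and
# categorical (pandas "category" dtype) columns.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)

# On the dev branch, categorical columns can be detected from their
# pandas dtype; the "from_dtype" setting is assumed here.
model = HistGradientBoostingClassifier(categorical_features="from_dtype")

scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")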

1/n

#sklearn #PyData #MachineLearning #TabularData #GradientBoosting #DeepLearning #Python

ogrisel, (edited)

For neural networks, feature preprocessing is a deal breaker.

I was pleasantly surprised to observe that by intuitively composing basic scikit-learn building blocks (OneHotEncoder, SplineTransformer, and MLPClassifier), it's possible to approach the predictive performance of the trees on this dataset.
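
A sketch of such a composition (the column selection by dtype and the MLP hyperparameters are my assumptions, not the exact code from the post):

from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, SplineTransformer

preprocessor = make_column_transformer(
    # One-hot encode the categorical columns.
    (OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_include="category")),
    # Expand the numerical columns with B-spline basis features.
    (SplineTransformer(),
     make_column_selector(dtype_include="number")),
)
model = make_pipeline(preprocessor, MLPClassifier(max_iter=300))

Note that both transformers output values in [0, 1], so the MLP gets reasonably scaled inputs without needing a separate scaler.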

2/n

ogrisel,

Note that the neural net runs ~10x slower than the tree-based model on my Apple M1 laptop.

I did not try to use an expensive GPU with PyTorch.

Note however that I did configure conda-forge's numpy to link against Apple Accelerate, which uses the M1 chip's built-in matrix coprocessor and is typically around 3x faster than OpenBLAS on the M1's CPU.
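
For reference, one way to check which BLAS a numpy build is linked against (the conda install command in the comment is an assumption about the setup, not quoted from the post):

import numpy as np

# With the conda-forge Accelerate build (installable via e.g.
# `conda install "libblas=*=*accelerate"`), the output mentions "accelerate".
np.show_config()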

It's possible that with float32 ops (instead of float64) the difference would be wider, though. Unfortunately, that's not yet easy to do with scikit-learn.

3/n
