ogrisel, (edited) to random

I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets that mix numerical and categorical features (e.g. the Adult Census dataset).

Let's start with the GBRT model. It's now possible to reproduce the SOTA number on this dataset in a few lines of code and about 2 s of compute (CV included) on my laptop.
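
A minimal sketch of what those few lines could look like (the post doesn't include the code itself, so the dataset fetch, the categorical_features="from_dtype" option, and the CV setup below are assumptions based on the dev branch at the time):

from sklearn.datasets import fetch_openml
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Adult Census as a pandas DataFrame with mixed numerical and
# categorical (pandas "category" dtype) columns.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)

# On the dev branch, categorical columns can be detected from their
# pandas dtype; the "from_dtype" setting is assumed here.
model = HistGradientBoostingClassifier(categorical_features="from_dtype")

scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")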

1/n

#sklearn #PyData #MachineLearning #TabularData #GradientBoosting #DeepLearning #Python

ogrisel, (edited)

For neural networks, feature preprocessing is a deal breaker.

I was pleasantly surprised to observe that by intuitively composing basic scikit-learn building blocks (OneHotEncoder, SplineTransformer, and MLPClassifier), it's possible to approach the predictive performance of the trees on this dataset.
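
A sketch of such a composition (the column selection by dtype and the MLP hyperparameters are my assumptions, not the exact code from the post):

from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, SplineTransformer

preprocessor = make_column_transformer(
    # One-hot encode the categorical columns.
    (OneHotEncoder(handle_unknown="ignore"),
     make_column_selector(dtype_include="category")),
    # Expand the numerical columns with B-spline basis features.
    (SplineTransformer(),
     make_column_selector(dtype_include="number")),
)
model = make_pipeline(preprocessor, MLPClassifier(max_iter=300))

Note that both transformers output values in [0, 1], so the MLP gets reasonably scaled inputs without needing a separate scaler.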

2/n

ogrisel,

Note that the neural net runs ~10x slower than the tree-based model on my Apple M1 laptop.

I did not try to use an expensive GPU with PyTorch.

Note however that I did configure conda-forge's numpy to link against Apple Accelerate, which uses the M1 chip's built-in matrix coprocessor and is typically around 3x faster than OpenBLAS on the M1's CPU.
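
For reference, one way to check which BLAS a numpy build is linked against (the conda install command in the comment is an assumption about the setup, not quoted from the post):

import numpy as np

# With the conda-forge Accelerate build (installable via e.g.
# `conda install "libblas=*=*accelerate"`), the output mentions "accelerate".
np.show_config()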

It's possible that with float32 ops (instead of float64) the difference would be wider, though. Unfortunately, that's not yet easy to do with scikit-learn.

3/n
