ogrisel

@ogrisel@sigmoid.social

Machine Learning Engineer at :probabl., scikit-learn core contributor. #Python, #Pydata, #MachineLearning & #DeepLearning.

ogrisel, 9 months ago to random French

Intriguing paper: Provably Faster Gradient Descent via Long Steps by Benjamin Grimmer

The convergence rate of gradient descent on smooth convex objective functions can be improved by using a periodic learning rate pattern with some very large values:

https://arxiv.org/abs/2307.06324

Figure 1: Least squares problems minimizing $\|Ax - b\|_2^2$ (left) and $\|Ax - b\|_2^2 + \|x\|_2^2$ (right) with i.i.d. normal entries in $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$ for $n = 4000$. Gradient Descent (1.3)’s objective gap is plotted over T = 2000 iterations with h = (1) and with each pattern from Table 1. Note this second objective is substantially more strongly convex, so its faster linear convergence is expected. Longer pattern periods with larger average step sizes lead to improved convergence for both problems.
Screenshot of table 1: Optimal step size patterns, for period t of 127, the largest step size is 370.0 and the constant in the convergence rate denominator is close to 5.83.
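
To make the idea of a periodic step size pattern concrete, here is a minimal pure-Python sketch on a toy diagonal quadratic. The pattern (2.9, 1.5) is an illustrative assumption and is not claimed to be one of the patterns from Table 1 of the paper; the problem size, spectrum and iteration count are likewise arbitrary choices. Step sizes are expressed in units of 1/L, so a constant step of 2.9 on its own would diverge.

```python
def objective(x, eigenvalues):
    # f(x) = 0.5 * sum_i lambda_i * x_i^2, whose minimum value is 0.
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(eigenvalues, x))


def gradient_descent(eigenvalues, pattern, n_iter):
    # Gradient descent where the normalized step size cycles through `pattern`.
    smoothness = max(eigenvalues)  # L, the largest eigenvalue of the Hessian
    x = [1.0] * len(eigenvalues)
    for k in range(n_iter):
        h = pattern[k % len(pattern)]
        # The gradient of f is lambda_i * x_i in each coordinate.
        x = [xi - (h / smoothness) * lam * xi for lam, xi in zip(eigenvalues, x)]
    return objective(x, eigenvalues)


n = 200
eigenvalues = [(i + 1) / n for i in range(n)]  # spectrum spread over (0, 1]

gap_constant = gradient_descent(eigenvalues, pattern=(1.0,), n_iter=200)
gap_long = gradient_descent(eigenvalues, pattern=(2.9, 1.5), n_iter=200)
print(f"constant step h = (1):        {gap_constant:.3e}")
print(f"periodic steps h = (2.9, 1.5): {gap_long:.3e}")
```

On this particular toy problem the periodic pattern ends with a smaller objective gap, because the long step contracts the directions with small eigenvalues faster. A single quadratic says nothing about the general smooth convex case that the paper analyzes.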

ogrisel, 9 months ago (edited 9 months ago)

@magsol Not necessarily directly useful from a practical point of view. For logistic regression, for example, particularly with correlated features and rare categorical values, second-order methods are probably better, while for deep transformers trained on datasets with millions of data points, stochastic solvers with momentum such as Adam are probably more efficient. Still, it's interesting to see that cyclic learning rate schedules with very large learning rates can be theoretically justified.

arthurzenika, 10 months ago to security French

Yesterday, during a "tech break" at my client's, I presented a few hardware solutions for multi-factor authentication (2FA/MFA/TOTP). I talked about YubiKey, SoloKeys and Titan keys, and also about software solutions: Authenticator, FreeOTP, LastPass, etc.

As for applications that support this good security practice, I discovered https://www.dongleauth.com/

What do you use?

#security #2fa #mfa

ogrisel, 10 months ago

@arthurzenika A FreeOTP + YubiKeys combo. I haven't tried the others.

ogrisel, 10 months ago (edited 10 months ago) to random

Jérémie has just released threadpoolctl 3.2.0:

https://pypi.org/project/threadpoolctl/

This is a small Python library to inspect and change the size of the threadpools used by libraries dynamically linked to a Python program (e.g. OpenBLAS, MKL, OpenMP runtimes...).

It is quite useful to debug oversubscription problems in the #SciPy / #PyData ecosystem.
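
As a rough illustration of the inspection and limiting described above, the sketch below uses `threadpool_info` and `threadpool_limits` from threadpoolctl. NumPy is imported only so that a BLAS library is loaded and there is something to inspect; the matrix size and the limit of one thread are arbitrary assumptions.

```python
import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Importing NumPy loads its BLAS library, so there is something to inspect.
info = threadpool_info()
for lib in info:
    print(lib["user_api"], lib["internal_api"], lib["num_threads"])

# Temporarily cap BLAS threadpools at one thread, for instance to avoid
# oversubscription when the calling code already runs in parallel.
rng = np.random.default_rng(0)
a = rng.standard_normal((200, 200))
with threadpool_limits(limits=1, user_api="blas"):
    limited = threadpool_info()
    product = a @ a
```

Which libraries show up in the output depends on how NumPy was built and installed, so the printed list may differ between machines or be empty.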

This new version makes it possible to register a custom controller for your own native library. See the changelog for details:

https://github.com/joblib/threadpoolctl/blob/master/CHANGES.md

ogrisel, 11 months ago (edited 11 months ago) to ArtificialIntelligence French

LongNet: Scaling Transformers to 1,000,000,000 Tokens

https://arxiv.org/abs/2307.02486

#deeplearning #transformers

ogrisel, 11 months ago

@val It's not clear which model size was used to produce the right-hand figure, nor whether they kept a fixed context size to produce the left-hand figure.

ogrisel, 11 months ago

I tried to edit the above post to add the missing ALT text to the second screenshot but sigmoid.social & elk do not take the change into account, so here it is:

Figure 7: Left: Test loss of LONGNET with an increasing model size. The scaling curve follows a similar law to the vanilla Transformers. Right: Test loss of LONGNET using different context windows. A longer context window yields better language modeling.

ogrisel, 11 months ago (edited 11 months ago) to random

joblib 1.3.0 is out in the wild!

joblib is a library that provides a generic way to call into thread-based, process-based and distributed parallelism (via external backends), plus a way to cache expensive computations on disk across repeated function calls.

https://joblib.readthedocs.io

This new release provides several major new features, including a return_as="generator" argument to the Parallel class that makes it possible to aggregate parallel results as they become ready (preserving the submission order).

1/4

ogrisel, 11 months ago (edited 11 months ago)

In the future this will also be extended to return_as="unordered_generator" to optionally make it possible to aggregate results as soon as ready.

This release also includes a new parallel_config context manager as an extension to parallel_backend to make it possible to configure all the arguments of the Parallel class and not just the backend using a context manager idiom.

Detailed changelog:
https://github.com/joblib/joblib/blob/master/CHANGES.rst#release-130----20230628

2/4

ogrisel, 11 months ago (edited 11 months ago)

As a side benefit of this refactoring, the traceback of an exception raised in sequential mode (n_jobs=1) is now flatter.

3/4

ogrisel, 11 months ago (edited 11 months ago)

And thanks to everybody involved in making this happen, and Thomas as the release manager in particular.

4/4

ogrisel, 11 months ago (edited 11 months ago)

@rupdecat I think that at the time I preferred the side-effect free design of cloudpickle if I remember correctly.
