GaelVaroquaux

@GaelVaroquaux@mastodon.social

Research & code: Research director @inria
► Machine Learning, Data, Health, & Computer science
► Python coder, (co)founder of scikit_learn & joblib
► Physics PhD

#MachineLearning #Python #OpenSource


GaelVaroquaux, to random

Software systems, more than any other engineering activity, create a technological world that results from social dynamics and constructs.
This is because the space of possibilities is much wider, and there are many more objects interacting than in other industrial endeavors.

Big thinkers in urban planning, who design spaces and cities with interactions in mind, connected their thinking with sociology and related fields.

People thinking about software at the ecosystem level should probably do the same.


straphanger, to Finland
@straphanger@urbanists.social

In Oulu, temperatures will be down to -20 °C this week.

Yet 12% of all journeys in the city will be by bicycle.

How do they do it? Good wayfinding for starters, with symbols projected onto the snow...❄️ 🚲 🧵


GaelVaroquaux,

@josephsalmon @straphanger that's nice indeed

GaelVaroquaux, to random

With the #LoiImmigration, the government is wielding xenophobia and wants to write discrimination into law.

This is the far right's program: a program of division rather than construction, a program that puts our democracy on a dangerous slope.

GaelVaroquaux, to random

An interview about scikit-learn: the project's vision, how to think about impact, the link with society, open-source dynamics... 45 minutes in which I talk about what motivates us and what we have learned about data and people...

https://www.youtube.com/watch?v=I5RoWUyJgT8

It was a great pleasure; many thanks to the hymaïa team, including Yoann Benoit.
I realize that my enunciation is better in French 🙂

GaelVaroquaux,

But tonight, if you are a French speaker, there is something more important: the worrying shipwreck of our democracy, which ought to unite and build:
https://mastodon.social/@GaelVaroquaux/111609337085524577

catalystcoop, to random
@catalystcoop@mastodon.energy

A thread from @GaelVaroquaux looking at the impact of the community-driven sklearn compared to centralized corporate ML packages. Community isn't always fast or easy, but it can be very robust over the long term once it's established.

"People underestimate how impactful @sklearn continues to be" — @fchollet

https://twitter.com/GaelVaroquaux/status/1734629067322753239

GaelVaroquaux,

@davidbrochart @catalystcoop @sklearn @fchollet yes, I agree 100%.

This is an argument that I think is becoming increasingly important. Sasha Luccioni has very convincing numbers that the footprint of AI is becoming a problem.

GaelVaroquaux,

@pybonacci @davidbrochart @catalystcoop @sklearn @fchollet she has done several excellent papers, but her last one has some very interesting numbers: https://arxiv.org/abs/2311.16863

GaelVaroquaux, to random

Join us: this is open source, and the power of such a project is the ability to build in common.

Let's create together a much-needed tool for data science
https://github.com/skrub-data/skrub/

GaelVaroquaux, to random

Skrub is very young, and a lot more needs to be done.

For instance, we want to support multiple dataframe backends and lazy modes.

Our dream is to streamline developing machine-learning models and putting them in production by coupling the scikit_learn API to database operations.
7/8

GaelVaroquaux, to random

🎉First release of skrub 0.1.0 http://skrub-data.org

Couple dataframes and databases to machine learning to facilitate data prep

✨Less data wrangling, more machine learning✨

This is a young project that I am very excited about:
🧵👇
1/8

GaelVaroquaux,

Highlight: TableVectorizer, a sophisticated encoder for dataframes (strings, dates...)

It gives a very strong baseline for learning, in particular when coupled with gradient boosting
https://skrub-data.org/stable/generated/skrub.TableVectorizer.html

I cannot work without it these days
2/8

GaelVaroquaux,

Bigger picture: skrub will enable assembling full data processing pipelines across multiple tables that can be cross-validated with scikit_learn and one day put in production:

Joining, Aggregation, transformation to build models directly from the original tables and database
3/8

GaelVaroquaux,

Lower level: skrub provides pandas-like functions to facilitate assembly of "dirty data"

For instance, the fuzzy_join function (https://skrub-data.org/stable/generated/skrub.fuzzy_join.html) can perform merges across tables with imperfect correspondence - below "Laos" is matched to "Lao PDR"
4/8
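
A minimal sketch of the idea behind a fuzzy join, using only Python's standard library. It is not skrub's implementation: the helper names (`similarity`, `fuzzy_left_join`), the `min_score` threshold and the use of `difflib` as the string-similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_left_join(left, right, on, min_score=0.5):
    """Attach to each left row the right row whose `on` value is most similar.

    `left` and `right` are lists of dicts. Left rows whose best match scores
    below `min_score` are kept without any right-hand columns.
    """
    joined = []
    for row in left:
        # Pick the right-hand row with the closest key
        best = max(right, key=lambda cand: similarity(row[on], cand[on]))
        merged = dict(row)
        if similarity(row[on], best[on]) >= min_score:
            merged.update({f"{name}_right": value for name, value in best.items()})
        joined.append(merged)
    return joined


left = [{"country": "Laos"}, {"country": "France"}]
right = [
    {"country": "Lao PDR", "capital": "Vientiane"},
    {"country": "France", "capital": "Paris"},
    {"country": "Germany", "capital": "Berlin"},
]
joined = fuzzy_left_join(left, right, on="country")
```

Here "Laos" is paired with "Lao PDR" even though the two spellings differ, which an exact merge would miss.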

GaelVaroquaux,

Likewise, skrub.to_datetime (https://skrub-data.org/stable/generated/skrub.to_datetime.html) takes a complete dataframe, tries to detect which columns are dates or time and converts them to the Datetime type:
5/8
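
The detection step described above can be sketched roughly as follows, with the standard library only. This is an illustration of the principle, not skrub's code: the function name `detect_and_convert`, the dict-of-lists table and the restriction to ISO-formatted strings are assumptions.

```python
from datetime import datetime


def detect_and_convert(table):
    """Convert every column whose values all parse as ISO dates.

    `table` maps column names to lists of values. Columns that do not
    parse entirely as dates are returned unchanged.
    """
    converted = {}
    for name, values in table.items():
        try:
            converted[name] = [datetime.fromisoformat(v) for v in values]
        except (TypeError, ValueError):
            # Not a date column: keep the original values
            converted[name] = list(values)
    return converted


table = {
    "visit": ["2023-11-15", "2023-12-01"],
    "city": ["Paris", "Oulu"],
    "count": [1, 2],
}
out = detect_and_convert(table)
```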

GaelVaroquaux,

Each functionality comes as a scikit_learn transformer:
Joiner (https://skrub-data.org/stable/generated/skrub.Joiner.html), DatetimeEncoder (https://skrub-data.org/stable/generated/skrub.DatetimeEncoder.html)

Separate "fit" and "transform" steps avoid prediction-time problems.
They also enable hyper-parameter tuning (e.g. whether to add a "day of the week" column)
6/8
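
To make the fit/transform point concrete, here is a small sketch in plain Python. `TinyDatetimeEncoder` is a hypothetical stand-in, not skrub's DatetimeEncoder: the choice of which fields to extract is an assumption, made once in `fit` and then reused in `transform`, so that prediction-time data always yields the same columns as the training data.

```python
from datetime import datetime


class TinyDatetimeEncoder:
    """Hypothetical, simplified datetime encoder with separate fit/transform."""

    def __init__(self, add_day_of_week=False):
        # Tunable option, in the spirit of a hyper-parameter
        self.add_day_of_week = add_day_of_week

    def fit(self, dates):
        # Decide on the training data which fields carry information
        candidates = ["year", "month", "day", "hour"]
        self.fields_ = [
            f for f in candidates if len({getattr(d, f) for d in dates}) > 1
        ]
        return self

    def transform(self, dates):
        rows = []
        for d in dates:
            row = [getattr(d, f) for f in self.fields_]
            if self.add_day_of_week:
                row.append(d.weekday())  # Monday is 0
            rows.append(row)
        return rows


train = [datetime(2023, 12, 1), datetime(2023, 12, 15)]
encoder = TinyDatetimeEncoder(add_day_of_week=True).fit(train)
# A single row at prediction time still gets the columns chosen during fit
encoded = encoder.transform([datetime(2024, 1, 3)])
```

If the fields were chosen again on the single prediction-time row, nothing would vary and the output would have a different shape than during training; fixing the choice in `fit` avoids that.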


GaelVaroquaux, to random

🎉 Tool for better documentation!! Release of sphinx-gallery, to automatically integrate narrative 🐍 examples in documentation
https://sphinx-gallery.github.io/stable/index.html

Highlight: a light recommender system to show related examples

An illustration of sphinx-gallery:
https://scikit-learn.org/dev/auto_examples/inspection/plot_linear_model_coefficient_interpretation.html
(from @sklearn 's gallery). Note the links to function docs.

Sphinx-gallery comes with awesome features such as
◼online execution with binder or jupyterlite
◼mini-galleries eg to link an object's docstring to its examples

A screenshot of a long example in scikit-learn's documentation, discussing the interpretation of features in linear models
Part of scikit-learn's gallery of examples
Examples on a given object (here scikit-learn's TransformedTargetRegressor) linked at the end of the object's docstring

GaelVaroquaux, to statistics

Sampling bias in practice: conducting a survey on the Paris metro platform...

if you ask people where they get off, you'll get a different distribution depending on where on the platform you stand: people choose their position close to the exits at arrival.

#statistics #epidemiology #socialscience
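
A toy simulation of this effect, as a sketch under made-up assumptions: two destinations "A" and "B" chosen equally often, and 80% of passengers standing at the end of the platform closest to their exit. None of these numbers come from real data.

```python
import random


def survey(position, n_passengers=10_000, seed=0):
    """Share of passengers surveyed at `position` who get off at station A."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_passengers):
        destination = rng.choice(["A", "B"])  # true share of A is 50%
        # Assumption: the exit at A is near the front, the exit at B near the back
        near_exit = "front" if destination == "A" else "back"
        far_end = "back" if destination == "A" else "front"
        stands_at = near_exit if rng.random() < 0.8 else far_end
        if stands_at == position:
            answers.append(destination)
    return answers.count("A") / len(answers)


share_front = survey("front")
share_back = survey("back")
```

Under these assumptions, the surveyor at the front finds that roughly 80% of riders get off at A and the one at the back roughly 20%, while the true share is 50%.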


GaelVaroquaux, to random

I’ll be giving the online lecture on "Representation learning on relational data to automate data preparation" on November 15th, 7pm EEST at AIHouse Ukraine.

Join the lecture, learn and support Ukraine
https://aihouse.org.ua/en/ai-for-ukraine/

GaelVaroquaux, to random

📑 "healthwashing": verb [ I or T ]
to make people believe that your computer-science grant or paper is about trying to improve health, while it really is an excuse to do maths and maybe you have a few biomedical signals on a thumb drive
⚕️💻

GaelVaroquaux,

@cazencott it's a step in the right direction.

Even better if you have an idea of the public-health stakes: what can be done to actually make a difference to patients' lives.

I find research that just applies canonical CS questions to biomedical data without health thinking a bit boring, I must say.

MathieuP, to random French
@MathieuP@mastodon.gougere.fr

Question #velo
This is the second time in six years that I have had to replace my rear wheel because of a broken axle. I ride every day, but not far (7 km in an urban environment, so with some rather abrupt kerb crossings). I am not very heavy (around 67 kg), but I carry my bag (under 10 kg) in a pannier at the back.
Does this seem like normal breakage to you, or is there a problem?

GaelVaroquaux,

@MathieuP in my opinion, your kerb crossings are a bit rough (pavements without dropped kerbs, for example). Since I started avoiding that, I no longer break hubs.

GaelVaroquaux, to random

🤖 I am honored to have been appointed to the government-level panel of experts on AI 🇫🇷.

We are tasked with suggesting a national vision and strategy in France.
The panel is made up of experts on different topics (economics, law, computer science) from academia, industry, and non-profits
https://gael-varoquaux.info/science/comite-de-lintelligence-artificielle-vision-et-strategie-nationale.html

GaelVaroquaux,

@morenonatural that's not the point. Nobody (me in particular) should see this as a reward, but as a duty 😀

GaelVaroquaux,

@morenonatural and thanks, by the way 😊

GaelVaroquaux, to random

✨Slides on causal inference: Individualizing treatment effects — transportability and model selection

https://speakerdeck.com/gaelvaroquaux/individualizing-treatment-effects-transportability-and-model-selection

Selecting models for causal inference, choosing variables for the best bias-variance tradeoff, and choosing a relevant summary statistic (not the risk difference for binary outcomes)

Presented at #ECMLPKDD2023

