ogrisel, to python
@ogrisel@sigmoid.social

I have been thinking a bit about how to detect supply chain attacks against popular open source projects such as scikit-learn.

If you have practical experience with https://reproducible-builds.org/, in particular in the #Python / #PyData ecosystem, I would be curious to hear any feedback on the plan I suggest for scikit-learn in the following issue.

Feel free to reply on Mastodon first if you have questions.

https://github.com/scikit-learn/scikit-learn/issues/28151
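
As a rough illustration of the general idea behind reproducible builds (this is not the plan from the linked issue, which is not reproduced here): if a release artifact can be rebuilt bit for bit from the tagged source, anyone can compare the digest of an independent rebuild with the published file. A minimal sketch using only the standard library; the helper names `sha256_of` and `same_artifact` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(published: Path, rebuilt: Path) -> bool:
    """True when both files are bit-for-bit identical (same SHA-256)."""
    return sha256_of(published) == sha256_of(rebuilt)
```

A mismatch does not prove tampering by itself; it may only mean the build is not yet reproducible (timestamps, compiler versions and so on).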

R4DSCommunity, to random
@R4DSCommunity@fosstodon.org

Upcoming book clubs:

Today:

Tomorrow:

Sunday:

Monday:

Join our Slack at https://r4ds.io/join to learn more!

kedro, to python

🎉 We are thrilled to announce that the long-awaited Kedro 0.19 is available! 🔶

Install it now with pip or conda:

pip install kedro==0.19.1  
conda/mamba/micromamba install -c conda-forge kedro=0.19.1  

This is a summary of what you'll find: 🧵

ogrisel, (edited) to random
@ogrisel@sigmoid.social

I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets that mix numerical and categorical features (e.g. the Adult Census dataset).

Let's start with the GBRT model. It's now possible to reproduce the SOTA number for this dataset in a few lines of code and in 2 s (CV included) on my laptop.

1/n

#sklearn #PyData #MachineLearning #TabularData #GradientBoosting #DeepLearning #Python

PyDataPGH, to python
@PyDataPGH@fosstodon.org

Did you know the biggest #Python conference in the world is coming to #Pittsburgh?

Join #PyData Pittsburgh for an online event with @pycon and @ThePSF about how folks in the local community can get involved!

https://buff.ly/3Rookg6

ogrisel, (edited) to machinelearning
@ogrisel@sigmoid.social

Today with @dholzmueller we explored the possibility of reducing a probabilistic regression problem to a classification problem by binning the target variable and interpolating the conditional CDF estimated by classifier.predict_proba(X_test).cumsum(axis=1) to the original continuous range.

Here is a notebook with the results of my experiments:

https://nbviewer.org/github/ogrisel/notebooks/blob/master/quantile_regression_as_classification.ipynb
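
The reduction described above can be sketched as follows. This is not the notebook's code: it assumes NumPy, that the columns of `predict_proba` follow the order of the bins, and that the CDF is interpolated linearly inside each bin. The function name `quantiles_from_bin_proba` is hypothetical.

```python
import numpy as np


def quantiles_from_bin_proba(proba, bin_edges, quantiles):
    """Turn per-bin class probabilities into quantile predictions.

    proba: (n_samples, n_bins) array whose rows sum to 1, e.g. the output
        of classifier.predict_proba(X_test) for a target binned by bin_edges.
    bin_edges: (n_bins + 1,) increasing array of bin boundaries.
    quantiles: levels in [0, 1].
    """
    proba = np.asarray(proba, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    # CDF at the right edge of each bin, with 0 at the leftmost edge.
    cdf = np.hstack([np.zeros((proba.shape[0], 1)), proba.cumsum(axis=1)])
    # Invert the piecewise-linear CDF: for each sample, find the target
    # value at which the interpolated CDF reaches each requested level.
    return np.array([np.interp(quantiles, row, bin_edges) for row in cdf])


# One sample, three bins covering [0, 10), [10, 20), [20, 30].
proba = np.array([[0.25, 0.5, 0.25]])
edges = np.array([0.0, 10.0, 20.0, 30.0])
median = quantiles_from_bin_proba(proba, edges, [0.5])
```

Bins with zero predicted probability create flat CDF segments, where the inverse is not unique; a real implementation would need to decide how to handle them.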

#PyData #MachineLearning

R4DSCommunity, to random
@R4DSCommunity@fosstodon.org

On this , we're asking for your help prioritizing our backlog! Visit https://r4ds.io/donate.html to read about our upcoming projects, and vote with your tax-deductible donation!

joranelias, to random
@joranelias@mastodon.social

I don’t know why the general workflow of

  1. Do 90% of the processing & visualization in R
  2. Do the one thing I really need Python for via {reticulate} by just sending it the exact dataframe it needs and sending the results back to R for post-processing

Hadn’t occurred to me until recently, but I am really, REALLY liking it.

ogrisel, (edited) to python
@ogrisel@sigmoid.social

cloudpickle 3.0.0 is out!

https://github.com/cloudpipe/cloudpickle

cloudpickle is a library used by PySpark, Dask, Ray and joblib / loky (among others) to make it possible to call dynamically or locally defined functions, closures and lambdas on remote Python worker processes.

This is typically necessary for running code in parallel on a distributed computing cluster from an interactive developer environment such as a Jupyter or Databricks notebook.
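
To make the difference with the standard library concrete, here is a small sketch. It assumes cloudpickle is installed (the import is guarded so the stdlib part still runs without it); `make_scaler` and `triple` are made-up names for illustration.

```python
import pickle

try:
    import cloudpickle  # third-party: pip install cloudpickle
except ImportError:
    cloudpickle = None


def make_scaler(factor):
    # Locally defined closure: it captures `factor` from the enclosing scope.
    def scale(x):
        return factor * x
    return scale


triple = make_scaler(3)

# The standard library pickles functions by reference (module + name), so a
# closure that cannot be looked up by name cannot be serialized.
try:
    pickle.dumps(triple)
    stdlib_ok = True
except (pickle.PicklingError, AttributeError):
    stdlib_ok = False

# cloudpickle serializes the function by value (code and captured variables).
# The resulting bytes can be loaded with plain pickle.loads, which is what a
# worker process would do (the worker still needs cloudpickle installed).
if cloudpickle is not None:
    payload = cloudpickle.dumps(triple)
    restored = pickle.loads(payload)
    print(restored(14))
```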

#Python #PyData #HPC #DistributedComputing

R4DSCommunity, to chat
@R4DSCommunity@fosstodon.org

It's y'all! Show us what you made on our Slack at https://r4ds.io/join (find the -tidytuesday channel)!

RT @jonthegeek https://fosstodon.org/@jonthegeek/111165673314464975

Please consider a tax-deductible donation at https://r4ds.io/donate to support our work!

CD_Newton, to python
@CD_Newton@fosstodon.org

Anyone in the UK enthusiastic about dogs, #Rstats, #Python, #Pydata, and looking for a new job? There’s a Data Officer role going on my team if so! Interesting work and a nice bunch of people.

https://careers.dogstrust.org.uk/vacancies/4048/data_officer/

sklearn, to python
@sklearn@fosstodon.org

🔴 19 fixes
😃 74 contributors
📢 Bugfix release - scikit-learn 1.3.1 is out!
More details in the changelog: https://bit.ly/3rpA33J

You can upgrade with pip as usual:
pip install -U scikit-learn

or using the conda-forge builds:
conda install -c conda-forge scikit-learn

Thanks to all the contributors!

R4DSCommunity, to chat
@R4DSCommunity@fosstodon.org

It's y'all! Show us what you made on our Slack at https://r4ds.io/join (find the -tidytuesday channel)!

RT @jonthegeek https://fosstodon.org/@jonthegeek/111126027712361401

Please consider a tax-deductible donation at https://r4ds.io/donate to support our work!

astrojuanlu, to python
@astrojuanlu@social.juanlu.space
ogrisel, to python
@ogrisel@sigmoid.social

scikit-learn 1.3.1 is out!

This release fixes a bunch of annoying bugs. Here is the changelog:

https://scikit-learn.org/stable/whats_new/v1.3.html#version-1-3-1

Thanks very much to all bug reporters, PR authors and reviewers and thanks in particular to @glemaitre, the release manager of 1.3.1.

jonthegeek, to random
@jonthegeek@fosstodon.org

#PositConf2023 is over 😭

Soon I'll buy my Super-Fan tickets for #PositConf2024 in Seattle (not available quite yet as far as I can find), but first it's time for one more thread to summarize my threads! Each post in this thread will be flagged with a titled "content warning" to make it easier to find your way back to the top. I hope that works out!

If you appreciate these threads, please consider a tax-deductible donation to @R4DSCommunity at https://r4ds.io/donate!

#RStats #PyData

🧵1 of x

kellybodwin, to datascience

An since I just server hopped!

I'm an Associate Professor of Statistics and Data Science at Cal Poly. I teach with and and

I am the proud owner of over 300 board games, way too many Osprey hiking/backpacking bags, and at least one bruise from Roller Derby.

hylk3, to random Dutch
@hylk3@mastodon.nl

@cheukting_ho thanks for your great hands-on session on building strategies for tests of dataframes. i loved it 🥰! #pydata #pydataamsterdam

astrojuanlu, to python
@astrojuanlu@social.juanlu.space
astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam

The one and only James Powell @dontusethsicode with “the most boring talk title in only three words”

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam

Spotting what I call the "Jim Dowling classification of data pipelines" in this promising talk by Hopsworks

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam So excited to see Thomas Wolf and others from Hugging Face 🤗 giving a promising closing keynote! Just this Monday I was working with some colleagues on a HF + @kedro integration that hopefully will go open source soon.

"2000+ models in the Hub are private"

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

Speaker under a screen with a slide titled "The case of 'Open-Source' AI?"

astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam "Choice of training data is the most important [part] of an LLM!"

Data quality improvements "can be equivalent to a 2x-3x increase in size"

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam

Impressive results from Hugging Face: proper filtering of web data can match or exceed performance of commercial models trained on highly curated datasets.

Dataset: https://huggingface.co/datasets/tiiuae/falcon-refinedweb
Paper: https://doi.org/10.48550/arXiv.2306.01116

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam Thomas kind of dodged my question on the enforceability of OpenRAIL 😇 so happy that they exist anyway, it's a conversation we need to have.

(RAIL = Responsible AI Licenses)

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu,
@astrojuanlu@social.juanlu.space

@pydataamsterdam

Last day of #PyDataAmsterdam2023: the mighty @koaning giving the final keynote: Natural Intelligence is All You Need

#PyDataAmsterdam #PyData #python
