#PyData - kbin.social

ogrisel, 4 months ago to python

I have been thinking a bit about how to detect supply chain attacks against popular open source projects such as scikit-learn.

If you have practical experience with https://reproducible-builds.org/, in particular in the #Python / #PyData ecosystem, I would be curious to hear any feedback on the plan I suggest for scikit-learn in the following issue.

Feel free to reply on mastodon first, if you have questions.

https://github.com/scikit-learn/scikit-learn/issues/28151
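
For readers new to the idea, here is a minimal sketch of the basic check that reproducible builds enable: rebuild an artifact independently and compare it byte for byte with the published one. This is an illustration only and is not the plan from the linked issue; the helper names (`sha256_of`, `same_artifact`) and the file names are hypothetical, and the "wheels" are stand-in files.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(published, rebuilt):
    """True when the two files have identical SHA-256 digests."""
    return sha256_of(published) == sha256_of(rebuilt)


# Demo with stand-in files (hypothetical names, not real wheels).
with tempfile.TemporaryDirectory() as tmp:
    published = Path(tmp) / "published.whl"
    rebuilt = Path(tmp) / "rebuilt.whl"
    tampered = Path(tmp) / "tampered.whl"
    published.write_bytes(b"wheel contents")
    rebuilt.write_bytes(b"wheel contents")
    tampered.write_bytes(b"wheel contents + payload")

    matches = same_artifact(published, rebuilt)
    mismatch_detected = not same_artifact(published, tampered)
```

A matching digest only helps if the build itself is deterministic, which is the hard part the reproducible-builds project addresses.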

+ sethmlarson, leahawasser, underdarkGIS

R4DSCommunity, 4 months ago to random

Upcoming #NotJustR4DS book clubs:

Today:

:rstats: https://r4ds.io/ggplot2

:rstats: https://r4ds.io/r4ds

Tomorrow:

:python: https://r4ds.io/py4da

Sunday:

:python: https://r4ds.io/islp

🆕 :rstats: :python: https://r4ds.io/ema

:rstats: https://r4ds.io/mshiny

:rstats: https://r4ds.io/r4ds

Monday:

:rstats: https://r4ds.io/islr

:rstats: https://r4ds.io/rpkgs

Join our Slack at https://r4ds.io/join to learn more!

#RStats #PyData #RShiny

+ mwfc, hfrick

kedro, 5 months ago to python
🎉 We are thrilled to announce that the long-awaited Kedro 0.19 is available! 🔶

Install it now with pip or conda:
pip install kedro==0.19.1  
conda/mamba/micromamba install -c conda-forge kedro=0.19.1  
This is a summary of what you'll find: 🧵

#python #kedro #pydata #datascience

ogrisel, 5 months ago (edited 5 months ago) to random

I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets with mixed numerical and categorical features (e.g. the Adult Census dataset).

Let's start with the GBRT model. It's now possible to reproduce the SOTA number on this dataset in a few lines of code and 2 s (CV included) on my laptop.

1/n

#sklearn #PyData #MachineLearning #TabularData #GradientBoosting #DeepLearning #Python

+ GaelVaroquaux

PyDataPGH, 5 months ago to python

Did you know the biggest #Python conference in the world is coming to #Pittsburgh?

Join #PyData Pittsburgh for an online event with @pycon and @ThePSF about how folks in the local community can get involved!

https://buff.ly/3Rookg6

+ mariatta, ThePSF

ogrisel, 5 months ago (edited 5 months ago) to machinelearning

Today with @dholzmueller we explored the possibility of reducing a probabilistic regression problem to a classification problem by binning the target variable and interpolating the conditional CDF estimated by classifier.predict_proba(X_test).cumsum(axis=1) back to the original continuous range.

Here is a notebook with the results of my experiments:

https://nbviewer.org/github/ogrisel/notebooks/blob/master/quantile_regression_as_classification.ipynb

#PyData #MachineLearning
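
A minimal sketch of the interpolation step described in this post, assuming NumPy only. The array `proba` stands in for the output of `classifier.predict_proba(X_test)`; the function name `quantiles_from_binned_proba`, the `bin_edges` argument, and the toy numbers are hypothetical and are not taken from the linked notebook.

```python
import numpy as np


def quantiles_from_binned_proba(proba, bin_edges, quantiles):
    """Turn per-bin class probabilities into continuous quantile predictions.

    proba: (n_samples, n_bins) array, rows sum to 1 (stand-in for predict_proba).
    bin_edges: (n_bins + 1,) increasing edges of the bins used on the target.
    quantiles: quantile levels in [0, 1].
    """
    proba = np.asarray(proba, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    # CDF at each bin's right edge, with 0 prepended for the left-most edge.
    cdf = np.hstack([np.zeros((proba.shape[0], 1)), proba.cumsum(axis=1)])
    # Invert the piecewise-linear CDF for each sample separately.
    # Bins with zero probability create flat CDF segments, where the
    # inverse is ambiguous; this sketch does not treat that case specially.
    return np.array([np.interp(quantiles, row, bin_edges) for row in cdf])


# Toy input: one sample, three bins covering the target range [0, 3].
proba = np.array([[0.25, 0.5, 0.25]])
bin_edges = np.array([0.0, 1.0, 2.0, 3.0])
predicted = quantiles_from_binned_proba(proba, bin_edges, [0.25, 0.5, 0.875])
```

With these toy numbers the 0.25, 0.5 and 0.875 quantiles come out as 1.0, 1.5 and 2.5. Linear interpolation within a bin amounts to assuming the target is uniformly distributed inside each bin, which is a modelling choice of this sketch.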

R4DSCommunity, 5 months ago to random

On this #GivingTuesday, we're asking for your help prioritizing our backlog! Visit https://r4ds.io/donate.html to read about our upcoming projects, and vote with your tax-deductible donation!
#GivingSeason #TidyTuesday #RStats #PyData #JuliaLang

+ meghansharris

joranelias, 6 months ago to random

I don’t know why the general workflow of

Do 90% of the processing & visualization in R

Do the one thing I really need Python for via {reticulate} by just sending it the exact dataframe it needs and sending the results back to R for post-processing

Hadn’t occurred to me until recently, but I am really, REALLY liking it.

#rstats #pydata

+ robinlovelace

ogrisel, 7 months ago (edited 7 months ago) to python

cloudpickle 3.0.0 is out!

https://github.com/cloudpipe/cloudpickle

cloudpickle is a library used by PySpark, Dask, Ray and joblib / loky (among others) to make it possible to call dynamically or locally defined functions, closures and lambdas on remote Python worker processes.

This is typically necessary for running code in parallel on a distributed computing cluster from an interactive developer environment such as a Jupyter or Databricks notebook.

#Python #PyData #HPC #DistributedComputing
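
A small sketch of the problem cloudpickle solves, assuming `cloudpickle` is installed. The helper `make_scaler` and the values are made up for illustration; the "remote worker" is simulated by loading the bytes back in the same process.

```python
import pickle

import cloudpickle


def make_scaler(factor):
    # A locally defined lambda closing over `factor`: the standard pickle
    # module serializes functions by reference and cannot handle this.
    return lambda x: x * factor


scale = make_scaler(3)

try:
    pickle.dumps(scale)
    stdlib_ok = True
except (pickle.PicklingError, AttributeError):
    stdlib_ok = False

# cloudpickle serializes the function by value (code plus closure).
payload = cloudpickle.dumps(scale)

# The resulting bytes can be loaded with the standard pickle module,
# as long as cloudpickle is importable in the loading process.
restored = pickle.loads(payload)
result = restored(14)
```

Here `stdlib_ok` ends up `False` while `restored(14)` returns 42, which is the behaviour the libraries listed above rely on when shipping notebook-defined functions to workers.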

+ GaelVaroquaux

R4DSCommunity, 7 months ago to chat

It's #TidyTuesday y'all! Show us what you made on our Slack at https://r4ds.io/join (find the #chat-tidytuesday channel)!

RT @jonthegeek https://fosstodon.org/@jonthegeek/111165673314464975

#RStats #DataViz #PyData #tidyverse #r4ds

Please consider a tax-deductible donation at https://r4ds.io/donate to support our work!

+ janeadams

CD_Newton, 7 months ago to python

Anyone in the UK enthusiastic about dogs, #Rstats, #Python, #Pydata, and looking for a new job? There’s a Data Officer role going on my team if so! Interesting work and a nice bunch of people.

https://careers.dogstrust.org.uk/vacancies/4048/data_officer/

+ urswilke

sklearn, 7 months ago to python

🔴 19 fixes
😃 74 contributors
📢 Bugfix release - scikit-learn 1.3.1 is out!
More details in the changelog: https://bit.ly/3rpA33J

You can upgrade with pip as usual:
pip install -U scikit-learn

or using the conda-forge builds:
conda install -c conda-forge scikit-learn

Thanks to all the contributors!

#data #Python #software #ML #opensource #pydata #scipy #sklearn #machinelearning

+ GaelVaroquaux

R4DSCommunity, 7 months ago to chat

It's #TidyTuesday y'all! Show us what you made on our Slack at https://r4ds.io/join (find the #chat-tidytuesday channel)!

RT @jonthegeek https://fosstodon.org/@jonthegeek/111126027712361401

#RStats #DataViz #PyData #tidyverse #r4ds

Please consider a tax-deductible donation at https://r4ds.io/donate to support our work!

+ meghansharris

astrojuanlu, 7 months ago to python

Struggling with out-of-order cells on Jupyter? No more:

pip install ipyflow

https://github.com/ipyflow/ipyflow

#python #jupyter #datascience #pydata

Screencast of Jupyter cells being updated with cell dependency markers (small coloured dots) and cell execution hints (coloured bars) being shown

+ jonny

ogrisel, 7 months ago to python

scikit-learn 1.3.1 is out!

This release fixes a bunch of annoying bugs. Here is the changelog:

https://scikit-learn.org/stable/whats_new/v1.3.html#version-1-3-1

Thanks very much to all bug reporters, PR authors and reviewers and thanks in particular to @glemaitre, the release manager of 1.3.1.

#PyData #SciPy #sklearn #Python #machinelearning

+ GaelVaroquaux

jonthegeek, 7 months ago to random

#PositConf2023 is over 😭

Soon I'll buy my Super-Fan tickets for #PositConf2024 in Seattle (not available quite yet as far as I can find), but first it's time for one more thread to summarize my threads! Each post in this thread will be flagged with a titled "content warning" to make it easier to find your way back to the top, I hope that works out!

If you appreciate these threads, please consider a tax-deductible donation to @R4DSCommunity at https://r4ds.io/donate!

#RStats #PyData

🧵1 of x

kellybodwin, 8 months ago to datascience

An #introduction since I just server hopped!

I'm an Associate Professor of Statistics and Data Science at Cal Poly. I teach #datascience with #pydata and #rstats and #machinelearning

I am the proud owner of over 300 board games, way too many Osprey hiking/backpacking bags, and at least one bruise from Roller Derby.

+ europlus, wx1g, yaitorr, paezha

hylk3, 8 months ago to random Dutch

@cheukting_ho thanks for your great hands-on session on building strategies for tests of dataframes. I loved it 🥰! #pydata #pydataamsterdam

+ cheukting_ho

astrojuanlu, 8 months ago to python

Arrived at #PyDataAmsterdam2023 !

#PyData #python #PyDataAmsterdam @pydataamsterdam

astrojuanlu, 8 months ago

@pydataamsterdam

The one and only James Powell @dontusethsicode with “the most boring talk title in only three words”

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu, 8 months ago

@pydataamsterdam

Spotting what I call the "Jim Dowling classification of data pipelines" in this promising talk by Hopsworks

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu, 8 months ago

@pydataamsterdam So excited to see Thomas Wolf and others from Hugging Face 🤗 giving a promising closing keynote! Just this Monday I was working with some colleagues on a HF + @kedro integration that will hopefully go open source soon.

"2000+ models in the Hub are private"

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

Speaker under a screen with a slide titled "The case of 'Open-Source' AI?"

astrojuanlu, 8 months ago

@pydataamsterdam "Choice of training data is the most important [part] of an LLM!"

Data quality improvements "can be equivalent to a 2x-3x increase in size"

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu, 8 months ago

@pydataamsterdam

Impressive results from Hugging Face: proper filtering of web data can match or exceed performance of commercial models trained on highly curated datasets.

Dataset: https://huggingface.co/datasets/tiiuae/falcon-refinedweb
Paper: https://doi.org/10.48550/arXiv.2306.01116

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu, 8 months ago

@pydataamsterdam Thomas kind of dodged my question on the enforceability of OpenRAIL 😇 so happy that they exist anyway, it's a conversation we need to have.

(RAIL = Responsible AI Licenses)

#PyDataAmsterdam2023 #PyDataAmsterdam #PyData #python

astrojuanlu, 8 months ago

@pydataamsterdam

Last day of #PyDataAmsterdam2023: the mighty @koaning giving the final keynote: Natural Intelligence is All You Need

#PyDataAmsterdam #PyData #python

+ oliverandrich