ogrisel, (edited ) to random
@ogrisel@sigmoid.social avatar

I ran a quick Gradient Boosted Trees vs Neural Nets check using scikit-learn's dev branch, which makes it more convenient to work with tabular datasets that mix numerical and categorical features (e.g. the Adult Census dataset).

Let's start with the GBRT model. It's now possible to reproduce the SOTA number on this dataset in a few lines of code and 2 s (CV included) on my laptop.

1/n

#sklearn #PyData #MachineLearning #TabularData #GradientBoosting #DeepLearning #Python

ogrisel, (edited ) to machinelearning
@ogrisel@sigmoid.social avatar

Today with @dholzmueller we explored the possibility of reducing a probabilistic regression problem to a classification problem by binning the target variable and interpolating the conditional CDF estimated by classifier.predict_proba(X_test).cumsum(axis=1) back to the original continuous range.

Here is a notebook with the results of my experiments:

https://nbviewer.org/github/ogrisel/notebooks/blob/master/quantile_regression_as_classification.ipynb
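
The interpolation step described above can be sketched in plain Python. This is only an illustration and not the code from the notebook: the function name `quantile_from_bin_probs`, the bin edges and the probabilities are all made up for the example, and it assumes the per-bin probabilities for one sample (for instance one row of `predict_proba`) are already available, with linear interpolation inside each bin.

```python
from bisect import bisect_left
from itertools import accumulate


def quantile_from_bin_probs(probs, edges, q):
    """Map a quantile level q to the continuous target range.

    probs: per-bin probabilities for ONE sample (hypothetical input,
           e.g. one row of classifier.predict_proba(X_test)).
    edges: increasing bin edges of the target, len(probs) + 1 values.
    q:     quantile level in [0, 1].
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("edges must have exactly one more entry than probs")
    # Conditional CDF evaluated at each bin edge: 0 at the leftmost edge,
    # then the cumulative sum of the bin probabilities.
    cdf = [0.0] + list(accumulate(probs))
    # Index of the first edge whose CDF value reaches q.
    k = bisect_left(cdf, q)
    if k == 0:
        return edges[0]
    if k >= len(cdf):
        # q is above the last cumulative value (rounding): clip to the range.
        return edges[-1]
    # Linear interpolation of the CDF inside bin k - 1.
    lo, hi = cdf[k - 1], cdf[k]
    t = (q - lo) / (hi - lo)
    return edges[k - 1] + t * (edges[k] - edges[k - 1])


# Made-up example: 4 bins covering the target range [0, 40].
edges = [0.0, 10.0, 20.0, 30.0, 40.0]
probs = [0.1, 0.4, 0.3, 0.2]
q30 = quantile_from_bin_probs(probs, edges, 0.3)
q90 = quantile_from_bin_probs(probs, edges, 0.9)
```

Linear interpolation is the simplest choice here; it amounts to assuming the target is uniformly distributed within each bin.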

#PyData #MachineLearning

jonthegeek, to random
@jonthegeek@fosstodon.org avatar

#PositConf2023 is over 😭

Soon I'll buy my Super-Fan tickets for #PositConf2024 in Seattle (not available quite yet as far as I can find), but first it's time for one more thread to summarize my threads! Each post in this thread will be flagged with a titled "content warning" to make it easier to find your way back to the top. I hope that works out!

If you appreciate these threads, please consider a tax-deductible donation to @R4DSCommunity at https://r4ds.io/donate!

#RStats #PyData

🧵1 of x

ogrisel, to python
@ogrisel@sigmoid.social avatar

I have been thinking a bit about how to detect supply chain attacks against popular open source projects such as scikit-learn.

If you have practical experience with https://reproducible-builds.org/, in particular in the #Python / #PyData ecosystem, I would be curious about any feedback on the plan I suggest for scikit-learn in the following issue.

Feel free to reply on Mastodon first if you have questions.

https://github.com/scikit-learn/scikit-learn/issues/28151

kedro, to python

🎉 We are thrilled to announce that the long-awaited Kedro 0.19 is available! 🔶

Install it now with pip or conda:

pip install kedro==0.19.1  
conda/mamba/micromamba install -c conda-forge kedro=0.19.1  

This is a summary of what you'll find: 🧵

#python #kedro #pydata #datascience

Posit, to random
@Posit@fosstodon.org avatar

Python Data Science at posit::conf(2023). We are excited about all our Python workshops at posit::conf this year.

posit::conf(2023) is our conference for all things open source data science. Join us in Chicago Sept 17-20 for two days of workshops and two days of talks and community. Learn more at pos.it/conf.

See below!

#positconf2023 #posit #pydata

cheukting_ho, to random
@cheukting_ho@fosstodon.org avatar

Back to PyData London #pydata Meetup

Posit, to random
@Posit@fosstodon.org avatar

Use Polars? Find the options for displaying tables limiting? Check out the latest great_tables update.

Right now the Polars ecosystem has few good options for styling tables for presentation. Enter great_tables.

https://posit-dev.github.io/great-tables/blog/polars-styling/

#python #pydata #posit #great_tables

ansate, to datascience
@ansate@social.coop avatar

Attempt at networking post!

Hi! I would love to meet and talk to more people in #dataScience #statistics #stats #pydata #dataViz etc etc

I have a PhD in #appliedMath and I work in #productAnalytics

I attend my local American Statistical Assoc meetings sometimes, but they are rare.

I also helped start an R User Group once (but haven't used R in more than 10 years), and ran a Big Data reading group in #pdx

Please boost 😀

ogrisel, to python
@ogrisel@sigmoid.social avatar

scikit-learn 1.3.1 is out!

This release fixes a bunch of annoying bugs. Here is the changelog:

https://scikit-learn.org/stable/whats_new/v1.3.html#version-1-3-1

Thanks very much to all bug reporters, PR authors and reviewers and thanks in particular to @glemaitre, the release manager of 1.3.1.

jonthegeek, to datascience
@jonthegeek@fosstodon.org avatar

🎂It's my birthday!🎂
To celebrate, I'm... Working to build a friendly, diverse #DataScience community at https://r4ds.io, just like I do every day! It'd make my day if you supported our efforts at https://r4ds.io/donate !

#RStats #PyData #JuliaLang #DataViz

ogrisel, (edited ) to python
@ogrisel@sigmoid.social avatar

cloudpickle 3.0.0 is out!

https://github.com/cloudpipe/cloudpickle

cloudpickle is a library used by PySpark, Dask, Ray and joblib / loky (among others) to make it possible to call dynamically or locally defined functions, closures and lambdas on remote Python worker processes.

This is typically necessary for running code in parallel on a distributed computing cluster from an interactive developer environment such as a Jupyter or Databricks notebook.
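
To illustrate why this matters, here is a minimal sketch showing that the standard library's `pickle` cannot serialize a locally defined closure, while `cloudpickle.dumps` can. The helper names `make_scaler` and `can_pickle_with_stdlib` are invented for this example, and the cloudpickle part only runs if the package happens to be installed.

```python
import pickle


def make_scaler(factor):
    # A locally defined closure: it captures `factor` and cannot be
    # looked up by name from the top level of any module.
    def scale(x):
        return x * factor

    return scale


def can_pickle_with_stdlib(obj):
    # The stdlib pickler serializes functions by reference (module + name),
    # so it rejects local functions and lambdas. The exception type
    # depends on the Python version, hence both are caught.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


triple = make_scaler(3)
stdlib_ok = can_pickle_with_stdlib(triple)

try:
    import cloudpickle  # optional third-party dependency
except ImportError:
    cloudpickle = None

if cloudpickle is not None:
    # cloudpickle serializes the function by value (code + captured state);
    # the resulting payload can be loaded with the regular pickle module.
    payload = cloudpickle.dumps(triple)
    restored = pickle.loads(payload)
    roundtrip_result = restored(14)
else:
    roundtrip_result = None
```

In a real cluster the payload would be sent to a worker process and loaded there; here the round trip happens within a single process to keep the sketch self-contained.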

#Python #PyData #HPC #DistributedComputing

jonthegeek, to javascript
@jonthegeek@fosstodon.org avatar

I'm extremely saddened to read that Women Who Code is closing (https://womenwhocode.com/blog/the-end-of-an-era-women-who-code-closing). My heart goes out to everyone impacted by this situation, and everyone who would have been impacted by their initiatives.

We can't replace them, but we welcome anyone looking for a friendly, inclusive community to join us at the Data Science Learning Community (@DSLC) https://DSLC.io

nihilistdatascientist, to datascience
@nihilistdatascientist@mastodon.social avatar

Once I learned that SQL is usually case-insensitive I decided to write all my SQL in SpongeMock, because nothing fucking matters:

seLeCt
cOl2
,CoL1
fROm mY_tAblE
wHeRe cOl1 iS nOt nUlL
aNd CoL2 = 23
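
For the curious, the claim can be checked with a small sketch using Python's built-in `sqlite3` module. The table contents below are invented for the example, and the result only speaks for SQLite: other engines may treat identifier case differently, and string values inside quotes are data, so they are still compared as written.

```python
import sqlite3

# In-memory database with a tiny, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO my_table (col1, col2) VALUES (?, ?)",
    [(1, 23), (None, 23), (2, 5)],
)

conventional = """
    SELECT col2, col1
    FROM my_table
    WHERE col1 IS NOT NULL
      AND col2 = 23
"""

sponge_mock = """
    seLeCt
    cOl2
    ,CoL1
    fROm mY_tAblE
    wHeRe cOl1 iS nOt nUlL
    aNd CoL2 = 23
"""

# Keywords and identifiers are matched case-insensitively by SQLite,
# so both queries select the same rows.
conventional_rows = conn.execute(conventional).fetchall()
sponge_rows = conn.execute(sponge_mock).fetchall()
conn.close()
```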

#sql #rstats #pydata #DataScience

PyDataPGH, to Pittsburgh
@PyDataPGH@fosstodon.org avatar

#PyConUS 2024 in #Pittsburgh is a wrap! We're delighted to welcome the more than 50 new members who signed up for #PyData Pittsburgh during #PyCon.

https://news.pypgh.org/p/pycon-us-2024-is-in-the-books

Many thanks to @mariatta, @lorenipsum, the rest of the @pycon organizing team and @ThePSF staff, and everyone else who made PyCon US in Pittsburgh possible and awesome. See you again in 2025!

A scene from the PyData at PyCon Happy Hour
A keynote talk at PyCon US 2024

pydatamadrid, to python Spanish
@pydatamadrid@masto.ai avatar

Registration is now open for our April meetup: 🐲 LLMOps & ML for Drilling Performance and Python & Dungeons, this month at the Repsol offices

https://www.meetup.com/pydata-madrid/events/300310880/

See you on Thursday the 18th at 19:00! And afterwards, networking 🍻

Posit, to python
@Posit@fosstodon.org avatar

Great Tables v0.4.0 is out! Featuring nanoplots.

great_tables is a Python package for making nice-looking data display tables.

New features:

  • the fmt_nanoplot() method for adding nanoplots to your table

  • improved HTML table representations in different code environments

  • integration of Polars selectors in the columns= arg of all formatting (fmt_*()) methods

  • the save() method for saving a GT table as an image file

https://posit-dev.github.io/great-tables/blog/introduction-0.4.0/

#python #datascience #greattables #pydata

gmcd, to random
@gmcd@mastodon.social avatar

In a little over two weeks, I'll be giving a streamed workshop on @duckdb, Polars and friends. https://sites.google.com/view/dariia-mykhailyshyna/main/r-workshops-for-ukraine#h.xc2x33lbfxln

I've given several internal versions of this workshop at Amazon and I daresay it's been very well received. The power of these new data wrangling libraries is honestly staggering. We use them all the time at work. You should too.

20 bucks gets you in the door. All proceeds to Ukraine aid orgs. #rstats #pydata

Posit, to datascience
@Posit@fosstodon.org avatar

While data scientists are often taught about training a machine learning model, building a reliable MLOps strategy to deploy and maintain that model can be daunting.

It doesn’t have to be this way!

Join us with Julia Silge at Posit on Wednesday, April 24th at 11 am ET to learn how Posit Team provides fluent tooling for the whole ML lifecycle.

No registration is required to attend - simply add it to your calendar using this link, https://pos.it/team-demo

Posit, to datascience
@Posit@fosstodon.org avatar

While data scientists are often taught about training an ML model, building a reliable MLOps strategy to deploy & maintain that model can be daunting.

It doesn’t have to be this way!

  1. Develop an ML model using Posit Workbench and a Tidy Tuesday dataset!
  2. Version, deploy, & monitor that model w/ Posit Connect
  3. Maintain reproducible software dependencies throughout the ML lifecycle with Posit Package Manager

https://www.youtube.com/watch?v=FZW_0HB-Eas&list=PL9HYL-VRX0oRsUB5AgNMQuKuHPpNDLBVt&index=1&ab_channel=PositPBC

PyDataPGH, to Pittsburgh
@PyDataPGH@fosstodon.org avatar

Join us for a casual gathering of the local, national, and international PyData community on the sidelines of PyCon US 2024! Meet up with fellow , , and scientific computing enthusiasts when the world's largest Python conference comes to town.

https://www.meetup.com/pydata-pittsburgh/events/300961938/

Posit, to python
@Posit@fosstodon.org avatar

Check out Dr. Albert Rapp's latest YouTube video on mastering the great_tables Python package! From raw data to polished displays, learn about custom fonts, nanoplots, conditional formatting, and the steps to create a lovely-looking data display table with great_tables.
https://www.youtube.com/watch?v=ESyWcOFuMQc&ab_channel=AlbertRapp

Don't miss his companion video on gt with R and the insightful blog post comparing both.
https://albert-rapp.de/posts/22_gt_py_and_r/22_gt_py_and_r

Thanks, Albert! #Python #DataVisualization #pydata #posit

Posit, to python
@Posit@fosstodon.org avatar

Join Posit at PyCon US 2024! The Posit team will be in Pittsburgh for PyCon May 17-24.

  • Rich Iannone & Mike Chow will be presenting on “Making Beautiful, Publication Quality Tables in Python is Possible in 2024” https://us.pycon.org/2024/schedule/presentation/65/
  • Stop by our booth! We’ll have folks from our open source and enterprise teams excited to hear about you & your work.
