#DataScience - kbin.social

ramikrispin, 7 days ago to python

(1/5) 𝐇𝐚𝐩𝐩𝐲 𝐒𝐚𝐭𝐮𝐫𝐝𝐚𝐲! ☀️
Here are a few steps you can take to reduce your Python 🐍 image size 👇🏼

TL;DR: use a slim base image and a multi-stage build

#mlops #python #datascience #docker

ramikrispin, 6 days ago to datascience

DevOps for Data Science - New Book 🚀

Always happy to see new MLOps books! DevOps for Data Science is a new book by Alex K Gold. As the name implies, it covers DevOps topics for data scientists, including:
✅ Command line
✅ Working with Linux systems
✅ Docker
✅ Scaling resources
✅ Network, domains, DNS, SSL, etc.
✅ Authentication

#DataScience #mlops #devops

ramikrispin, 5 days ago to datascience

Building a GPT-2 from scratch 🚀

Andrej Karpathy released a tutorial today on reproducing GPT-2 from scratch. OpenAI released GPT-2 in 2019, and the tutorial reproduces its smallest version, a 124M-parameter model. The four-hour tutorial covers setting up the GPT-2 network and then training it and optimizing its parameters.
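
As a rough check of the "124M" figure, the parameter count can be reproduced with simple arithmetic. This Python sketch assumes the published hyperparameters of the smallest GPT-2 (50,257-token vocabulary, 1,024-token context, width 768, 12 transformer blocks), biases on every linear layer, and an output head that shares its weights with the token embedding. The helper names are hypothetical, and the code is not taken from the tutorial.

```python
# Assumed hyperparameters of the smallest GPT-2 configuration.
VOCAB_SIZE = 50257
CONTEXT_LEN = 1024
D_MODEL = 768
N_LAYERS = 12


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(d: int) -> int:
    """Gain and bias vectors of a LayerNorm."""
    return 2 * d


def gpt2_param_count(vocab: int, ctx: int, d: int, n_layers: int) -> int:
    token_emb = vocab * d  # token embedding, reused (tied) as the output head
    pos_emb = ctx * d      # learned positional embedding
    block = (
        layer_norm(d)          # LayerNorm before attention
        + linear(d, 3 * d)     # fused query/key/value projection
        + linear(d, d)         # attention output projection
        + layer_norm(d)        # LayerNorm before the MLP
        + linear(d, 4 * d)     # MLP expansion
        + linear(4 * d, d)     # MLP projection back to width d
    )
    final_ln = layer_norm(d)
    return token_emb + pos_emb + n_layers * block + final_ln


total = gpt2_param_count(VOCAB_SIZE, CONTEXT_LEN, D_MODEL, N_LAYERS)
print(f"{total:,}")  # 124,439,808
```

Weight tying matters here. Counting a separate 768 × 50,257 output matrix would add about 38.6M parameters.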

It looks like a really cool tutorial; I hope to get the bandwidth to watch it in the coming weeks!

📽️ https://www.youtube.com/watch?v=l8pRSuU81PU

#DataScience #MachineLearning #python #deeplearning #gpt #AI

ramikrispin, 13 days ago to ArtificialIntelligence

(1/2) Congratulations to my friend Lior and his co-author Meysam on the release of their new book, Mastering NLP from Foundations to LLMs 🎉

I met Lior a few years ago at a conference, and since then, I have been following his work in the field of NLP ❤️.

#nlp #python #machinelearning #deeplearning #DataScience #LLM

4nn4_clickt, 8 days ago to llm German

How can AI (#KI) help museums catalogue their collections? Sebastian Ruff of the Stadtmuseum Berlin talks about his experience with tools for automatic keyword generation. His conclusion: working with them costs time at first and the tools have their limits, but they have potential. Important at the start: complete thesauri linked to authority data, and an analysis of the gaps in the existing data ☝️ #Datenqualität
https://www.kultur-b-digital.de/digitale-kultur/impulse/ki-im-stadtmuseum-interview-mit-sebastian-ruff/

#LLM #DataScience #AI #CulturalHeritage #ChatGPT #llava #Bilderkennung @museum

leanpub, 8 days ago to datascience

The Hundred-Page Machine Learning Book (PDF + EPUB + extra PDF formats) by Andriy Burkov is on sale on Leanpub! Its suggested price is $40.00; get it for $14.00 with this coupon: https://leanpub.com/sh/F07t1Azi #DataScience #ComputerScience #MachineLearning #Ai

physaliacourses2, 9 days ago to datascience

🔍 Want to make your data analysis fully reproducible with R?

🚀 Join us this October for the 3rd edition of our course with @paocorrales and @eliocamp!

Don't miss out! 🔗https://physalia-courses.org/courses-workshops/r-reproducibility/

#rstats #datascience #reproducibility

ramikrispin, 9 days ago to python

Posit recently released a new Shiny extension for VS Code that supports both Shiny for R and Shiny for Python 🚀

More details in the release post 👇🏼
https://shiny.posit.co/blog/posts/shiny-vscode-1.0.0/

Extension 🔗: https://marketplace.visualstudio.com/items?itemName=Posit.shiny

#rstats #python #shiny #DataScience

leanpub, 10 days ago to datascience

The Hundred-Page Machine Learning Book (PDF + EPUB + extra PDF formats) by Andriy Burkov is on sale on Leanpub! Its suggested price is $40.00; get it for $14.00 with this coupon: https://leanpub.com/sh/RIsQReL4 #DataScience #ComputerScience #MachineLearning #Ai

stevensanderson, 10 days ago to datascience

Learn how to split strings and get the first element in R using base R, stringi, and stringr. Check out my latest post for examples and tips. Give it a try and share your experiences!
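
The post covers R. For readers working in Python, the same "split and keep the first piece" operation fits in one line. This is a minimal Python analog, not a translation of the post's R code. The function name, the sample strings, and the default space separator are assumptions.

```python
def first_elements(strings: list[str], sep: str = " ") -> list[str]:
    """Return the part of each string before the first `sep`.

    str.partition splits at most once, so the remainder is never broken into
    further pieces; a string that does not contain `sep` is returned unchanged.
    """
    return [s.partition(sep)[0] for s in strings]


names = ["apple pie", "banana split", "cherry"]
print(first_elements(names))  # ['apple', 'banana', 'cherry']
```

`str.partition` was chosen because it stops at the first separator. `s.split(sep, 1)[0]` gives the same result.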

#Rstats #DataScience #StringManipulation #R #RProgramming #stringr #stringi #strings #Programming #Data

Post: https://www.spsanderson.com/steveondata/posts/2024-06-05/

rpodcast, 10 days ago to datascience

Episode 167 of the @rstats @rweekly Highlights Podcast is a full (not partial) match with great R content! https://serve.podhome.fm/episodepage/r-weekly-highlights/issue-2024-w23-highlights

🛠️ Compa-tibble functions @grusonh
🏫 R tutorial worksheets with Quarto @nrennie

We're loving the ways we can add modern features to this show. Once you grab a new podcast app from https://newpodcastapps.com, you can see them in their full glory!

h/t @mike_thomas @jonmcalder 🙏

#RStats #DataScience #V4V

OpenDataLu, 10 days ago to datascience French

Are you a data scientist working in the public sector? The latest good-practice guide from the Ministry for Digitalisation is for you!

https://mindigital.gouvernement.lu/fr/publications/guide-manuel/guide-data-scientists-fr.html

If you can, feel free to publish the results of your analyses on data.public.lu, or to include data already available as open data in your analyses.

#dataScience #openData #Luxembourg #researchLuxembourg

sebkrantz, 16 days ago to datascience

In the development version of {collapse} [v2.0.15, available via install.packages("collapse", repos = "https://fastverse.r-universe.dev")], the pivot() function has gained a FUN argument to support aggregation, including a number of hard-coded internal functions that aggregate "on the fly". Initial benchmarks show that this significantly outperforms other pivot table options in R. More at https://sebkrantz.github.io/collapse/reference/pivot.html (feel free to test and give feedback). #rcollapse #rstats #DataScience
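
To illustrate what an aggregation argument on a pivot does, here is a standard-library Python sketch. It assumes long-format (id, name, value) records and reduces duplicate cells with a user-supplied function. It is not the {collapse} implementation or API, and `pivot_wide` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Hashable, Iterable


def pivot_wide(
    rows: Iterable[tuple[Hashable, Hashable, float]],
    fun: Callable[[list[float]], float] = sum,
) -> dict[Hashable, dict[Hashable, float]]:
    """Pivot (id, name, value) records to a wide table, aggregating duplicates.

    All values that land in the same (id, name) cell are collected and then
    reduced with `fun`, which plays the role of an aggregation argument.
    """
    cells: dict = defaultdict(lambda: defaultdict(list))
    for row_id, col_name, value in rows:
        cells[row_id][col_name].append(value)
    return {
        row_id: {col: fun(vals) for col, vals in cols.items()}
        for row_id, cols in cells.items()
    }


sales = [
    ("north", "q1", 10.0),
    ("north", "q1", 5.0),
    ("north", "q2", 7.0),
    ("south", "q1", 3.0),
]
print(pivot_wide(sales))        # {'north': {'q1': 15.0, 'q2': 7.0}, 'south': {'q1': 3.0}}
print(pivot_wide(sales, mean))  # {'north': {'q1': 7.5, 'q2': 7.0}, 'south': {'q1': 3.0}}
```

This sketch stores every value of a cell before reducing it. Internal functions that aggregate "on the fly", as the post describes, avoid keeping those intermediate lists.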

ramikrispin, 11 days ago to machinelearning

(1/2) I am excited to present at the useR!2024 conference on July 2nd!

I am going to run a virtual workshop on deploying and monitoring data and ML pipelines using free and open-source tools. This includes setting up pipelines using GitHub Actions, Docker 🐳, R, and Quarto 🚀.

When 📆: July 2nd at 10 AM PST

#Rstats #MachineLearning #DataScience #MLops

moorejh, 11 days ago to LLMs

Our KRAGEN paper is out! This method combines LLMs & RAG with Graph of Thoughts for asking complex questions of a knowledge graph or any vector DB https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btae353/7687047 #llms #artificialintelligence #bioinformatics #datascience

Posit, 11 days ago to python

posit::conf(2024) virtual tickets are now available!
Join us on August 12-14—from all over the world—to live stream the incredible talks and keynotes that will be taking place in Seattle.

We understand that not everyone will be able to make the trip to Seattle this year, so we’re excited to offer a fully virtual option for everyone.
REGISTER: https://posit.co/conference/

#posit #rstats #python #pydata #DataScience

news, 11 days ago to ai

AI-Weekly for Tuesday, June 4, 2024 - Issue 115
https://ai-weekly.ai/newsletter-06-04-2024/

The Week's News in Artificial Intelligence
A Mind Vault Solutions, Ltd. Publication
#ai #news #ainews #artificialintelligence #aiweekly #technology #tech #technews #techtrends #machinelearning #robotics #datascience #airesearch #futuretech

This issue was emailed to 22,694 opt-in subscribers.

leanpub, 12 days ago to python

The course Dirty Data Dojo: Cleaning Data (Excel & Python) by Lee Baker is on sale on Leanpub! Its suggested price is $119.00; get it for $49.50 with this coupon: https://leanpub.com/sh/fwmKxsmd #Python #BusinessAnalysis #DataScience #Science #SocialScience

ramikrispin, 15 days ago to opensource

I am excited to present at the Dev AI conference in Paris on June 19!

I am going to run a workshop about the deployment and monitoring of ML pipelines with free and open-source tools. This includes using tools such as GitHub Actions and Pages, Docker, Python, Quarto, etc.

More details are available on the conference website👇🏼
https://events.linuxfoundation.org/ai-dev-europe/

Thanks to the Linux Foundation and the conference organizers for the invite!

#opensource #docker #github #MachineLearning #DataScience #python

FelipeSMBarros, 15 days ago to python Portuguese

🚀 Announcement: a new version of the crossfire Python module!

A new version of the crossfire Python module, developed by me and @cuducos, is now available!

✨ What's new:

Bug fix: now compatible with Google Colab!
Extra feature: a parameter that unpacks nested data to make analysis easier.
Ideal for data journalists and analysts! Sign up for the Fogo Cruzado API and access the data directly from Python.
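
The new unpacking parameter is only described briefly. As a rough illustration of what unpacking nested data means, here is a generic Python sketch that flattens nested dictionaries into underscore-joined columns. It is not the crossfire API, and `flatten_record` and the sample record are hypothetical.

```python
from typing import Any


def flatten_record(record: dict[str, Any], sep: str = "_", prefix: str = "") -> dict[str, Any]:
    """Recursively unpack nested dictionaries into a single-level dict.

    Nested keys are joined with `sep`, e.g. {"a": {"b": 1}} -> {"a_b": 1}.
    Lists and scalar values are kept as they are.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, sep, full_key))
        else:
            flat[full_key] = value
    return flat


# Hypothetical record, not the real API response shape.
record = {
    "id": "abc123",
    "location": {"city": "Recife", "coords": {"lat": -8.05, "lon": -34.9}},
    "victims": [],
}
print(flatten_record(record))
# {'id': 'abc123', 'location_city': 'Recife', 'location_coords_lat': -8.05,
#  'location_coords_lon': -34.9, 'victims': []}
```

For tabular work, `pandas.json_normalize` performs a similar flattening directly into a DataFrame.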

#Python #DataScience #DataDrivenJournalism #JornalismoDeDados #OpenSource

Map of the Recife region with points marking the locations of shootings and their causes, such as "Ataques a civis" (attacks on civilians), "Ação Policial" (police action), and others.

leanpub, 15 days ago to datascience

The Hundred-Page Machine Learning Book (PDF + EPUB + extra PDF formats) by Andriy Burkov is on sale on Leanpub! Its suggested price is $40.00; get it for $14.00 with this coupon: https://leanpub.com/sh/HEQaRVfD #DataScience #ComputerScience #MachineLearning #Ai

stevensanderson, 15 days ago to datascience

🚀 TidyDensity's New AIC Functions! 🚀

The TidyDensity package now includes new functions that calculate the Akaike Information Criterion (AIC) for various distributions, streamlining model quality assessment. Use functions like util_negative_binomial_aic() to automate AIC calculations when comparing candidate distributions.
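
For reference, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Lower values indicate a better trade-off between fit and complexity. The Python sketch below applies the criterion to two one-parameter count models, each fitted by closed-form maximum likelihood: a Poisson model and a geometric model (a negative binomial with its size fixed at 1). It illustrates the criterion and is not TidyDensity's implementation. The function names and sample counts are hypothetical.

```python
import math


def poisson_aic(counts: list[int]) -> float:
    """AIC of a Poisson model fitted by maximum likelihood."""
    k = 1                            # one estimated parameter: the rate
    lam = sum(counts) / len(counts)  # the MLE of the rate is the sample mean
    if lam <= 0:
        raise ValueError("need at least one positive count")
    # log P(X = x) = x ln(lam) - lam - ln(x!), with ln(x!) = lgamma(x + 1)
    log_lik = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)
    return 2 * k - 2 * log_lik


def geometric_aic(counts: list[int]) -> float:
    """AIC of a geometric model on {0, 1, 2, ...}: P(X = x) = p (1 - p)^x."""
    k = 1                            # one estimated parameter: p
    mean = sum(counts) / len(counts)
    if mean <= 0:
        raise ValueError("need at least one positive count")
    p = 1 / (1 + mean)               # MLE of the success probability
    log_lik = len(counts) * math.log(p) + sum(counts) * math.log(1 - p)
    return 2 * k - 2 * log_lik


counts = [0, 1, 1, 2, 3, 0, 5, 2, 1, 4]  # hypothetical count data
for name, aic in [("poisson", poisson_aic(counts)), ("geometric", geometric_aic(counts))]:
    print(f"{name:>9}: AIC = {aic:.2f}")  # the lower AIC is preferred
```

Both models have the same k here, so the comparison reduces to comparing log-likelihoods. The 2k penalty only matters once the candidates differ in parameter count, for example a negative binomial with a free size parameter.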

Happy coding!

#RStats #DataScience #TidyDensity #Programming #RProgramming #Coding

Post: https://www.spsanderson.com/steveondata/posts/2024-05-31/

RConsortium, 16 days ago to HR

🐘✨ Great news from Marcela Victoria Soto at R4HR in Buenos Aires! She recently shared updates about their dynamic activities: "Data analysis is crucial for agile decision-making in companies." Join them on June 1, 2024, for the "Data Visualization in HR" event. Perfect for Spanish-speaking R users interested in HR analytics. 📅👥 Read more: https://www.r-consortium.org/blog/2024/05/30/r4hr-in-buenos-aires-leveraging-r-for-dynamic-hr-solutions

#RStats #HR #DataScience #Meetup

telescoper.blog, 16 days ago to ai

Before I head off on a trip to various parts of not-Barcelona, I thought I’d share a somewhat provocative paper by David Hogg and Soledad Villar. In my capacity as a journal editor over the past few years, I’ve noticed a phenomenal increase in astrophysics papers discussing applications of various forms of Machine Learning (ML). This paper looks into issues around the use of ML not just in astrophysics but elsewhere in the natural sciences.

The abstract reads:

Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology – in which only the data exist – and a strong epistemology – in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here, we identify some locations for ML in the natural sciences at which the ontology and epistemology are valuable. For example, when an expressive machine learning model is used in a causal inference to represent the effects of confounders, such as foregrounds, backgrounds, or instrument calibration parameters, the model capacity and loose philosophy of ML can make the results more trustworthy. We also show that there are contexts in which the introduction of ML introduces strong, unwanted statistical biases. For one, when ML models are used to emulate physical (or first-principles) simulations, they introduce strong confirmation biases. For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The question in the title is being asked of all of the natural sciences; that is, we are calling on the scientific communities to take a step back and consider the role and value of ML in their fields; the (partial) answers we give here come from the particular perspective of physics

arXiv:2405.18095

P.S. The answer to the question posed in the title is probably “yes”.

https://telescoper.blog/2024/05/30/is-machine-learning-good-or-bad-for-the-natural-sciences/

#AI #ArtificialIntelligence #arXiv240518095 #Astrophysics #Cosmology #DataScience #deepLearning #MachineLearning

ramikrispin, 16 days ago to datascience

DuckDB can now read data from Hugging Face via the hf:// prefix 👇🏼

https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb
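
Here is a minimal Python sketch of querying a Hugging Face file through the new prefix. It assumes a recent DuckDB release with this support, network access, and the path layout `hf://datasets/<user-or-org>/<dataset>/<path-in-repo>` from the linked post. The helper names and the repository in the commented example are hypothetical.

```python
def hf_dataset_path(repo_id: str, file_path: str) -> str:
    """Build an hf:// URL for a file in a Hugging Face dataset repository.

    Assumed layout: hf://datasets/<user-or-org>/<dataset>/<path-in-repo>
    """
    return f"hf://datasets/{repo_id.strip('/')}/{file_path.lstrip('/')}"


def preview(repo_id: str, file_path: str, limit: int = 5):
    """Query the remote file with DuckDB's default in-memory connection.

    Requires `pip install duckdb` and network access.
    """
    import duckdb  # imported lazily so the path helper works without DuckDB

    # Escape single quotes so the URL is a valid SQL string literal.
    url = hf_dataset_path(repo_id, file_path).replace("'", "''")
    return duckdb.sql(f"SELECT * FROM '{url}' LIMIT {int(limit)}")


# Example (hypothetical repository, needs network access):
# print(preview("my-user/my-dataset", "data/train.parquet"))
```

DuckDB infers the file format from the extension, so the same helper should serve CSV and Parquet files alike.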

#data #duckdb #DataScience #huggingface
