ramikrispin

@ramikrispin@mstdn.social

Data science and engineering senior manager at Apple | #rstats & #Python | 📦 dev | ❤️ time-series analysis & forecasting | Author. Opinions are my own | https://linktr.ee/ramikrispin

This profile is from a federated server and may be incomplete. Browse more on the original instance.

ramikrispin, to datascience

I spent last night building this fun Shinylive app - Forecasting Sandbox 😎

The app provides a sandbox for three simple forecasting models - linear regression, ARIMA, and Holt-Winters - and it runs entirely in the browser!

I am planning to deploy it with GitHub Actions and create a tutorial (WIP) 👇🏼

https://github.com/RamiKrispin/shinylive-r
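
To give a feel for the simplest of the three models, here is a minimal sketch of a linear-trend forecast. It is not the app's code (the app is written in R with Shinylive); the function names and the toy series are assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import mean


def fit_linear_trend(series):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    t = range(len(series))
    t_bar, y_bar = mean(t), mean(series)
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    intercept = y_bar - slope * t_bar
    return intercept, slope


def forecast(series, horizon):
    """Extend the fitted trend line `horizon` steps past the end of the series."""
    intercept, slope = fit_linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + h) for h in range(horizon)]


# Hypothetical toy series, not data from the app.
history = [10.0, 12.0, 14.0, 16.0, 18.0]
print(forecast(history, 3))  # → [20.0, 22.0, 24.0]
```

ARIMA and Holt-Winters add autocorrelation and seasonality on top of this kind of trend fit, which is why a sandbox that lets you switch between the three is handy.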

ramikrispin, to random

The output of the pkg_deps_tree() function from the pak package is priceless ❤️.

The pak package provides tools for installing R packages while handling their dependencies, and it is an alternative to the install.packages() and devtools::install_github() functions.

Documentation: https://pak.r-lib.org/index.html
Source code: https://github.com/r-lib/pak/

#rstats

ramikrispin, to llm

A Hackers' Guide to Language Models 👇🏼

Jeremy Howard's keynote about LLMs was one of the most interesting talks at #positconf2023, and I highly recommend watching it. The talk gives an overview of the #LLM landscape, particularly #GPT4. Jeremy shares some cool tricks and use cases for LLMs. I love his example of sending a prompt with your own #Python functions and asking the model to use them.

Resources 📚
Video: https://www.youtube.com/watch?v=jkrNMKz9pWU
Code and notebooks: https://github.com/fastai/lm-hackers
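
The "send your own Python functions" idea can be sketched roughly as follows. This is an illustration only and not the code from the talk's notebooks: the helper names (`describe_function`, `call_by_name`), the type mapping, and the hand-written `fake_reply` are all assumptions, and no LLM API is called.

```python
import inspect
import json


def sums(a: int, b: int = 1) -> int:
    """Adds a + b"""
    return a + b


# Hypothetical mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_function(func):
    """Build a JSON-serializable description of `func` from its signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value, so the caller must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def call_by_name(request, available):
    """Run the function a model asked for; `request` mimics the shape of a model's reply."""
    func = available[request["name"]]
    return func(**json.loads(request["arguments"]))


schema = describe_function(sums)
print(json.dumps(schema, indent=2))

# A hand-written stand-in for a model response, so no API call is made.
fake_reply = {"name": "sums", "arguments": '{"a": 6, "b": 3}'}
print(call_by_name(fake_reply, {"sums": sums}))  # → 9
```

In a real setup, the description would be sent along with the prompt, and the model's reply would take the place of `fake_reply`.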

ramikrispin, to llm

Build a GPT Tokenizer from Scratch 🚀

This is a great tutorial by Andrej Karpathy on building a GPT tokenizer from scratch with Python. This two-hour tutorial covers the following topics:
✅ Introduction to tokenization
✅ Handling different types of strings with Python
✅ Tokenizer implementation
✅ Decoding and encoding tokens and strings
✅ Training new tokens

Tutorial 📽️: https://www.youtube.com/watch?v=zduSFxRajkE

#llm #python #nlp #datascience
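
For a rough idea of what such a tokenizer does, here is a minimal byte-pair encoding (BPE) sketch. It is a simplified illustration written for this post, not the tutorial's code: it skips regex pre-splitting and special tokens, and all function names are assumptions.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, num_merges):
    """Learn up to `num_merges` merges, always taking the most frequent adjacent pair."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (token ids 0..255)
    merges = {}  # (left_id, right_id) -> new token id
    for step in range(num_merges):
        counts = Counter(zip(ids, ids[1:]))
        if not counts:
            break
        pair = counts.most_common(1)[0][0]
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token ids back to bytes, then to a string."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


text = "aaabdaaabac"
merges = train(text, num_merges=3)
ids = encode(text, merges)
print(len(text), "bytes ->", len(ids), "tokens")
print(decode(ids, merges) == text)
```

Each merge adds one token to the vocabulary and shortens the encoded sequence, which is the trade-off the tutorial explores in depth.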

ramikrispin, to Excel

(1/2) I have been following the work of @stevensanderson and David Kun for a few years now, and I am excited to see the release of their new book 🥳 - Extending Excel with Python and R 🚀.

The book focuses on the common ground and collaboration between data scientists and Excel users. This includes scaling and automating #Excel tasks with #RStats and #Python, and core data science applications such as data wrangling, working with APIs, data visualization, and modeling.

#DataScience

ramikrispin, to datascience

DuckDB 🦆 + dplyr 🔧 = duckplyr 🚀🚀🚀

DuckDB released a new R package - duckplyr, which enables running dplyr functions with the DuckDB engine as the backend ❤️. Behind the scenes, the package translates and maps the dplyr code into DuckDB operations. This will enable dplyr users to work with large datasets with higher performance.

Resources 📚
Code: https://github.com/duckdblabs/duckplyr
Documentation: https://duckdblabs.github.io/duckplyr/
Release post: https://duckdb.org/2024/04/02/duckplyr

ramikrispin, to datascience

DBRX - a new general-purpose, open-source large language model (LLM) from Databricks.

More details 👇🏼
https://www.databricks.com/blog/announcing-dbrx-new-standard-efficient-open-source-customizable-llms

Resources 📚
Hugging Face 🤗: https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962
Code 🔗: https://github.com/databricks/dbrx

#DataScience #llm #deeplearning #machinelearning

ramikrispin, to python

(1/2) Moirai - Salesforce's Foundation Forecasting Model 🚀

Salesforce recently released Moirai - a new #Python 🐍 library with a foundation model for time series forecasting applications. According to the release blog, the model comes with universal forecasting capabilities and can handle multiple scenarios and different frequencies.

#data #DataScience #llm #timeseries #forecasting #machinelearning #deeplearning

ramikrispin, to datascience

DevOps for Data Science - New Book 🚀

Always happy to see new MLOps books! DevOps for Data Science is a new book by Alex K Gold. As the name implies, the book focuses on topics related to DevOps for data scientists. This includes the following:
✅ Command line
✅ Working with Linux systems
✅ Docker
✅ Scaling resources
✅ Network, domains, DNS, SSL, etc.
✅ Authentication

ramikrispin, to datascience

New Ollama release 🎉

A major Ollama release - version 0.1.32 - is out. The new version includes:
✅ Improved GPU utilization and memory management to increase performance and reduce the error rate
✅ Improved performance on Mac by scheduling large models between the GPU and CPU
✅ Native AI support in Supabase edge functions

More details in the release notes 👇🏼
https://github.com/ollama/ollama/releases

Image credit: release notes

#DataScience #MachineLearning #llm #ollama #llama #python

ramikrispin, to llm

In case you are wondering, the new Microsoft mini LLM - Phi3 - can handle code generation, in this case, SQL.

I compared its runtime (locally on CPU) with codellama:7B using Ollama, and, surprisingly, Phi3 was significantly slower.

ramikrispin, to datascience

Gradient Descent Visualization 👇🏼

I was looking for examples of interactive data visualization for gradient descent algorithms, and I found this app by Lili Jiang. This desktop app is written in C++ and enables simulation and visualization of different gradient descent methods, such as momentum, AdaGrad, RMSProp, and Adam. The app lets you compare different methods side by side.

https://github.com/lilipads/gradient_descent_viz
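
To make the comparison concrete, here is a minimal sketch of two of the methods the app visualizes, plain gradient descent and momentum. It is not taken from the app (which is a C++ desktop program); the toy loss function, the learning rate, and the function names are assumptions chosen for illustration.

```python
def grad(x, y):
    """Gradient of the toy loss f(x, y) = x**2 + 10 * y**2 (minimum at the origin)."""
    return 2 * x, 20 * y


def gradient_descent(start, lr=0.05, steps=100):
    """Plain gradient descent; returns the full trajectory so it can be plotted."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path


def momentum(start, lr=0.05, beta=0.9, steps=100):
    """Gradient descent with momentum: a velocity term accumulates past gradients."""
    x, y = start
    vx = vy = 0.0
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        vx, vy = beta * vx - lr * gx, beta * vy - lr * gy
        x, y = x + vx, y + vy
        path.append((x, y))
    return path


start = (5.0, 2.0)
print("plain GD :", gradient_descent(start, steps=300)[-1])
print("momentum :", momentum(start, steps=300)[-1])
```

Plotting the two returned paths on top of the loss contours gives a static version of what the app animates: momentum overshoots and oscillates, while plain gradient descent moves steadily toward the minimum.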

Image credit: App repository

ramikrispin, to datascience

(1/2) Google released a new foundation model for time series forecasting 🚀

TimesFM (Time Series Foundation Model) is a pre-trained foundation model for time series forecasting applications, developed by the Google Research team. It joins the recent trend of leveraging foundation models for time series forecasting, which includes Salesforce's Moirai and Amazon's Chronos.

#DataScience #forecasting #llm #deeplearning #MachineLearning #python #timeseries

ramikrispin, to datascience

(1/2) Shiny Apps for demystifying statistical models and methods 🚀

This is a cool website that explains different statistical concepts using interactive Shiny apps. The website was created by Ben Prytherch from the Department of Statistics at Colorado State University.

ramikrispin, to datascience

Lessons in Statistical Thinking - New Book 📚👇🏼

Lessons in Statistical Thinking is a new book by Prof. Daniel Kaplan focusing on statistical reasoning. The book covers the following topics:
✅ Handling data
✅ Describing relationships
✅ Randomness and noise
✅ Causal modeling
✅ Hypothetical thinking

The book's code examples are in R.

The book is currently available only online 👇🏼
https://dtkaplan.github.io/Lessons-in-statistical-thinking/

Thanks to the author for making the book open and free! 🙏🏼

ramikrispin, to python

Going Further with CUDA for Python Programmers 🚀

The second part of Jeremy Howard's lecture on CUDA for Python programmers is now available 👇🏼

📽️: https://www.youtube.com/watch?v=eUuGdh3nBGo

This lecture focuses on the following topics:
✅ Optimized Matrix Multiplication
✅ Shared Memory Techniques for CUDA
✅ Implementing Shared Memory Optimization
✅ Translating Python to CUDA and Performance Considerations
✅ Numba: Bringing Python and CUDA Together

Notebook: https://github.com/cuda-mode/lectures/blob/main/lecture5/matmul_l5.ipynb
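
The tiling idea behind the shared-memory optimization can be illustrated on the CPU with plain Python. This is a sketch of the general technique and not the code from the lecture notebook; the function names and the tile size are assumptions, and no GPU is involved.

```python
def matmul_naive(a, b):
    """Reference implementation: one dot product per output element."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Blocked matrix multiplication: work on small tiles of `a` and `b` at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, m, tile):      # tile column of the output
            for p0 in range(0, k, tile):  # walk along the shared dimension
                # On a GPU, these two tiles are what a thread block would copy
                # into shared memory once and then reuse for many products.
                a_tile = [row[p0:p0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[p0:p0 + tile]]
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        c[i0 + i][j0 + j] += sum(
                            a_row[p] * b_tile[p][j] for p in range(len(b_tile))
                        )
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b))  # → [[58, 64], [139, 154]]
```

In pure Python the tiled version is not faster; the point is the access pattern, where each loaded tile is reused for several output elements instead of being fetched from slow memory every time.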

ramikrispin, to datascience

Over the past few months, I have created a bunch of Docker 🐳 tutorials covering various topics, from a fun setup for a Python 🐍 environment on the CLI to advanced topics such as multi-stage builds 🏗️. I organized all the tutorials under one folder, and I plan to keep adding related tutorials to it 😎.

Currently on my Docker tutorial TODO list:
➡️ Docker ENTRYPOINT vs CMD
➡️ Docker multi-architecture build

🔗 https://medium.com/@rami.krispin/list/docker-21408ce79e6a

Enjoy!

#docker #DataScience #vscode #mlops

ramikrispin, to datascience

DuckDB can now read data from Hugging Face via the hf:// prefix 👇🏼

https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb

#data #duckdb #DataScience #huggingface

ramikrispin, to rust

Rust for Beginners - Crash Course 🚀

If you are looking to get started with Rust, check out this crash course for beginners by Microsoft Developer. The course focuses on the foundations of Rust, from setting up Rust on your local computer to core functionality of the language, such as if/else statements, for loops, functions, and other core operators.

Course 📽️: https://www.youtube.com/playlist?list=PLlrxD0HtieHjbTjrchBwOVks_sr8EVW1x

#rust

ramikrispin, to python

PyData Global 2023 talks are now available 🐍👇🏼

The playlist contains 65 talks from the last PyData Global conference, covering a variety of data science and data engineering topics such as machine learning, time series, Bayesian statistics, data pipelines, etc.

Playlist 📽️: https://www.youtube.com/playlist?list=PLGVZCDnMOq0rCyO6B53u1eFT4owN8Lvwj

#python #datascience #dataengineering #machinelearning

ramikrispin, to Bash

A Bash Scripting Course 🚀

Bash is a useful language for automating processes on the command line, with many applications from IT to MLOps. The Bash Scripting on Linux course by Jay LaCroix is an intro course for Bash. The course focuses on the foundations of Bash scripting, and it covers the following topics:
✅ Working with variables
✅ If-Else statements
✅ Loops
✅ Functions
✅ Arguments
✅ Scheduling

Course 📽️: https://www.youtube.com/playlist?list=PLT98CRl2KxKGj-VKtApD8-zCqSaN2mD4w

ramikrispin, to datascience

Complex Analysis - An Interactive Book 🚀👇🏼

The Complex Analysis book by Juan Carlos Ponce Campuzano focuses on the theory and applications of complex functions. The book makes great use of interactive data visualizations to explain complex analysis theory.

https://complex-analysis.com/

#data #DataScience #math #datavisualization #infographic

ramikrispin, to OpenAI

(1/3) OK, Sora is cool, but what are the long-term impacts and applications?

Like ChatGPT, OpenAI's Sora, which was released yesterday, did not introduce anything new; it just took existing technology to a new level.

#openai #sora #llm #genai #deeplearning #tech

ramikrispin,

(3/3) AI won't replace people's jobs. Rather, people using AI will replace them.

#llm #genai #tech #DataScience

ramikrispin, to datascience

(1/3) Learn R Through Examples 🚀👇🏼

Learn R Through Examples by Xijin Ge, Jianli Qi, and Rong Fan provides an introduction to data analysis with R. The book covers the core topics of data analysis using different datasets, from simple and clean datasets to messy and big ones. 🧵👇🏼

#RStats #DataScience #datavisualization #data

ramikrispin,

(2/3) This includes the following topics:
✅ Working with data frames
✅ Data visualization with base R and ggplot2
✅ Data structures
✅ Summary statistics and correlation analysis
✅ Case studies - analyzing multiple datasets

ramikrispin, to python

(1/2) Models Demystified - A Practical Guide from t-tests to Deep Learning 🚀👇🏼

Models Demystified is a new book by Michael Clark and Seth Berry that focuses on the mechanics of core data science algorithms. It covers the following topics:
✅ Linear and logistic regression
✅ Generalized Linear Models
✅ Regularization methods
✅ Model training approaches
✅ Deep learning and neural networks
✅ Causal Modeling

#RStats #python #DataScience #MachineLearning #deeplearning

ramikrispin,

(2/2) The code examples are in both R and Python 🐍.

Book 📚: https://m-clark.github.io/book-of-models/

Thanks to the authors for making this book available for free online! 🙏🏼

Image credit: from the book
