In the past few months, I created a bunch of Docker 🐳 tutorials covering various topics, from a fun setup of a Python 🐍 environment on the CLI to advanced topics such as multi-stage builds 🏗️. I organized all the tutorials under one folder, and I plan to keep updating this folder with related tutorials in the future.
Currently on my Docker tutorial TODO list:
➡️ Docker ENTRYPOINT vs CMD
➡️ Docker multi-architecture build
The size of a Docker image can quickly increase during the build. I became more mindful of the image size when I started to deploy on GitHub Actions. The bigger the image, the longer the run time and the higher the runtime cost.
This is when you should consider using a multi-stage build.
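A minimal sketch of the idea, not the Dockerfile from the tutorial: the Python snippet below holds a hypothetical two-stage Dockerfile as a string, writes it to disk, and counts its stages. The base image tags, the `/opt/venv` path, and the `requirements.txt` file name are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical two-stage Dockerfile (illustrative only; image tags and paths are assumptions).
# Stage 1 ("builder") installs the dependencies on a full-size base image.
# Stage 2 copies only the installed environment onto a slimmer base image.
DOCKERFILE = """\
FROM python:3.11 AS builder
COPY requirements.txt .
RUN python -m venv /opt/venv \\
    && /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

FROM python:3.11-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
"""


def count_stages(dockerfile_text: str) -> int:
    """Count build stages: each FROM instruction starts a new stage."""
    return sum(
        1
        for line in dockerfile_text.splitlines()
        if line.strip().upper().startswith("FROM ")
    )


def write_dockerfile(target_dir: Path) -> Path:
    """Write the sketch to <target_dir>/Dockerfile and return the file path."""
    path = target_dir / "Dockerfile"
    path.write_text(DOCKERFILE)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dockerfile_path = write_dockerfile(Path(tmp))
        print("written:", dockerfile_path.name)
    print("stages:", count_stages(DOCKERFILE))
```

Only the last stage ends up in the final image, so whatever build tooling stays behind in the `builder` stage does not add to its size.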
(1/2) Setting A Dockerized 🐳 Python 🐍 Environment - The Elegant Way
A few weeks ago, I created a short tutorial about setting up a dockerized 🐳 Python 🐍 environment via the CLI, or the hard way. The second tutorial on this topic provides a more elegant and robust approach for setting up a dockerized Python development environment with VS Code and the Dev Containers extension.
(1/2) I created the second tutorial in the series on running RStudio inside a container. This tutorial focuses on formalizing the run command from the first tutorial with Docker Compose using the Rocker RStudio image 🐳 👇🏼
Setting up and running RStudio inside a containerized environment is easier than it seems, thanks to the Rocker project.
(1/2) MLflow for Machine Learning Development
The MLflow for Machine Learning Development course by Manuel Gil provides a great introduction to the MLflow Python library. The course focuses on the MLflow core functionality and workflow and covers the following topics:
- Setting up MLflow
- Creating and working with experiments
- Logging metadata (parameters, score, etc.)
- Model registry
- Model tuning
- MLflow project demo
(1/3) Here is one of the most frequent questions I get on most of my Python 🐍 + Docker 🐳 tutorials - why use a virtual environment inside a container?
The short answer is that you don't necessarily need a virtual environment (VE) to set up a reproducible environment inside a container. Docker takes care of both the environment isolation and reproducibility.
I see VE as more of a practical method to organize your Python environment inside a container.
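To make the idea concrete, here is a small sketch using only Python's standard library: it creates a bare virtual environment with the `venv` module and checks whether the running interpreter belongs to one. The helper names `create_env` and `in_virtualenv` and the `demo-env` folder are hypothetical, and nothing here is specific to Docker; the same code runs inside or outside a container.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a VE, sys.prefix points at the environment folder while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def create_env(target: Path) -> Path:
    """Create a bare virtual environment (without pip) and return its config file path."""
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        print("environment created:", cfg.exists())
    print("running inside a virtual environment:", in_virtualenv())
```

The whole environment lives in one folder, which is what makes it a convenient unit to organize, copy, or delete inside an image.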
Bash is a useful language for automating processes on the command line and has a lot of applications from IT to MLOps. The Bash Scripting on Linux course by Jay LaCroix is an intro course for Bash. The course focuses on the foundation of Bash scripting, and it covers the following topics:
- Working with variables
- If-Else statements
- Loops
- Functions
- Arguments
- Scheduling
(1/4) Setting A Dockerized Python Environment - The Hard Way
I created a (relatively) short tutorial about setting up a dockerized 🐳 Python 🐍 environment on the command line (CLI). Generally, I don't recommend setting up your Python development workflow via the CLI. There are better tools to work with Python and Docker, such as VS Code with the Dev Containers extension. 🧵👇🏼
I think that most data scientists prefer to use some type of virtual environment (VE) in their applications. A short 🧶🧵 about the main differences between the two 👇🏼
Are you planning to learn a new data science or engineering skill as your New Year's resolution? Here is a collection of random open and free courses and resources I came across during the past year covering various topics, including deep learning, NLP, Python, statistics, and more.
(1/2) Machine Learning Engineering Online Book
I came across this amazing repo by Stas Bekman - the Machine Learning Engineering Online Book with a collection of guides for ML engineering focusing on training LLMs and multi-modal models.
License: Attribution-ShareAlike 4.0 International
I am excited to present in November at the Oredev Developer Conference in Sweden about forecasting and MLOps. In addition, I will run a workshop about forecasting methods with regression models ❤️.
There are tons of MLOps platforms, middlewares and lambdas... are there some missing components? Seems that all is already in place. #mlops #datascience #kubernetes
Today I took a first look at LangSmith, a new platform for LLM production pipelines by LangChain.
I can't hook it up to a working pipeline yet because it's in closed beta, but it surely looks ambitious. It should make it easier to log, monitor, debug, and evaluate pipelines (chains) against each other. It's tightly integrated with LangChain but it should support other frameworks/models as well.
My thinking here is that @huggingface is an acquisition target for #NVIDIA because they don't have an #MLOps platform offering - I also wonder where #GitLab sits in all this too ...
TIL https://www.jailbreakchat.com/ is a website that collects prompt injection attacks for LLMs, i.e. getting the language model to do stuff that is not allowed by inserting malicious prompts.
🧑‍💻 New video! Walk through the "whole game" of #MLOps with #rstats:
- Data prep with #tidyverse
- Model training & eval with #tidymodels
- Deployment with #vetiver in #Docker 🐳 on @huggingface 🤗
- Monitoring with #pins