A week ago was the 1st anniversary of this solo instance & more generally of my full-time move to Mastodon. A good time for a more detailed intro, partially intended as a CV thread (pinned to my profile) which I will add to over time (also to compensate for the ongoing lack of a proper website)... Always open to consulting offers, commissions and/or suitable remote positions...
Hi, I'm Karsten 👋 Indie software engineer, researcher, #OpenSource author of hundreds of projects (since ~1999), computational/generative artist/designer, landscape photographer, lecturer, outdoor enthusiast, on the ND spectrum. Main interest in transdisciplinary research, tool making, exploring techniques, projects & roles amplifying the creative, educational, expressive and inspirational potential of (personal) computation, code as material, combining this with generative techniques of all forms (quite different to what is now called and implied by "generative AI").
Much of my own practice & philosophy is about #BottomUpDesign, interconnectedness, simplicity and composability as key enablers of emergent effects (also in terms of workflow & tool/system design). Been adopting a round-robin approach to cross-pollinate my work & learning, spending periods going deep into various fields to build up and combine experience in (A-Z order): API design, audio/DSP, baremetal (mainly STM32), computer vision/image processing, compiler/DSL/VM impl, databases/linked data/query engines, data structures impl, dataviz, fabrication (3DP, CNC, knit, lasercut), file formats & protocols (as connective tissue), "fullstack" webdev (front/back/AWS), generative & evolutionary algorithms/art/design/aesthetics/music, geometry/graphics, parsers, renderers, simulation (agents/CFD/particles/physics), shaders, typography, UI/UX/IxD...
Since 2018 my main endeavor has been https://thi.ng/umbrella, a "jurassic" (as it's been called) monorepo of ~185 code libraries, addressing many of the above topics (plus ~150 examples to illustrate usage). More generally, for the past decade my OSS work has been focused on #TypeScript, #C, #Zig, #WebAssembly, #Clojure, #ClojureScript, #GLSL, #OpenCL, #Forth, #Houdini/#VEX. Earlier on, mainly Java (~15 years, since 1996).
Formative years in the deep end of the #Atari 8bit demoscene (Chip Special Software) & game dev (eg. The Brundles, 1993), B&W dark room lab (since age 10), music production/studio (from 1993-2003), studied media informatics, moved to London initially as web dev, game dev (Shockwave 3D, ActionScript), interaction designer, information architect. Branched out, more varied clients/roles/community for my growing collection of computational design tools, which I've been continuously expanding/updating for the past 20+ years, and which have been the backbone of 99% of my work since ~2006 (and which helped countless artists/designers/students/studios/startups). Creator of thi.ng (since 2011), toxiclibs (2006-2013), both large-scale, multi-faceted library collections. Early contributor to Processing (2003-2005, pieces of core graphics API).
Worked on dozens of interactive installations/exhibitions, public spaces & mediafacades (own projects and many collabs, several award winning), large-scale print on-demand projects (>250k unique outputs), was instrumental in creating some of the first generative brand identity systems (incl. cloud infrastructure & asset management pipelines), collaborated with architects, artists, agencies, hardware engineers, had my work shown at major galleries/museums worldwide, taught 60+ workshops at universities, institutions and companies (mainly in EMEA). Was algorithm design lead at Nike's research group for 5 years, working on novel internal design tools, workflows, methods of make, product design (footwear & apparel) and team training. After 23 years in London, my family decided on a lifestyle change and so we're currently based in the beautiful Allgäu region in Southern Germany.
OK so I'm ready for today's #GPGPU lesson with the new laptop. My only gripe for the lesson will be that #Rusticl in #Mesa 23.2 doesn't support #profiling information. Apparently the feature was merged at a later commit https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24101
and I even tried upgrading to my distro's experimental 23.3-rc1 packages, but trying to use rusticl on those packages segfaults. So either I've messed up something with this mixed upgrade, or I've hit an actual bug.
I'm still moderately annoyed by the fact that there's no single #OpenCL platform to drive all compute devices on this machine. #PoCL comes close because it supports both the CPU and the #NVIDIA dGPU through #CUDA, but not the #AMD iGPU (there's an #HSA device, but). #Rusticl supports the iGPU (radeonsi) and the CPU (llvmpipe), but not the dGPU (partly because I'm running that on proprietary drivers for CUDA). Everything else has at best one supported device out of three available.
Whenever I can, I try to end each lesson with a provocation.
When we finished our first trivial #OpenCL program, I showed them how the kernel runtime plus the data transfer time actually made GPUs “not convenient”, as a prelude to illustrating how memory pinning and buffer (un)mapping improve data transfer efficiency, and how to avoid transfers altogether when possible.
We're still working on that trivial program, so today I showed them how the number of elements affects performance.
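For anyone following along, here's a toy cost model of that point: a rough sketch of why kernel time plus transfer time can lose to the CPU for small inputs and win for large ones. The class name `OffloadModel` and every constant in it (overheads, bandwidth, throughputs) are made-up placeholders for illustration, not measurements from any real machine.

```python
from dataclasses import dataclass


@dataclass
class OffloadModel:
    """Toy cost model; every constant is a hypothetical placeholder."""
    launch_overhead_s: float = 50e-6     # fixed cost per kernel launch
    transfer_latency_s: float = 20e-6    # fixed cost per host<->device copy
    bus_bytes_per_s: float = 10e9        # host<->device bandwidth
    gpu_elems_per_s: float = 20e9        # device throughput
    cpu_elems_per_s: float = 1e9         # host throughput
    bytes_per_elem: int = 4              # e.g. one 32-bit value per element

    def copy_time(self, n: int) -> float:
        """Time for one host<->device copy of n elements."""
        return self.transfer_latency_s + n * self.bytes_per_elem / self.bus_bytes_per_s

    def gpu_time(self, n: int, copies: int = 2) -> float:
        """Kernel time plus `copies` transfers (upload + download by default)."""
        kernel = self.launch_overhead_s + n / self.gpu_elems_per_s
        return kernel + copies * self.copy_time(n)

    def cpu_time(self, n: int) -> float:
        """Time to process n elements on the host, no transfers needed."""
        return n / self.cpu_elems_per_s

    def offload_pays_off(self, n: int, copies: int = 2) -> bool:
        """True if the device (including transfers) beats the host."""
        return self.gpu_time(n, copies) < self.cpu_time(n)


if __name__ == "__main__":
    model = OffloadModel()
    for n in (1_000, 100_000, 10_000_000):
        print(n, model.gpu_time(n), model.cpu_time(n), model.offload_pays_off(n))
```

Setting `copies=0` stands in for the case where the data already lives on the device, which is the "avoid transfers when possible" part of the argument.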
This year I managed to squeeze the introduction to #OpenCL in at the end of the lesson before we started writing code, so we managed to make the first complete OpenCL example in one lesson (it usually takes us two lessons). The code compiled and ran correctly on the first go. Everybody was surprised (including me!): the students are third years, yet they are already familiar with the principle that if it seems to work, there's a subtle bug that will rear its head at the worst of times.
Got #TornadoVM installed and running on my local Linux laptop, a #Lenovo 14s ThinkPad with a 10th-generation Intel® Core™ CPU and an integrated Intel® UHD graphics card.
Took a bit of futzing around with runtime dependencies, but the required packages (for Ubuntu Jammy) were:
As I've been updating the build files for my various #ziglang projects & templates, also learned that quite a few of them have to be overhauled/refactored due to syntax changes and a stricter compiler. One example is this #WASM #voxel #renderer from 1.5 years ago which doesn't build anymore without major code updates, but the old build still works:
Reload for random views. Press x to export current frame. The renderer is incremental (never finishes) and slowly reduces pixel size from 8 down to 1. It would be much faster, but I had some ideas for creating a more stylistic output and in this current state it only renders a fixed area per frame...
The 2-bit 512^3 voxel model was generated with a custom fork of @R4_Unit's voxel automata... 🥰
My ChEESE CoE webinar talk on #OpenCL #GPU programming for #HPC applications is now uploaded to my YouTube channel and provided with timestamps. Enjoy! 🖖😎🧀
👉 https://youtu.be/w4HEwdpdTns
「 Mesa’s own OpenCL implementation Rusticl is now officially supported for AMD Radeon graphics cards. A bunch of Asahi fixes are present as well in Mesa 23.1, which also brings various updates to the PanVK, LLVMpipe, RadeonSI, and Zink drivers 」
— @9to5linux
However, one #opencl weirdness is that for Radeon there are two OpenCL sets: the one from #Mesa and the ROCm set from AMD, and they're giving me wildly different behaviours on different data sets. For some data sets the Mesa one is much faster, but for others the ROCm set is much faster. That's going to be 'interesting'
My #OpenCL stuff is getting somewhere; 'profiling' is fairly nice; you set events and can read the times at the events - so at least I can tell which of the 3 kernels I'm running is slow. With some vectorisation I'm up from ~200lps in the slow case to about 1100lps - which means it's no longer painful.
My reason for looking in #mesa was that I was trying to use profiling in some #opencl code and while it worked with AMD's ROCm user code, it failed on Mesa with a 'PROFILING_INFO_NOT_AVAILABLE', and it looks like there are a whole bunch of reasons that can happen. In the end it turned out to be because the event was still in the queue, but I only figured that out by adding debug prints into Mesa.
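A minimal sketch of that pattern, in case it saves someone the debug prints: wait for the event to complete before reading its timestamps. This is not the code from the post. It assumes an event object shaped like a `pyopencl.Event` (a `wait()` method and a `profile` attribute with `start` and `end` in nanoseconds), and that the command queue was created with profiling enabled. The helper names `kernel_time_ms` and `slowest` are made up for illustration.

```python
NS_PER_MS = 1_000_000


def kernel_time_ms(event) -> float:
    """Elapsed device time of one command, in milliseconds.

    Assumption: `event` behaves like a pyopencl.Event, i.e. it has wait()
    and a `profile` attribute exposing `start` and `end` in nanoseconds.
    """
    # Block until the command has completed. Reading the timestamps while
    # the event is still queued is the situation that produces
    # PROFILING_INFO_NOT_AVAILABLE.
    event.wait()
    return (event.profile.end - event.profile.start) / NS_PER_MS


def slowest(events_by_name: dict) -> str:
    """Name of the command with the largest elapsed device time."""
    return max(events_by_name, key=lambda name: kernel_time_ms(events_by_name[name]))
```

With three kernels, passing a dict of name to event into `slowest` tells you which one to look at first.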
Hmm, my #opencl code is running about 4x slower on #fedora 38 than f37 (on Radeon). Not figured out which component though; opencl is hard enough to profile at the best of times, and there are a lot of components. Ideas welcome.