#OpenCL - kbin.social

ProjectPhysX, 6 days ago to GraphicsProgramming

#FluidX3D #CFD v2.17 is out! Some huge #GPU/#CPU hardware has been announced at #Computex, so I've made my code ready. Until now I've been using 32-bit indexing, which overflows for >2³² grid cells in a domain, equivalent to 225 GB VRAM. Now my #OpenCL code will at runtime automatically compile with 64-bit indexing when more cells are used. 🖖🧐
Also, I've added a new raytracing-based field visualization. Thank you @python for the idea! 💡
Release notes 👉 https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.17
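
Here is a minimal host-side sketch in C of that runtime decision, under some assumptions. The cell count is computed in 64-bit so the product itself can't wrap. A hypothetical `USE_64BIT_INDEXING` define is then added to the OpenCL build options only when the count no longer fits in a 32-bit unsigned integer. That define is not FluidX3D's actual macro, and the helper names are made up too.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* On the kernel side, the (hypothetical) define could select the index type:
 *   #ifdef USE_64BIT_INDEXING
 *   typedef ulong idx_t;
 *   #else
 *   typedef uint idx_t;
 *   #endif
 */

/* Total number of grid cells, computed in 64-bit so the product cannot wrap. */
static uint64_t cell_count(uint32_t nx, uint32_t ny, uint32_t nz)
{
    return (uint64_t)nx * (uint64_t)ny * (uint64_t)nz;
}

/* Conservative switch: use 64-bit indices as soon as the cell count itself
   no longer fits into a 32-bit unsigned integer. */
static int needs_64bit_indexing(uint32_t nx, uint32_t ny, uint32_t nz)
{
    return cell_count(nx, ny, nz) > (uint64_t)UINT32_MAX;
}

/* Builds the OpenCL compiler options string for clBuildProgram. */
static void make_build_options(char *out, size_t cap,
                               uint32_t nx, uint32_t ny, uint32_t nz)
{
    snprintf(out, cap, "-cl-std=CL1.2%s",
             needs_64bit_indexing(nx, ny, nz) ? " -D USE_64BIT_INDEXING" : "");
}
```

Keeping 32-bit indices as the default is presumably about speed, since 64-bit integer math is slower on most GPUs.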

karolherbst, 30 days ago to random

@bashbaug Used the intercept layers again today and I was wondering if injecting captured buffers/images is something which is either supported (and I haven't found how to do it yet) or something planned.

Like when I'm comparing between vendors with rusticl, it would be helpful if I could just replace image/buffer outputs with the content from a different capturing to quickly verify if the first difference is actually causing the bug I'm seeing or if it's something else.

bashbaug, 8 days ago

If anyone else encounters a similar problem in the future, buffer (and image!) injection is now implemented in the OpenCL Intercept Layer 🎉.

You can take a buffer or image from one device or driver, inject it as a kernel input for a different device or driver, and see how it affects the results.

https://github.com/intel/opencl-intercept-layer/blob/main/docs/injecting_buffers_images.md

#OpenCL

toxi, 6 months ago to opensource

A week ago was the 1st anniversary of this solo instance & more generally of my fulltime move to Mastodon. A good time for a more detailed intro, partially intended as a CV thread (pinned to my profile) which I will add to over time (also to compensate for the ongoing lack of a proper website)... Always open to consulting offers, commissions and/or suitable remote positions...

Hi, I'm Karsten 👋 — indy software engineer, researcher, #OpenSource author of hundreds of projects (since ~1999), computational/generative artist/designer, landscape photographer, lecturer, outdoor enthusiast, on the ND spectrum. Main interest in transdisciplinary research, tool making, exploring techniques, projects & roles amplifying the creative, educational, expressive and inspirational potential of (personal) computation, code as material, combining this with generative techniques of all forms (quite different to what is now called and implied by "generative AI").

Much of my own practice & philosophy is about #BottomUpDesign, interconnectedness, simplicity and composability as key enablers of emergent effects (also in terms of workflow & tool/system design). Been adopting a round-robin approach to cross-pollinate my work & learning, spending periods going deep into various fields to build up and combine experience in (A-Z order): API design, audio/DSP, baremetal (mainly STM32), computer vision/image processing, compiler/DSL/VM impl, databases/linked data/query engines, data structures impl, dataviz, fabrication (3DP, CNC, knit, lasercut), file formats & protocols (as connective tissue), "fullstack" webdev (front/back/AWS), generative & evolutionary algorithms/art/design/aesthetics/music, geometry/graphics, parsers, renderers, simulation (agents/CFD/particles/physics), shaders, typography, UI/UX/IxD...

Since 2018 my main endeavor has been https://thi.ng/umbrella, a "jurassic" (as it's been called) monorepo of ~185 code libraries, addressing many of the above topics (plus ~150 examples to illustrate usage). More generally, for the past decade my OSS work has been focused on #TypeScript, #C, #Zig, #WebAssembly, #Clojure, #ClojureScript, #GLSL, #OpenCL, #Forth, #Houdini/#VEX. Earlier on, mainly Java (~15 years, since 1996).

Formative years in the deep end of the #Atari 8bit demoscene (Chip Special Software) & game dev (eg. The Brundles, 1993), B&W dark room lab (since age 10), music production/studio (from 1993-2003), studied media informatics, moved to London initially as web dev, game dev (Shockwave 3D, ActionScript), interaction designer, information architect. Branched out, more varied clients/roles/community for my growing collection of computational design tools, which I've been continuously expanding/updating for the past 20+ years, and which have been the backbone of 99% of my work since ~2006 (and which helped countless artists/designers/students/studios/startups). Creator of thi.ng (since 2011), toxiclibs (2006-2013), both large-scale, multi-faceted library collections. Early contributor to Processing (2003-2005, pieces of core graphics API).

Worked on dozens of interactive installations/exhibitions, public spaces & mediafacades (own projects and many collabs, several award winning), large-scale print on-demand projects (>250k unique outputs), was instrumental in creating some of the first generative brand identity systems (incl. cloud infrastructure & asset management pipelines), collaborated with architects, artists, agencies, hardware engineers, had my work shown at major galleries/museums worldwide, taught 60+ workshops at universities, institutions and companies (mainly in EMEA). Was algorithm design lead at Nike's research group for 5 years, working on novel internal design tools, workflows, methods of make, product design (footwear & apparel) and team training. After 23 years in London, my family decided on a lifestyle change, so I'm currently based in the beautiful Allgäu region in Southern Germany.

athas, 18 days ago to random

Is there a way to query the GPU L2 cache size (if any) in #OpenCL? Both HIP and CUDA provide this, so the hardware/driver facility exists.

athas, 19 days ago to random

#OpenCL has a compiler flag -cl-fp32-correctly-rounded-divide-sqrt. If you don't pass it, divisions and square roots aren't guaranteed to be correctly rounded. Shouldn't this be the other way around? How many other flags do I need to pass in order for arithmetic to be correct?
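
To see why the flag matters, here is a small host-side C experiment. It is not OpenCL, and not how any particular driver implements division. It compares correctly rounded IEEE `a / b` with a cheaper `a * (1 / b)` approximation of the kind a relaxed divide may compile to, and reports the worst distance in units in the last place (ULP). Without the flag, the OpenCL C spec permits single-precision division and sqrt to be off by a few ULP. The helper names and the input grid are arbitrary choices for illustration.

```c
#include <stdint.h>
#include <string.h>

/* ULP distance between two finite floats of the same sign: for such values
   the IEEE-754 bit patterns are ordered, so their integer difference counts ULPs. */
static uint32_t ulp_distance(float x, float y)
{
    int32_t ix, iy;
    memcpy(&ix, &x, sizeof ix);
    memcpy(&iy, &y, sizeof iy);
    return ix > iy ? (uint32_t)(ix - iy) : (uint32_t)(iy - ix);
}

/* Compares a / b (one rounding) with a * (1 / b) (two roundings) on a
   deterministic grid of positive inputs. Stores the number of differing
   results in *mismatches and returns the worst ULP distance observed. */
static uint32_t reciprocal_divide_error(int steps, int *mismatches)
{
    uint32_t worst = 0;
    *mismatches = 0;
    for (int i = 1; i <= steps; ++i) {
        for (int j = 1; j <= steps; ++j) {
            float a = 1.0f + 0.37f * (float)i;
            float b = 1.0f + 0.53f * (float)j;
            float exact = a / b;          /* correctly rounded */
            float recip = 1.0f / b;       /* first rounding */
            float approx = a * recip;     /* second rounding */
            uint32_t d = ulp_distance(exact, approx);
            if (d != 0) (*mismatches)++;
            if (d > worst) worst = d;
        }
    }
    return worst;
}
```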

karolherbst, 30 days ago to random

Also.. I should finish #OpenCL support for the raspberry pi 4 and 5 GPUs 🙃

it's almost complete, just needs some last issues figured out (and me rebasing it as I've landed some needed bits) https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25362

ProjectPhysX, 1 month ago to random

I found an interesting optimization for the marching-cubes algorithm today: since vertex interpolation happens on axis-aligned edges of the unit cube, it's sufficient to interpolate in 1D instead of 3D. The faster interpolation makes the conditions for which edge to interpolate unnecessary, which allows getting rid of the edge table. That brings the implementation down to 73 lines, including the triangle table. 🖖🤠
https://github.com/ProjectPhysX/FluidX3D/commit/649fd40fa6270fbd0823a53b2a55f4194fc9510b#diff-464b1d19d4b616b9609031b48429081b2c215328d9f98bc5cbeac6b2b84fdbf3R456
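
Here is a minimal C sketch of the idea. It is not the FluidX3D code linked above. Each of the cube's 12 edges runs along a single axis, so the crossing point needs only one scalar interpolation parameter. The other two coordinates are simply the edge's fixed corner coordinates. The edge encoding (start corner plus axis) and its ordering are hypothetical choices for illustration. They differ from the classic tables, so an existing triangle table would need remapping.

```c
/* An edge of the unit cube: start corner (each coordinate 0 or 1, and 0 along
   the edge's own axis) and the axis it runs along (0 = x, 1 = y, 2 = z). */
typedef struct { int x, y, z, axis; } Edge;

static const Edge cube_edges[12] = {
    {0, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 1, 1, 0},   /* along x */
    {0, 0, 0, 1}, {1, 0, 0, 1}, {0, 0, 1, 1}, {1, 0, 1, 1},   /* along y */
    {0, 0, 0, 2}, {1, 0, 0, 2}, {0, 1, 0, 2}, {1, 1, 0, 2},   /* along z */
};

/* Position where the iso-surface crosses edge e, given the field value v0 at
   the start corner and v1 at the end corner. Only the coordinate along the
   edge's axis is interpolated (1D); the other two stay at the corner values.
   The caller only asks for crossed edges, so v0 != v1. */
static void edge_vertex(Edge e, float v0, float v1, float iso, float out[3])
{
    float t = (iso - v0) / (v1 - v0);
    out[0] = (float)e.x;
    out[1] = (float)e.y;
    out[2] = (float)e.z;
    out[e.axis] += t;   /* start coordinate is 0 along the axis */
}
```

Compared with a general 3D lerp `p0 + t * (p1 - p0)`, this does one add per vertex instead of three multiply-adds.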

ProjectPhysX, 1 month ago

@nickserv that's a bug in #ARM's #OpenCL runtime: fused-multiply-add (fma) is somehow emulated with terrible performance. This is very similar to what @niconiconi found on Nvidia CMP 170HX, where fma was disabled in the driver.
I've just fixed this in #FluidX3D, by macro-replacing fma with a*b+c. Performance went up by 8-13x on my Samsung S9+ (ARM Mali-G72 MP18) with this workaround.
https://github.com/ProjectPhysX/FluidX3D/commit/9ce2caecfc85e4fda50fed3350304b75b223b06b
cc @chipsandcheese
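
One way to apply a workaround like this only where it's needed is sketched below in plain C, under assumptions. The host prepends a `#define` to the OpenCL C source before building it, but only when a hypothetical `fma_is_slow` check says so. This is an illustration, not the code in the linked commit. Note that `a*b+c` rounds twice where `fma` rounds once, so results can differ in the last bit. A compiler may also still contract it into a multiply-add on its own.

```c
#include <stdio.h>
#include <string.h>

/* Replaces fma() by a plain multiply-add inside the kernel source; the extra
   parentheses keep operator precedence intact for arbitrary arguments. */
static const char FMA_WORKAROUND[] = "#define fma(a, b, c) ((a) * (b) + (c))\n";

/* Hypothetical detection: treat devices whose vendor string mentions ARM as
   affected. A real check might rely on benchmarks or a known-bad list. */
static int fma_is_slow(const char *device_vendor)
{
    return strstr(device_vendor, "ARM") != NULL;
}

/* Writes the final kernel source into out (capacity cap). Returns the number
   of characters written, or -1 if the buffer is too small. */
static int build_kernel_source(const char *src, const char *device_vendor,
                               char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s%s",
                     fma_is_slow(device_vendor) ? FMA_WORKAROUND : "", src);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```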

bashbaug, 2 months ago to random

We released an updated version of the OpenCL Intercept Layer yesterday, just in time for #IWOCL!

This release supports the latest OpenCL extensions, includes a bunch of performance improvements, and adds a bunch of new features, including the ability to capture an OpenCL kernel and replay it outside of an application for easier debugging.

Get the latest version here: https://github.com/intel/opencl-intercept-layer

#OpenCL

bashbaug, 2 months ago to random

I couldn't make it to Chicago and IWOCL this year, so I did my best to bring Chicago to me. #IWOCL #OpenCL #SYCL

karolherbst, 2 months ago to linux

So, I'll be giving my talk about Rusticl, compute on the Linux desktop and other related topics at IWOCL next week.

Any specific topics you want me to cover?

#iwocl #opencl #rusticl #linux

jay, 2 months ago to ai

🚩 Entire Team Lived in Bubble for Years - Proclaim Never Heard of OpenCL

https://www.theregister.com/2024/03/26/uxl_foundation_cuda_alternative

#UXLFoundation #UXL #CUDA #OpenCL #Khronos #AI #GPU #Heterogenous #Acceleration #oneAPI

ProjectPhysX, 2 months ago to GraphicsProgramming

How realistic can a #CFD simulation be? Here is a 1 billion cell #FluidX3D simulation of an impacting raindrop, fully raytraced in 8K. FluidX3D contains state-of-the-art volume-of-fluid and surface tension models for highly accurate free surface simulations. Combined with my own #OpenCL #raytracing engine, results are rendered on-the-fly at resolutions as large as the remaining #GPU VRAM can hold. 🖖😋💧📺
https://youtu.be/MmLNQIW_Sic
FluidX3D is on #GitHub: https://github.com/ProjectPhysX/FluidX3D

sascha, 3 months ago to GraphicsProgramming German

My community driven @thekhronosgroup GPU hardware databases for #Vulkan, #OpenGL, #OpenGLES and #OpenCL recently hit the 50,000 reports milestone.

Did a small write up on this, including some history on those databases at https://www.saschawillems.de/blog/2024/03/09/gpu-hardware-databases-hit-50000-reports/

Thanks to everyone who contributed (and is contributing) reports 😊

ProjectPhysX, 4 months ago to Nvidia

Another day, another #Nvidia #GPU driver bug that needs a workaround: seems like Nvidia's #OpenCL driver suffers 32-bit uint overflow within the cl::CommandQueue::enqueueFillBuffer call! 🖖🤦‍♂️
https://github.com/ProjectPhysX/FluidX3D/commit/82976f15d2bd20b9188ea623cf0bac046c6c81ce
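
The post doesn't spell out the workaround, so here is one plausible approach sketched in plain C. It is not necessarily what the linked commit does. The idea is to keep every individual fill well below 2³² bytes by splitting it into chunks. Each chunk size is a multiple of the fill-pattern size, since `clEnqueueFillBuffer` requires offset and size to be multiples of the pattern size. For scale, a 5 GiB size truncated to 32 bits becomes 1 GiB. Each planned chunk would then become one `enqueueFillBuffer` call with its offset and size. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One sub-range of a fill: byte offset and byte size within the buffer. */
typedef struct { uint64_t offset, size; } FillChunk;

/* Splits a fill of total_bytes into chunks of at most max_chunk bytes, each a
   multiple of pattern_size (total_bytes must itself be a multiple of it).
   Returns the number of chunks (0 for an empty fill or on error). */
static size_t plan_fill_chunks(uint64_t total_bytes, uint64_t pattern_size,
                               uint64_t max_chunk, FillChunk *out, size_t cap)
{
    if (pattern_size == 0 || total_bytes % pattern_size != 0) return 0;
    uint64_t step = max_chunk - max_chunk % pattern_size;  /* round down to pattern size */
    if (step == 0) return 0;
    size_t count = 0;
    for (uint64_t off = 0; off < total_bytes; off += step) {
        if (count == cap) return 0;
        uint64_t rest = total_bytes - off;
        out[count].offset = off;
        out[count].size = rest < step ? rest : step;
        ++count;
    }
    return count;
}
```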

ProjectPhysX, 3 months ago

Found and reported another bug in #Nvidia #GPU drivers: passing vector types like int3 as #OpenCL kernel parameters is broken. 🖖🙂
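
For context only, not a diagnosis of the driver bug: in OpenCL, 3-component vectors such as `int3` have the size and alignment of the corresponding 4-component type (16 bytes for `int3`). The host type `cl_int3` is defined as `cl_int4`. A host-side mirror that respects this size might look like the sketch below (the names are hypothetical). Passing an `int4`, or three separate `int` arguments, is a simple way to sidestep 3-component argument handling entirely.

```c
#include <stdint.h>

/* Host-side mirror of an OpenCL int3 kernel argument: three used components
   plus one padding element, 16 bytes in total, matching the device layout.
   (For by-value kernel arguments only the byte size matters, not alignment.) */
typedef struct { int32_t s[4]; } HostInt3;

_Static_assert(sizeof(HostInt3) == 16, "int3 arguments are 16 bytes");

static HostInt3 make_int3(int32_t x, int32_t y, int32_t z)
{
    HostInt3 v = {{x, y, z, 0}};   /* 4th lane is padding, ignored by the kernel */
    return v;
}
```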

ProjectPhysX, 11 months ago to github German

#FluidX3D has passed 2000 Stars! It is the most popular #CFD software on #GitHub now! 🖖😊⭐️
https://github.com/ProjectPhysX/FluidX3D
Feeling blessed that my work is useful to so many people across the globe, with users in 75 countries already! 🌍
42% EU, 30% Americas, 25% Asia, 3% Oceania+Africa

ProjectPhysX, 3 months ago

The red lightning bolt continues: #FluidX3D has passed 3000 Stargazers on #GitHub - from 82 countries! 🖖🥳⭐
Releasing this software for free really has turned out win-win: I've received so much valuable feedback, and answered with as many bug fixes and updates, with many more to come. I am enabling cutting-edge #CFD simulations for everyone, with very little hardware resources, on literally every computer that has a #GPU, regardless of vendor.
👉 https://github.com/ProjectPhysX/FluidX3D
#SimulationFriday #OpenCL

Methylzero, 3 months ago to hpc

#HPC #CUDA #OpenCL #LAPACK
If you had to do a lot of linear least-squares solves, with potentially rank-deficient matrices, what would you use on a GPU? On CPUs, LAPACK's DGELSY does work, but most GPU libraries don't seem to implement routines for rank-deficient matrices.

ProjectPhysX, 4 months ago to GraphicsProgramming

#FluidX3D v2.13 is out, providing faster #VTK export with automatic SI unit conversion and a variety of bug fixes!
Full release notes: https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.13
#GPU #CFD #OpenCL #GPGPU #HPC #GitHub

ProjectPhysX, 4 months ago to intel

This is wild: #FluidX3D can "SLI" together 🔵 #Intel Arc A770 + 🟢 #Nvidia Titan Xp, pooling 12GB+12GB of their VRAM for one large 450M cell #CFD simulation. Top half on A770, bottom half on Titan Xp. They seamlessly communicate over PCIe. Performance is ~1.7x of what either #GPU could do on its own. 🖖😋🖥🔥
#OpenCL shows its true power here - one implementation works on literally all GPUs at full performance, even at the same time. Happy #SimulationFriday!
https://youtu.be/PscbxGVs52o
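
Here is a sketch of the bookkeeping behind such a split. It assumes a slab decomposition along one axis, with the number of layers per GPU proportional to its memory. That proportionality is an assumption for illustration; the post only says "top half" and "bottom half", which is the equal-memory case. In a typical lattice Boltzmann setup, each internal slab boundary is where halo data has to cross PCIe every time step. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A contiguous range of z-layers [z_begin, z_end) owned by one device. */
typedef struct { uint32_t z_begin, z_end; } Slab;

/* Splits nz layers across num_devices devices proportionally to mem[i]. Use a
   coarse unit (e.g. GB) so that nz * total stays within 64 bits. Cumulative
   rounding keeps slabs contiguous and ends exactly at nz.
   Returns 0 on success, -1 on invalid input. */
static int split_slabs(uint32_t nz, const uint64_t *mem, size_t num_devices, Slab *out)
{
    uint64_t total = 0;
    for (size_t i = 0; i < num_devices; ++i) total += mem[i];
    if (num_devices == 0 || total == 0) return -1;

    uint32_t z = 0;
    uint64_t mem_before = 0;
    for (size_t i = 0; i < num_devices; ++i) {
        mem_before += mem[i];
        uint32_t z_end = (uint32_t)((uint64_t)nz * mem_before / total);
        out[i].z_begin = z;
        out[i].z_end = z_end;
        z = z_end;
    }
    return 0;
}
```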

karolherbst, 4 months ago to random

Anyway, are there any OpenCL applications you want to see working on Rusticl that don't work at the moment? Or in general? It's slowly getting to the point where things "just work".

#rusticl #opencl

jannem, 4 months ago to GraphicsProgramming

@VileLasagna Has a blog post on the relative speed of different #GPU compute frameworks on the same hardware and driver.

Tl;dr: on an #Nvidia card, with Nvidia drivers, #CUDA is the slowest, by far. Fastest is our old stalwart #OpenCL - almost twice as fast when used only for compute. #Vulkan is good, and the least affected by using the card for your desktop at the same time. Read it - it's good.

#HPC #gpgpu #compute

https://vilelasagna.ddns.net/coding/if-you-want-performance-maybe-you-should-drop-cuda/

gabmus, 6 months ago to linux

@oblomov I've been asked today, is #ROCm (on #linux) any good these days? I haven't really been into the #gpgpu space for a while.

Also #askfedi

mwibral, 5 months ago

@gabmus
I use #rocm 5.7 to run #opencl, Google's #jax (for pymc), and #pytorch on two Vega cards (Vega 64 and Radeon Pro WX9100) on Arch and Ubuntu. They all run OK, but correct setup needs some googling around, and jax needs some #xla flags exported. The situation is much, much better than 2 years ago, though.
@oblomov

toxi, 5 months ago to genart

Passively participating in #Genuary2024 — Day 8 Chaotic System. In 2012/13 I designed an award-winning audioreactive brand identity system for Leeds College Of Music based on the DeJong strange attractor with tens to hundreds of millions of particles per frame. This massive, almost year-long project consisted of a Mac/PC desktop app (written in Clojure, OpenCL & OpenGL) for exploring the attractor, creating presets and scheduling render jobs for super hi-res print assets (which would take hours to render and were the biggest image sizes I ever had to deal with, up to 3x3 meters @ 150 dpi). I also had to develop an entire AWS-based ad-hoc render farm and asset & user management system for the school to generate personalized video assets: each student could upload their own music, and the system handled audio FFT analysis and beat detection/mapping (all in Clojure) and created individual sound-responsive clips for their in-school digital signage system and for sharing on social media... Most key aspects were handled via various old thi.ng libraries (e.g. https://thi.ng/simplecl for OpenCL interop). The server app also handled transcoding to dozens of video formats (via ffmpeg) and semi-automatic provisioning of EC2 machines for render/transcoding jobs...

An example video is below (music: Heyoka, Blue Towel)

#GenerativeArt #Vintage #StrangeAttractor #Particles #AudioResponsive #AudioVisualization #Clojure #OpenCL #OpenGL #Branding #BrandIdentity #Renderer #LCOM

1 minute long audio-responsive generative animation of two strange attractor particle systems mapped onto a sphere with various special effects like additive color blending, layering. At the end of the video the sphere zooms out and blends into a static "Leeds College of Music" end screen

toxi, 5 months ago

Some more screenshots of the LCOM desktop app in action...

#Genuary2024 #GenerativeArt #Vintage #StrangeAttractor #Particles #AudioResponsive #AudioVisualization #Clojure #OpenCL #OpenGL #Branding #BrandIdentity #Renderer #LCOM

Screenshot of the identity generator app showing the strange attractor particle system (here visualized via ~19 million purple & orange blended particles). Various UI controls for design parameters, colors and render configurations are on either side of the screen
Screenshot of the identity generator app showing the strange attractor particle system (here visualized via ~9 million blue & yellowish superimposed particles). Various UI controls for design parameters, colors and render configurations are on either side of the screen
Fullscreen still of a frame from the audio-responsive particle system on black background. The sphere the particles are usually projected on is completely deformed into wispy spikes/waveforms by the audio signal. The blue & orange particles are rendered using additive blending, creating an effect of light emanating....

pekka, 5 months ago to random

PoCL v5.0 is now released! Download: https://github.com/pocl/pocl/releases/tag/v5.0 Release notes: http://portablecl.org/docs/html/notes_5_0.html #opencl
