giuseppebilotta

@giuseppebilotta@fediscience.org

Researcher @ INGV-OE. Opinions my own.

ProjectPhysX, to GraphicsProgramming

One of my #PhD papers got selected for the 2022 Best Paper Award of MDPI Computation! 🖖🥳📃🏆

That was a very bold publication for multiple reasons:

  • I solo-authored it
  • I wrote that paper in only 2 weeks
  • the title contains "Esoteric" twice
  • I submitted it on April 1st

It's serious science though: I discovered a simple algorithm to cut memory demand of the #LBM in half, allowing huge simulations on cheap #GPUs. This is one of the key innovations in #FluidX3D #CFD.

https://doi.org/10.3390/computation10060092

giuseppebilotta,

@ProjectPhysX (I'm sorry but I'm laughing at the reference lists wrapping because they didn't get compacted: <https://www.mdpi.com/2079-3197/10/6/92>)

giuseppebilotta, to GraphicsProgramming

Well, this is interesting.

Someone has posted an announcement for open #postDoc and #ResearchAssociate positions on the #GPUSPH forum
https://gpusph.discourse.group/t/postdoc-ra-positions-at-oregon-state-university/207
Although they are not specifically about GPUSPH the software, they are about #GPU, #SPH and related topics (including wave modeling for #oceanEngineering #coastalEngineering), so I think I'll leave the announcement up.

#getFediHired #jobOffer

giuseppebilotta, to random

So for a month now we've had a new young researcher working with us on #GPUSPH simulations. After the first 10 days or so of onboarding she has started with her first hands-on experience, writing a test case. This is always a very educational thing —for me. It really drives home how inadequate our documentation is 8-P

giuseppebilotta, to random

Student sends me his project for marking. It doesn't compile.

sigh

jay, to ai
giuseppebilotta,

@jay I suppose that tech journalists doing their job and asking what is wrong with the existing standards (including OpenCL and SYCL) would have been too much.

giuseppebilotta,

@infektor @jay
I appreciate the spirit, but I can't say I see it reflected in the results I've seen so far.

giuseppebilotta, to random

Related to my previous toot, it's amazing how impactful dimensionality is in determining workloads. In the abstract we all know this, since 1D problems scale with n, 2D with n² and 3D with n³, but at least for me it's still kind of surprising how big the difference is in practice.
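To put numbers on it (a trivial illustration, not from any of our code, using the n ≈ 500 grid size mentioned below):

    n = 500
    for d in (1, 2, 3):
        print(f"{d}D: {n**d:,} cells")
    # 1D: 500 cells
    # 2D: 250,000 cells
    # 3D: 125,000,000 cells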

1/n

giuseppebilotta,

In our fire propagation model, that 100x speed-up came from effectively switching from a 2D to a 1D solution. We have a cellular automaton where cells are burnable, burning or burnt. At every iteration, a burnable cell adjacent to a burning cell can catch fire (probabilistically). We were running the “ignite_me” check on all burnable cells, checking for each of them whether there was a burning cell among its neighbors, and then whether this resulted in propagation.
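A minimal sketch of that all-cells step (hypothetical names and data layout, not our actual code, with a single ignition probability for simplicity):

    import random

    def step_naive(state, p_ignite):
        """One iteration of the all-cells approach: scan EVERY burnable
        cell and check its neighborhood for a burning cell."""
        n = len(state)
        to_ignite = []
        for i in range(n):
            for j in range(n):
                if state[i][j] != "burnable":
                    continue
                # check the 4-neighborhood for a burning cell
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < n and 0 <= nj < n and state[ni][nj] == "burning":
                        if random.random() < p_ignite:
                            to_ignite.append((i, j))
                        break
        # apply the changes synchronously at the end of the iteration
        for i, j in to_ignite:
            state[i][j] = "burning"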

2/n

giuseppebilotta,

This is a classic 2D approach that scales with n², where n is the number of cells in each dimension (in our case the automaton grid is actually square, so this is literally n², with n ≈ 500). The number of cells to check decreases over time proportionally to i², with i the iteration number (imagine a circular area growing by 1 pixel of radius per iteration: the number of pixels grows approximately as 3i², see https://en.wikipedia.org/wiki/Gauss_circle_problem, so you get about n² - 3i² cells to check at each iteration).

3/n

giuseppebilotta,

However, you don't need to check all burnable cells, and then each of their neighbors: most of those checks will not find anything. How about this then: whenever a cell becomes burning, it marks all its adjacent burnable cells, and then we ONLY check the marked cells. Now we're not checking n² cells at each iteration, but only the thin layer of cells around the burning front. This is a 1D approach! And unless your fire has a fractal front, the number of cells to check grows approximately like 6i.
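Sketching it in the same style as the earlier snippet (again hypothetical names): the front is just a set of marked cells.

    import random

    def step_front(state, front, p_ignite):
        """One iteration of the front-based approach: only the cells
        previously marked by a burning neighbor get checked."""
        n = len(state)
        ignited = {c for c in front if random.random() < p_ignite}
        front -= ignited
        for i, j in ignited:
            state[i][j] = "burning"
            # mark the newly exposed burnable neighbors for the next iteration
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < n and 0 <= nj < n and state[ni][nj] == "burnable":
                    front.add((ni, nj))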

4/n

giuseppebilotta,

Again assuming all 500² = 250'000 cells are burnable in the beginning, with the front approach at worst you check something like 6*500 = 3'000 cells, which is a reduction of more than 80 times in the workload —that's where that impressive speed-up comes from!

5/n

giuseppebilotta,

Again, this isn't “new”, but I still find myself surprised whenever it happens. Fun fact: #GPUSPH has been developed for the best part of the last 15 years to be 3D-only. Among other things, this meant that one had to be careful when pushing resolutions too much, since halving the inter-particle spacing meant approximately 8 times more particles: it's easy to get hundreds of millions of particles that way! And of course the timestep goes down by a factor of two as well, so …

6/n

giuseppebilotta,

… the total computational workload (iterations * particles) grew by a factor of 16 for every doubling of the resolution —and that was only for inviscid flows.
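Spelled out (just the arithmetic from above):

    # Halving the inter-particle spacing in 3D multiplies the particle
    # count by 2^3 = 8, and (via the smaller timestep) doubles the number
    # of iterations needed to cover the same physical time.
    particles_factor = 2**3
    iterations_factor = 2
    print(particles_factor * iterations_factor)  # 16x workload per resolution doubling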

When I first introduced 2D support in #GPUSPH and finally got it to a point where I could actually run a simulation, I couldn't believe my eyes: «it can't be done already». And then finding I could easily push it to resolutions of 1/128 or even 1/256, where in 3D 1/64 was already taking chances …

7/n

giuseppebilotta,

OTOH, 3D is very good for stress-testing your HPC prowess, specifically because of how easy it is to produce massive simulations. And with the appropriate geometries (e.g. “thick 2D”: test cases that are planar, but get a third dimension because your code is 3D, ahem) it's also easy to increase the workload linearly (just increase the thickness!), which is excellent for weak scaling tests!

8/8

giuseppebilotta, to random

The fastest code is the one that does nothing

I know this sounds like a silly joke, but this is something that anyone writing code should keep in mind:

Often, the best way to make your code faster is to just skip work. If necessary, do a little bit of work to skip a LOT of work.

I got reminded of this recently: we've been working on optimizing our fire propagation PCA (probabilistic cellular automaton), and …

To give you an idea about the scale of the optimization, on my laptop we went from 6 to 560 steps per second.

giuseppebilotta, (edited ) to random

OK I'm obviously doing something wrong so some #fediHelp in #probability would help here. Say I have an automaton whose cells can be in any of three states S1, S2, S3 with probability p1, p2, p3 (p1+p2+p3=1). The probability of a cell c changing from S1 to S2 depends on the neighbors being in state S2. c stays in state S1 if it's not “infected” by any of the neighbors. Say p12(c, n) is the probability of c moving from S1 to S2 if n is S2. What's the total probability of c staying in S1?

1/n

giuseppebilotta,

My initial reasoning was this: (1 - p2(n)*p12(c, n)) is the probability of n not infecting c. The probability of not being infected is the product over all neighbors of that, times p1(c) (the probability of c being “infectable”). However, making the probability evolve this way gives a very different distribution from actually tracking the state of the cells over multiple runs and then counting how many times a cell gets infected. So what am I doing wrong?
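In code, the update I'm describing would be something like this (a sketch, with a hypothetical layout for the inputs):

    import math

    def p_stay_S1(p1_c, neighbors):
        """Probability of c staying in S1, per the reasoning above: p1(c)
        times the product over neighbors n of (1 - p2(n)*p12(c, n)).
        `neighbors` is a list of (p2_n, p12_cn) pairs."""
        return p1_c * math.prod(1 - p2_n * p12_cn for p2_n, p12_cn in neighbors)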

2/n

giuseppebilotta,

Different approach I'm considering: the probability of c being infected by n is k12(c, n) = p1(c)*p2(n)*p12(c, n). So the probability of staying S1 is the complement from all neighbors, \prod_n(1 - k12(c, n)), but that can't be right, as it can be higher than p1(c). Should it be p1(c)*\prod_n(1 - k12(c, n))? But then am I not accounting for p1(c) too many times? I'm obviously missing something, and being out of my element I don't even know where to look things up.

3/3

#askFedi #probability #fediHelp

giuseppebilotta,

OK I think I'm starting to see why the simulation tracking probabilities is different from the “run n times and compare results” one: in the probability-tracking approach, we have no “memory” of the state: the probability of infection propagates “in both directions”, and thus propagates back to the cell that might have triggered the propagation.

Damn. Does this mean that the only way to do this is with the “run multiple times” approach?

giuseppebilotta,

To clarify what I mean, imagine the case of a 1D automaton with cells C1, C2, C3. If I run the standard propagation model with C1 initially infected, what happens is that C1 may infect C2, and then when C2 gets infected, it may infect C3. By running this 100 times, I can get an estimate of the probability at every iteration that C2 or C3 are infected.
If I try to propagate probability directly, what happens is that I have initial probability p2(C1) = 1, p2(C2) = 0, p2(C3) = 0.

1/n

giuseppebilotta,

On the next step, p2^1(C2) = p12(C2, C1). On the next step, p2^2(C3) is computed from p2^1(C2) … the problem is that THEN C2 has a new infected neighbor (C3), without knowledge that this is actually the “infection” coming back from itself, so it bounces back. And so on, until all cells are guaranteed to be infected. I would need a way to avoid this kind of feedback.
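A quick numerical check of this on the C1, C2, C3 example (a sketch with a constant, made-up p12): the two approaches agree up to step 2 and diverge from step 3 on, exactly when C3's probability starts feeding back into C2.

    import random

    p12 = 0.5  # made-up infection probability per infected neighbor

    def monte_carlo(steps, runs=100_000):
        """Track actual states over many runs: C1 starts infected."""
        counts = [0.0, 0.0, 0.0]
        for _ in range(runs):
            infected = [True, False, False]
            for _ in range(steps):
                new = infected[:]
                for i in range(3):
                    if not infected[i]:
                        for j in (i - 1, i + 1):  # each infected neighbor gets a try
                            if 0 <= j < 3 and infected[j] and random.random() < p12:
                                new[i] = True
                                break
                infected = new
            for i in range(3):
                counts[i] += infected[i]
        return [c / runs for c in counts]

    def propagate(steps):
        """Propagate probabilities directly, with no memory of where the
        infection came from: this is where the feedback shows up."""
        p2 = [1.0, 0.0, 0.0]
        for _ in range(steps):
            new = p2[:]
            for i in range(3):
                stay = 1.0
                for j in (i - 1, i + 1):
                    if 0 <= j < 3:
                        stay *= 1 - p2[j] * p12
                new[i] = p2[i] + (1 - p2[i]) * (1 - stay)
            p2 = new
        return p2

    print(monte_carlo(3))  # ~[1.0, 0.875, 0.5]
    print(propagate(3))    # [1.0, 0.890625, 0.53125]: C2 and C3 overshoot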

giuseppebilotta,

And the problem is that this would have to be done in a situation in which initially I don't have a 1 in one cell and 0 elsewhere, but one where I start from nonzero probabilities everywhere. Ideally without tracking where these come from, but that is probably exactly what I cannot avoid.

giuseppebilotta, to random

Student trying to pass code taken from the Internet as his own 8-(

giuseppebilotta, (edited ) to random

I've been working on thermal support in #GPUSPH, and was finally at the stage where I could run some tests and check the results. Except that #ParaView was refusing to open my files, complaining about “invalid token” errors. I just spent over half an hour trying to understand what I had changed in my code that had broken the output, even though I hadn't changed anything related to it recently … turns out it wasn't my problem, but an issue with a ParaView upgrade and libexpat:
https://discourse.paraview.org/t/i-cannot-read-a-vtp-file-i-could-open-yesterday-can-someone-try-to-open-it/13938/12

ProjectPhysX, to Nvidia

Another day, another #Nvidia #GPU driver bug that needs a workaround: seems like Nvidia's #OpenCL driver suffers 32-bit uint overflow within the cl::CommandQueue::enqueueFillBuffer call! 🖖🤦‍♂️
https://github.com/ProjectPhysX/FluidX3D/commit/82976f15d2bd20b9188ea623cf0bac046c6c81ce

giuseppebilotta,

@ProjectPhysX are they still doing the “we forgot it should have padding in some places but not in others” thing?
