@pervognsen@mastodon.social avatar

pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.


nh, to random
@nh@mastodon.gamedev.place avatar

I wonder if compilers could meaningfully benefit from smart cache blocking.

LLVM has function pass managers. The idea being that we keep a single function in cache while running many passes on it, before then doing the same on the next function etc.

This makes sense because compilers tend to be pointer chase nightmares. You want that sweet L1 cache hit latency.

But compilers also tend to be branch nightmares. What if you have many, many tiny functions?

pervognsen,
@pervognsen@mastodon.social avatar

@nh I haven't thought about it much for compiler function passes (I tend to fuse passes aggressively anyway) but I assume the general approach is standard. E.g. a linker performs relocations. Suppose there are two reloc types that occur with 50% probability. Of the two monolithic approaches, one is mispredict heavy and the other is bandwidth heavy. If you apply blocking you get the best of both worlds, mostly. That pattern of blocked demultiplexing is widely applicable.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@nh Another example like this is where you receive messages (commands/requests) of different types on a single channel and instead of processing them immediately you batch them (with a combination of a batch size and time cut-off to bound the latency) in separate buckets by message type. Of course this assumes (as does the relocation example) that you can process messages of different types out of order (although you can flush the buckets early if you only need ordering for specific messages).
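A minimal Rust sketch of such a per-type bucket, with the batch-size and time cutoffs from the post; the clock is injected as a plain `f64` seconds value so the latency bound is testable (the `Bucket` type and its field names are invented for illustration):

```rust
// A per-type bucket that flushes when it reaches `max_batch` items or
// when `max_latency` seconds have elapsed since its first pending item.
struct Bucket<M> {
    items: Vec<M>,
    oldest: Option<f64>, // arrival time of the oldest pending item
    max_batch: usize,
    max_latency: f64,
}

impl<M> Bucket<M> {
    fn new(max_batch: usize, max_latency: f64) -> Self {
        Bucket { items: Vec::new(), oldest: None, max_batch, max_latency }
    }

    // Push one message at time `now`; on flush, hand the whole batch
    // to `handle` for batched (out-of-order across types) processing.
    fn push(&mut self, now: f64, m: M, handle: &mut impl FnMut(Vec<M>)) {
        self.oldest.get_or_insert(now);
        self.items.push(m);
        let too_big = self.items.len() >= self.max_batch;
        let too_old = self.oldest.map_or(false, |t0| now - t0 >= self.max_latency);
        if too_big || too_old {
            handle(std::mem::take(&mut self.items));
            self.oldest = None;
        }
    }
}
```

A real channel would keep one `Bucket` per message type and could also flush all buckets early when ordering matters for a specific message, as noted above.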

Doomed_Daniel, to random
@Doomed_Daniel@mastodon.gamedev.place avatar

I've got a (probably simple) graphics programming-related question:
Is it correct that the only useful values for GL_TEXTURE_MAX_ANISOTROPY_EXT are 2, 4, 8 and 16 (and maybe 1 for "don't use anisotropic filtering")?

At least as far as I can remember I've never seen other values configurable in games; however, for some reason, GL_TEXTURE_MAX_ANISOTROPY_EXT is used with floats (glTexParameterf()), and the spec only says "float greater or equal to 1.0"

pervognsen,
@pervognsen@mastodon.social avatar

@Doomed_Daniel I don't think hardware supports non-power-of-two values but any positive integer n "makes sense" logically, right? You're taking n regularly spaced samples along the line of anisotropy.

pervognsen,
@pervognsen@mastodon.social avatar

@Doomed_Daniel Only one way to find out. :) I don't actually know.

pervognsen,
@pervognsen@mastodon.social avatar

@Doomed_Daniel In D3D it's an integer between 0 and 16.

dotstdy, to random
@dotstdy@mastodon.social avatar

if only phoronix was so conscientious
https://fosstodon.org/@LWN/112525319235124309

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy LWN is one of the only places that feels like the old Internet (in the best sense).

adrian, to random
@adrian@discuss.systems avatar

I know how dumb this sounds, but I think it’s overwhelming how many more 64-bit numbers there are than 32-bit numbers. it’s, like, a lot.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@adrian I used to think that about 64-bit addresses until I started reserving 4 GiB virtual address ranges for 32-bit sandbox address spaces within a single host process's address space. On machines where only 2^48 bytes of the address space are usable, that's at most 2^16 sandbox instances (and quite a bit fewer in practice).

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@adrian I find it helpful to measure in TiB since we all have a good intuition for what 1 TiB of disk storage means: 2^48 bytes is only 256 TiB. You can buy HDD storage for less than $10/TiB, so 256 TiB is a few thousand bucks, easily in the price range for a home NAS. While 2^64 bytes is a lot more than that, it definitely puts into perspective how relatively little 2^48 is when you compare it to a small server's disk storage (e.g. suppose you wanted to mmap it all into one process).
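Spelling out the arithmetic from these two posts as a trivial sanity check (constant names are mine):

```rust
// A 48-bit address space, measured two ways.
const VA_48: u64 = 1 << 48;           // 2^48 bytes of usable address space
const SANDBOX_SLOTS: u64 = VA_48 >> 32; // how many 4 GiB (2^32 byte) reservations fit
const TIB_IN_VA_48: u64 = VA_48 >> 40;  // how many TiB (2^40 bytes) that is
```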

pervognsen,
@pervognsen@mastodon.social avatar

@adrian As the inverse of this problem, a lot of pointer tagging schemes that assumed they had 16 bits to themselves on 64-bit machines are in trouble now that normal machines (as of Ice Lake) have five-level paging.
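A sketch of the kind of scheme that breaks: a 16-bit tag in bits 48..63 is recoverable while real address bits stop at bit 47, but under five-level paging user addresses can extend up to bit 56 and collide with the tag field (the constants and helper names here are illustrative, and canonical-address sign extension is ignored for simplicity):

```rust
// Tag lives in the top 16 bits; untagging masks it off. Sound only
// for 48-bit virtual addresses.
const TAG_SHIFT: u32 = 48;
const ADDR_MASK: u64 = (1 << TAG_SHIFT) - 1;

fn tag_ptr(tag: u64, addr: u64) -> u64 {
    (tag << TAG_SHIFT) | (addr & ADDR_MASK)
}

fn untag_ptr(p: u64) -> u64 {
    p & ADDR_MASK
}

fn tag_of(p: u64) -> u64 {
    p >> TAG_SHIFT
}
```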

pervognsen, to random
@pervognsen@mastodon.social avatar

If you follow the general recipe for F-algebras, the signature for foldr in a strict/non-lazy language like ML would not be ('a * 'b -> 'b) -> 'b -> 'a list -> 'b but ('a * 'b -> 'b) -> (unit -> 'b) -> 'a list -> 'b. It's probably a moot point in this case since the recursion "never does any useful work" until you hit the end of the list and the unit -> 'b thunk is called.
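The two signatures, transcribed into a Rust sketch with the base case as a closure (an illustrative transcription, not a faithful ML rendering; slices stand in for lists):

```rust
// Strict foldr, ML-style: ('a * 'b -> 'b) -> 'b -> 'a list -> 'b.
// The base value `z` is computed before the fold even starts.
fn foldr<A: Copy, B>(f: &impl Fn(A, B) -> B, z: B, xs: &[A]) -> B {
    match xs.split_first() {
        None => z,
        Some((&x, rest)) => f(x, foldr(f, z, rest)),
    }
}

// The F-algebra recipe: the nil case is a thunk, demanded only when
// the end of the list is reached.
// ('a * 'b -> 'b) -> (unit -> 'b) -> 'a list -> 'b.
fn foldr_thunked<A: Copy, B>(f: &impl Fn(A, B) -> B, z: &impl Fn() -> B, xs: &[A]) -> B {
    match xs.split_first() {
        None => z(),
        Some((&x, rest)) => f(x, foldr_thunked(f, z, rest)),
    }
}
```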

pervognsen, (edited)
@pervognsen@mastodon.social avatar

The difference between thunked and unthunked base values is easier to see if you alter the list data type to have two kinds of empty lists, so that there are two ways to terminate a list.

I suppose the difference for the usual list data type could also be internally observed in an implementation like SML/NJ with callcc.

Anyway, it's a tiny detail I just hadn't noticed before about strict foldr in comparison to the general recipe for F-algebras.

dotstdy, to random
@dotstdy@mastodon.social avatar

There's a bit of stuff in this article which phrases it in terms of changes over time, e.g. compute capability has grown and we no longer need big data. But it seems closer to reality that it was never required, and continues to not be required. (looking forward to the same style of post happening in a few years vis-a-vis microservices)

https://mastodon.social/@ltratt/112518285832831004

pervognsen, (edited) to random
@pervognsen@mastodon.social avatar

When a beverage proclaims "naturally & artificially flavored" on its label in bold letters you know they're not hiding anything. Or maybe they're trying to draw attention away from the supernatural flavoring.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous I was trying to find a photo of blue soda to post to your thread.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous When it comes to glow-in-the-dark blue soda, accept no substitutes.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous @koos303 VP of ideas: energy drink that violates conservation of energy.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous @koos303 Time to add a section here for beverage applications: https://en.wikipedia.org/wiki/Zero-point_energy#Purported_applications

dotstdy, to random
@dotstdy@mastodon.social avatar

subgroup ops are a mindfuck on top of the existing mindfuck of work groups, local work groups, dispatches and invocations.

pervognsen,
@pervognsen@mastodon.social avatar

@aras @dotstdy Wasted opportunity for Vulkan to coin the term subsubgroup. Or maybe local subgroup. I'd like to see proposals for the worst plausible naming option.

pervognsen, to random
@pervognsen@mastodon.social avatar

There's a bunch of C-like successor languages that say they want to eliminate undefined behavior. I've never been able to figure out how they intend to deal with reads and writes to memory since a lot of these languages take what I would call the "naive" machine-centric view of memory which is hard to reconcile with source-level semantics for variables, etc. You can't really rename all of this stuff as "implementation-defined" and get out of jail for free.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@harold And if you restrict the claim to "no C-style UB" I'd still want to know what that leaves. Otherwise we're just back to saying the semantics is whatever your compiler happens to do right now. I don't need a full semantics but at least a list of stuff that is definitely UB. And I'd want to know what (as large as possible) subset of the language is definitely UB free. I'm okay with leaving a fuzzy region between these two sets (there's a similar situation with unsafe Rust right now).

pervognsen,
@pervognsen@mastodon.social avatar

@amonakov @harold Hah, I knew what Harold meant! Well, at least what I meant is that there's a large range between UB minimalism and maximalism when it comes to specification freedom for a language roughly at C's level of abstraction, and C is not very UB minimalist (and hence there's room for C-like languages to be more UB minimalist).

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@foonathan Yeah, the situation seems hopeless, doubly so with data races. Even if you could resolve the sequential case (and it would have to be so pessimizing I doubt anyone who wants to use a C-like language would be satisfied with the outcome) it still wouldn't help you with data races. I'm not even sure how you'd begin with that one. Even if you gave a sequentially consistent semantics to all reads and writes, you wouldn't be able to do any local alias analysis.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@foonathan In the sequential case you can at least perform local alias analysis which is sound if pessimistic (any store whose address cannot be statically alias analyzed is an invalidation barrier) but that isn't available in the concurrent case. I guess you'd have to resort to global alias analysis? That's certainly something you see in high-powered static analysis tools; it can get you further than local analysis, but the programmer-facing runtime performance model seems untenable.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@foonathan (And this is playing along by assuming that there actually is such a thing as a conservative/pessimistic approach to even the sequential case that can eliminate all of what we normally think of memory/variable/type-related sources of UB.)

pervognsen,
@pervognsen@mastodon.social avatar

@zwarich @foonathan The scare quotes around "UB" are just a perfect example of why I don't want to waste time dealing with people like this.

castano, to random
@castano@mastodon.gamedev.place avatar

Is there a way not to burn GitHub's LFS quota on GitHub Actions?
I have a repo with GBs of LFS data, and it looks like the checkout action downloads the entire thing every time it's triggered.
This is not only extremely wasteful, but it also gets expensive very quickly!

pervognsen,
@pervognsen@mastodon.social avatar

@wolfpld We got it resolved. :)
