@pervognsen@mastodon.social avatar

pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.


nh, to random
@nh@mastodon.gamedev.place avatar

I wonder if compilers could meaningfully benefit from smart cache blocking.

LLVM has function pass managers. The idea being that we keep a single function in cache while running many passes on it, before then doing the same on the next function etc.

This makes sense because compilers tend to be pointer chase nightmares. You want that sweet L1 cache hit latency.

But compilers also tend to be branch nightmares. What if you have many, many tiny functions?

pervognsen,
@pervognsen@mastodon.social avatar

@nh I haven't thought about it much for compiler function passes (I tend to fuse passes aggressively anyway) but I assume the general approach is standard. E.g. a linker performs relocations. Suppose there are two reloc types that occur with 50% probability. Of the two monolithic approaches, one is mispredict heavy and the other is bandwidth heavy. If you apply blocking you get the best of both worlds, mostly. That pattern of blocked demultiplexing is widely applicable.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@nh Another example like this is where you receive messages (commands/requests) of different types on a single channel and instead of processing them immediately you batch them (with a combination of a batch size and time cut-off to bound the latency) in separate buckets by message type. Of course this assumes (as does the relocation example) that you can process messages of different types out of order (although you can flush the buckets early if you only need ordering for specific messages).
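A minimal Rust sketch of such a per-type bucket, with the batch-size and time cutoffs from the post; the clock is injected as a plain `f64` seconds value so the latency bound is testable (the `Bucket` type and its field names are invented for illustration):

```rust
// A per-type bucket that flushes when it reaches `max_batch` items or
// when `max_latency` seconds have elapsed since its first pending item.
struct Bucket<M> {
    items: Vec<M>,
    oldest: Option<f64>, // arrival time of the oldest pending item
    max_batch: usize,
    max_latency: f64,
}

impl<M> Bucket<M> {
    fn new(max_batch: usize, max_latency: f64) -> Self {
        Bucket { items: Vec::new(), oldest: None, max_batch, max_latency }
    }

    // Push one message at time `now`; on flush, hand the whole batch
    // to `handle` for batched (out-of-order across types) processing.
    fn push(&mut self, now: f64, m: M, handle: &mut impl FnMut(Vec<M>)) {
        self.oldest.get_or_insert(now);
        self.items.push(m);
        let too_big = self.items.len() >= self.max_batch;
        let too_old = self.oldest.map_or(false, |t0| now - t0 >= self.max_latency);
        if too_big || too_old {
            handle(std::mem::take(&mut self.items));
            self.oldest = None;
        }
    }
}
```

A real channel would keep one `Bucket` per message type and could also flush all buckets early when ordering matters for a specific message, as noted above.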

Doomed_Daniel, to random
@Doomed_Daniel@mastodon.gamedev.place avatar

I've got a (probably simple) graphics programming-related question:
Is it correct that the only useful values for GL_TEXTURE_MAX_ANISOTROPY_EXT are 2, 4, 8 and 16 (and maybe 1 for "don't use anisotropic filtering")?

At least as far as I can remember I've never seen other values configurable in games; however, for some reason, GL_TEXTURE_MAX_ANISOTROPY_EXT is used with floats (glTexParameterf()), and the spec only says "float greater or equal to 1.0"

pervognsen,
@pervognsen@mastodon.social avatar

@Doomed_Daniel I don't think hardware supports non-power-of-two values but any positive integer n "makes sense" logically, right? You're taking n regularly spaced samples along the line of anisotropy.

pervognsen,
@pervognsen@mastodon.social avatar

@Doomed_Daniel Only one way to find out. :) I don't actually know.

pervognsen,
@pervognsen@mastodon.social avatar

@Doomed_Daniel In D3D it's an integer between 0 and 16.

dotstdy, to random
@dotstdy@mastodon.social avatar

if only phoronix was so conscientious
https://fosstodon.org/@LWN/112525319235124309

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy LWN is one of the only places that feels like the old Internet (in the best sense).

adrian, to random
@adrian@discuss.systems avatar

I know how dumb this sounds, but I think it’s overwhelming how many more 64-bit numbers there are than 32-bit numbers. it’s, like, a lot.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@adrian I used to think that about 64-bit addresses until I started reserving 4 GiB virtual address ranges for 32-bit sandbox address spaces within a single host process's address space. On machines where only 2^48 bytes of the address space are usable, that's at most 2^16 sandbox instances (and quite a bit fewer in practice).

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@adrian I find it helpful to measure in TiB since we all have a good intuition for what 1 TiB of disk storage means: 2^48 bytes is only 256 TiB. You can buy HDD storage for less than $10/TiB, so 256 TiB is a few thousand bucks, easily in the price range for a home NAS. While 2^64 bytes is a lot more than that, it definitely puts into perspective how relatively little 2^48 is when you compare it to a small server's disk storage (e.g. suppose you wanted to mmap it all into one process).
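Spelling out the arithmetic from these two posts as a trivial sanity check (constant names are mine):

```rust
// A 48-bit address space, measured two ways.
const VA_48: u64 = 1 << 48;           // 2^48 bytes of usable address space
const SANDBOX_SLOTS: u64 = VA_48 >> 32; // how many 4 GiB (2^32 byte) reservations fit
const TIB_IN_VA_48: u64 = VA_48 >> 40;  // how many TiB (2^40 bytes) that is
```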

pervognsen,
@pervognsen@mastodon.social avatar

@adrian As the inverse of this problem, a lot of pointer tagging schemes that assumed they had 16 bits to themselves on 64-bit machines are in trouble now that normal machines (as of Ice Lake) have five-level paging.
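A sketch of the kind of scheme that breaks: a 16-bit tag in bits 48..63 is recoverable while real address bits stop at bit 47, but under five-level paging user addresses can extend up to bit 56 and collide with the tag field (the constants and helper names here are illustrative, and canonical-address sign extension is ignored for simplicity):

```rust
// Tag lives in the top 16 bits; untagging masks it off. Sound only
// for 48-bit virtual addresses.
const TAG_SHIFT: u32 = 48;
const ADDR_MASK: u64 = (1 << TAG_SHIFT) - 1;

fn tag_ptr(tag: u64, addr: u64) -> u64 {
    (tag << TAG_SHIFT) | (addr & ADDR_MASK)
}

fn untag_ptr(p: u64) -> u64 {
    p & ADDR_MASK
}

fn tag_of(p: u64) -> u64 {
    p >> TAG_SHIFT
}
```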

pervognsen, to random
@pervognsen@mastodon.social avatar

If you follow the general recipe for F-algebras, the signature for foldr in a strict/non-lazy language like ML would not be ('a * 'b -> 'b) -> 'b -> 'a list -> 'b but ('a * 'b -> 'b) -> (unit -> 'b) -> 'a list -> 'b. It's probably a moot point in this case since the recursion "never does any useful work" until you hit the end of the list and the unit -> 'b thunk is called.
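The two signatures, transcribed into a Rust sketch with the base case as a closure (an illustrative transcription, not a faithful ML rendering; slices stand in for lists):

```rust
// Strict foldr, ML-style: ('a * 'b -> 'b) -> 'b -> 'a list -> 'b.
// The base value `z` is computed before the fold even starts.
fn foldr<A: Copy, B>(f: &impl Fn(A, B) -> B, z: B, xs: &[A]) -> B {
    match xs.split_first() {
        None => z,
        Some((&x, rest)) => f(x, foldr(f, z, rest)),
    }
}

// The F-algebra recipe: the nil case is a thunk, demanded only when
// the end of the list is reached.
// ('a * 'b -> 'b) -> (unit -> 'b) -> 'a list -> 'b.
fn foldr_thunked<A: Copy, B>(f: &impl Fn(A, B) -> B, z: &impl Fn() -> B, xs: &[A]) -> B {
    match xs.split_first() {
        None => z(),
        Some((&x, rest)) => f(x, foldr_thunked(f, z, rest)),
    }
}
```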

pervognsen, (edited)
@pervognsen@mastodon.social avatar

The difference between thunked and unthunked base values is easier to see if you alter the list data type to have two kinds of empty lists, so that there are two ways to terminate a list.

I suppose the difference for the usual list data type could also be internally observed in an implementation like SML/NJ with callcc.

Anyway, it's a tiny detail I just hadn't noticed before about strict foldr in comparison to the general recipe for F-algebras.

dotstdy, to random
@dotstdy@mastodon.social avatar

There's a bit of stuff in this article which phrases it in terms of changes over time, e.g. compute capability has grown and we no longer need big data. But it seems closer to reality that it was never required, and continues to not be required. (looking forward to the same style of post happening in a few years vis-a-vis microservices)

https://mastodon.social/@ltratt/112518285832831004

pervognsen, (edited) to random
@pervognsen@mastodon.social avatar

When a beverage proclaims "naturally & artificially flavored" on its label in bold letters you know they're not hiding anything. Or maybe they're trying to draw attention away from the supernatural flavoring.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous I was trying to find a photo of blue soda to post to your thread.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous When it comes to glow-in-the-dark blue soda, accept no substitutes.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous @koos303 VP of ideas: energy drink that violates conservation of energy.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous @koos303 Time to add a section here for beverage applications: https://en.wikipedia.org/wiki/Zero-point_energy#Purported_applications

dotstdy, to random
@dotstdy@mastodon.social avatar

subgroup ops are a mindfuck on top of the existing mindfuck of work groups, local work groups, dispatches and invocations.

pervognsen,
@pervognsen@mastodon.social avatar

@aras @dotstdy Wasted opportunity for Vulkan to coin the term subsubgroup. Or maybe local subgroup. I'd like to see proposals for the worst plausible naming option.

pervognsen, to random
@pervognsen@mastodon.social avatar

There's a bunch of C-like successor languages that say they want to eliminate undefined behavior. I've never been able to figure out how they intend to deal with reads and writes to memory since a lot of these languages take what I would call the "naive" machine-centric view of memory which is hard to reconcile with source-level semantics for variables, etc. You can't really rename all of this stuff as "implementation-defined" and get out of jail for free.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@harold And if you restrict the claim to "no C-style UB" I'd still want to know what that leaves. Otherwise we're just back to saying the semantics is whatever your compiler happens to do right now. I don't need a full semantics but at least a list of stuff that is definitely UB. And I'd want to know what (as large as possible) subset of the language is definitely UB free. I'm okay with leaving a fuzzy region between these two sets (there's a similar situation with unsafe Rust right now).

pervognsen,
@pervognsen@mastodon.social avatar

@amonakov @harold Hah, I knew what Harold meant! Well, at least what I meant is that there's a large range between UB minimalism and maximalism when it comes to specification freedom for a language roughly at C's level of abstraction, and C is not very UB minimalist (and hence there's room for C-like languages to be more UB minimalist).

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@foonathan Yeah, the situation seems hopeless, doubly so with data races. Even if you could resolve the sequential case (and it would have to be so pessimizing I doubt anyone who wants to use a C-like language would be satisfied with the outcome) it still wouldn't help you with data races. I'm not even sure how you'd begin with that one. Even if you gave a sequentially consistent semantics to all reads and writes, you wouldn't be able to do any local alias analysis.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@foonathan In the sequential case you can at least perform local alias analysis which is sound if pessimistic (any store whose address cannot be statically alias analyzed is an invalidation barrier) but that isn't available in the concurrent case. I guess you'd have to resort to global alias analysis? That's certainly something you see in high-powered static analysis tools; it can get you further than local analysis, but the programmer-facing runtime performance model seems untenable.

pervognsen, (edited)
@pervognsen@mastodon.social avatar

@foonathan (And this is playing along by assuming that there actually is such a thing as a conservative/pessimistic approach to even the sequential case that can eliminate all of what we normally think of memory/variable/type-related sources of UB.)

pervognsen,
@pervognsen@mastodon.social avatar

@zwarich @foonathan The scare quotes around "UB" are just a perfect example of why I don't want to waste time dealing with people like this.

castano, to random
@castano@mastodon.gamedev.place avatar

Is there a way not to burn GitHub's LFS quota on GitHub Actions?
I have a repo with GBs of LFS data, and it looks like the checkout action downloads the entire thing every time it's triggered.
This is not only extremely wasteful, but it also gets expensive very quickly!

pervognsen,
@pervognsen@mastodon.social avatar

@wolfpld We got it resolved. :)
