pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.

dpiponi, to random
@dpiponi@mathstodon.xyz avatar

It's a curious coincidence that before the idea of the warp drive there was this definition of warp:

"move (a ship) along by hauling on a rope attached to a stationary object on shore."

Suggests an alternative sci-fi idea for the meaning of "warp drive".

pervognsen,
@pervognsen@mastodon.social avatar

@dpiponi Imagine if it had been called a weft or woof drive: https://en.wikipedia.org/wiki/Warp_and_weft

dotstdy, to random
@dotstdy@mastodon.social avatar

I feel like the most difficult part of subgroups, and GPU programming in general, is getting all the terminology straight in your head. Sometimes it seems like it would be easier just writing RDNA asm directly. :')

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy You know what has never helped? Every IHV and API having their own incompatible terminology!

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy It's my favorite too. It used to be fashionable to deride it as marketing wank, but I think it's evocative and memorable, and the hierarchy of the terms makes sense once you know that warp isn't a sci-fi word.

pervognsen,
@pervognsen@mastodon.social avatar

@dotstdy I think the marketing wank accusation is justified when we're counting each separate lane of a SIMD unit as a CUDA core. :)

pervognsen, to random
@pervognsen@mastodon.social avatar

I haven't done any real Vulkan programming since 1.0. Are there any good guides that skip all the legacy junk and only show the streamlined 1.3 way of doing things?

pervognsen,
@pervognsen@mastodon.social avatar

@zeux What's the compatibility landscape like for GPUs that support 1.3 but don't support bindless? I was hoping to just require bindless.

pervognsen,
@pervognsen@mastodon.social avatar

@zeux Oh, I just remembered you had your Niagara project. Do you recommend using that as a reference for good practices, etc?

dpiponi, to random
@dpiponi@mathstodon.xyz avatar

It's not like I had any chance of resisting when one of my favourite books is published in a fancy new hardback edition

pervognsen,
@pervognsen@mastodon.social avatar

@dpiponi Where are you on Player of Games vs Use of Weapons?

shriramk, to random
@shriramk@mastodon.social avatar

You can't be the financial capital of the world if you can't monetize everything in the news. (Hidden Grounds cafe, NYC.)

pervognsen,
@pervognsen@mastodon.social avatar

@shriramk I hope you put that $10 in the Kendrick jar.

pervognsen, to random
@pervognsen@mastodon.social avatar

Nothing is new: hash consing/value numbering in 1958. On Programming of Arithmetic Operations, A. P. Ershov, https://dl.acm.org/doi/10.1145/368892.368907

pervognsen,
@pervognsen@mastodon.social avatar

Ershov also independently invents open-addressed linear probing in that short paper, although Amdahl et al. had the idea a few years earlier, in 1954.
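
For readers who haven't seen it spelled out, here is a minimal sketch of open addressing with linear probing. It is not taken from Ershov's paper; the class name `LinearProbingTable`, the initial capacity, and the 1/2 load-factor limit are all assumptions made for illustration, and deletion is left out.

```python
class LinearProbingTable:
    """Open-addressed hash table: on a collision, step to the next slot."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity  # each slot is None or (key, value)
        self._count = 0

    def _probe(self, key):
        # Start at the hashed slot and walk forward (wrapping around) until
        # we find the key or an empty slot. The table is never more than
        # half full, so an empty slot always exists and the loop terminates.
        n = len(self._slots)
        i = hash(key) % n
        while self._slots[i] is not None and self._slots[i][0] != key:
            i = (i + 1) % n
        return i

    def put(self, key, value):
        if (self._count + 1) * 2 > len(self._slots):
            self._grow()
        i = self._probe(key)
        if self._slots[i] is None:
            self._count += 1
        self._slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self._slots[self._probe(key)]
        return default if slot is None else slot[1]

    def _grow(self):
        # Double the capacity and reinsert every live entry.
        old = self._slots
        self._slots = [None] * (2 * len(old))
        self._count = 0
        for slot in old:
            if slot is not None:
                self.put(*slot)
```

Keeping the load factor low is what keeps probe sequences short; the 1/2 threshold here is just a convenient choice.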

pervognsen,
@pervognsen@mastodon.social avatar

Let's also invent the Sethi-Ullman algorithm 12 years early while we're at it.
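
A rough sketch of the labeling step, in its simplest textbook form: every leaf is assumed to need one register, and expressions are nested tuples `(op, left, right)`. The function name `regs_needed` and the tuple encoding are assumptions made for this example, not anything from the paper.

```python
def regs_needed(node):
    """Minimum registers to evaluate a binary expression tree without spills.

    A node is either a leaf (anything that is not a tuple) or a tuple
    (op, left, right). Assumes every leaf must be loaded into a register.
    """
    if not isinstance(node, tuple):
        return 1
    _, left, right = node
    l, r = regs_needed(left), regs_needed(right)
    # If the subtrees differ, evaluate the hungrier one first and hold its
    # result in one register while doing the other. If they tie, one extra
    # register is needed to hold the first result.
    return max(l, r) if l != r else l + 1


balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
print(regs_needed(balanced), regs_needed(chain))
```

The balanced tree needs one more register than the left-leaning chain even though both have four leaves, which is the whole point of the labeling.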

pervognsen,
@pervognsen@mastodon.social avatar

(I'm not sure how much credit he gets for that. I've always been amused that Sethi-Ullman gets to have a fancy name attached for something so simple and relatively limited in practice, whereas value numbering/hash consing might be a simple idea but it's extremely powerful and far-reaching. But it's nice to see him attack related parts of the problem at once in such a short paper, not just value numbering but instruction scheduling and register allocation, since they all affect each other.)
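
To make the value numbering/hash consing idea concrete, here is a minimal sketch. The `ValueTable` class and its `intern` method are hypothetical names for this example: each expression is keyed by its operator plus the value numbers of its operands, so structurally identical expressions always get the same number.

```python
class ValueTable:
    """Hash consing: one node (and one value number) per distinct expression."""

    def __init__(self):
        self._ids = {}   # (op, operands...) -> value number
        self.nodes = []  # value number -> (op, operands...)

    def intern(self, op, *args):
        # Operands are leaf payloads (e.g. a variable name) or value numbers
        # of previously interned nodes, so the key captures the whole
        # expression without ever comparing trees recursively.
        key = (op, *args)
        vn = self._ids.get(key)
        if vn is None:
            vn = len(self.nodes)
            self._ids[key] = vn
            self.nodes.append(key)
        return vn


t = ValueTable()
a = t.intern("var", "a")
b = t.intern("var", "b")
s1 = t.intern("add", a, b)
s2 = t.intern("add", a, b)      # same expression, same value number as s1
prod = t.intern("mul", s1, s2)  # (a + b) * (a + b) with a + b stored once
```

Because equality of subexpressions reduces to equality of small integers, the lookup is a single hash probe per node. This sketch does not canonicalize commutative operands, so `add(a, b)` and `add(b, a)` would still get different numbers.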

pervognsen, to random
@pervognsen@mastodon.social avatar

One of my favorite hip-hop instrumentals: https://www.youtube.com/watch?v=s6Yyb3N9IuA. I was listening to J Cole's Everybody Dies and a YouTube commenter had just written "Kenny Dope" without any further context or explanation and I immediately understood what it meant.

pervognsen,
@pervognsen@mastodon.social avatar

If you don't get the reference, listen to the two tracks back to back: https://www.youtube.com/watch?v=-5slZHLSnow. They both sample https://en.wikipedia.org/wiki/Inside_My_Love.

lritter, to random
@lritter@mastodon.gamedev.place avatar

interesting problem: progressively mapping a cosmically high number of unique strings of arbitrary length to an ordered set so that we can assign an index to each string, extract a substring from each index, and filter strings not in the set.

evidently, this approach requires compression. the compressed result is functionally equivalent to a regular expression, or a schema validation system.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter You didn't define everything to the point where I'm completely sure what you're describing but maybe https://blog.burntsushi.net/transducers/ is relevant.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter Alright, I thought you were talking about strings-strings. Carry on. :)

pervognsen,
@pervognsen@mastodon.social avatar

@lritter Definitely one of the best simple ideas in CS.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter Yeah, that's why I said it's all hash consing. It's very general and goes at least as far back as a Russian paper in the early 60s on value numbering.

pervognsen,
@pervognsen@mastodon.social avatar

@lritter My bad, make that late 50s. https://dl.acm.org/doi/10.1145/368892.368907. Although I remember the terminology in that paper being somewhat impenetrable and the generality not so immediately apparent.

steve, to random
@steve@discuss.systems avatar

A slight re-organization of Priest's "Efficient Scaling for Complex Division" to make it compatible with "try to divide the dumb fast way inline, then branch to rescale only if necessary" while preserving scale invariance of rounding.

Also fixes it up to work for Float16, which the original approach does not.

Further optimization possible and pretty straightforward.

https://github.com/apple/swift-numerics/pull/289
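
As a rough illustration of the "fast path first, rescale only if necessary" control flow, here is a sketch. This is not Priest's algorithm and not the code in the linked PR: the function name `complex_div`, the range check, and the plain power-of-two rescaling are assumptions made for this example, and it makes no attempt to preserve scale invariance of rounding.

```python
import math
import sys


def complex_div(a, b, c, d):
    """Return (re, im) of (a + b*i) / (c + d*i).

    Assumes finite inputs and a nonzero divisor.
    """
    # Fast path: the textbook formula, accepted only if the denominator is
    # a normal finite number and the numerators did not overflow.
    den = c * c + d * d
    re_num = a * c + b * d
    im_num = b * c - a * d
    if (sys.float_info.min <= den <= sys.float_info.max
            and math.isfinite(re_num) and math.isfinite(im_num)):
        return re_num / den, im_num / den

    # Slow path: scale each operand by a power of two (exact in binary
    # floating point) so its larger component lands in [0.5, 1), redo the
    # division, then undo the scaling on the result.
    _, ec = math.frexp(max(abs(c), abs(d)))
    _, ea = math.frexp(max(abs(a), abs(b)))
    cs, ds = math.ldexp(c, -ec), math.ldexp(d, -ec)
    xs, ys = math.ldexp(a, -ea), math.ldexp(b, -ea)
    den = cs * cs + ds * ds
    re = (xs * cs + ys * ds) / den
    im = (ys * cs - xs * ds) / den
    # math.ldexp raises OverflowError if the true quotient is out of range.
    return math.ldexp(re, ea - ec), math.ldexp(im, ea - ec)
```

The fast path here only guards against overflow in the numerators and an out-of-range denominator; a careful implementation would also have to consider underflow in the numerators and make both paths round the same way, which is the part the post is about.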

pervognsen,
@pervognsen@mastodon.social avatar

@saagar @steve @neilhenning That's affine algebra. Linear algebra is when you're stuck at y=mx.

amonakov, to random
@amonakov@mastodon.gamedev.place avatar

(prompted by discussion of detecting bitwise and-not earlier in GCC's optimization pipeline)

My ideal compiler IR would not have and/or/xor as distinct bitwise ops, just generic ternlog and probably the corresponding two-operand function ("bilog"?) too.
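
For anyone unfamiliar with ternlog, here is a small sketch of the idea: an 8-bit immediate is the truth table of an arbitrary three-input boolean function, applied bitwise. The helper names `ternlog` and `imm8_for` are made up for this example, and the index bit order (first operand is the high bit) is my understanding of how VPTERNLOG lays out its immediate, so treat it as an assumption.

```python
def ternlog(a, b, c, imm8, width=32):
    """Apply the 3-input boolean function encoded by imm8 to every bit."""
    mask = (1 << width) - 1
    result = 0
    for idx in range(8):
        if (imm8 >> idx) & 1:
            # Table entry idx covers the bit positions where (a, b, c)
            # equals the three bits of idx; OR in that minterm.
            ta = a if idx & 4 else ~a
            tb = b if idx & 2 else ~b
            tc = c if idx & 1 else ~c
            result |= ta & tb & tc
    return result & mask


def imm8_for(f):
    """Build the truth-table immediate for a function of three 0/1 inputs."""
    imm = 0
    for idx in range(8):
        a, b, c = (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        if f(a, b, c) & 1:
            imm |= 1 << idx
    return imm


AND_NOT = imm8_for(lambda a, b, c: a & ~b)             # ignores c
XOR3 = imm8_for(lambda a, b, c: a ^ b ^ c)
SELECT = imm8_for(lambda a, b, c: (a & b) | (~a & c))  # a ? b : c
```

With this representation, and, or, xor, and-not, and select are all the same operation with different constants, so recognizing and-not is a matter of comparing immediates. The two-operand "bilog" would be the same thing with a 4-bit table.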

pervognsen,
@pervognsen@mastodon.social avatar

@amonakov For a reason I don't fully understand, this seems to be common in GPUs but not in CPUs. Even the VPTERNLOG instructions in AVX-512 were inherited from Larrabee AFAIK. Maybe GPU ISAs are less averse to many-operand instructions than CPU ISAs have traditionally been?

pervognsen,
@pervognsen@mastodon.social avatar

@amonakov Hmm, I just remembered Southern Islands had a metric truckload of 3 in, 1 out instructions and thought I remembered ternlog being in there. But looking through the ISA manual now I can't find it.

pervognsen,
@pervognsen@mastodon.social avatar

@rygorous @amonakov What about the increased RF/operand forwarding port pressure from three input operands? Don't GPU cores usually have some additional tricks they can play with RF ports due to their latency-tolerant design? Does this figure into the CPU vs GPU difference at all?
