pervognsen

@pervognsen@mastodon.social

Performance, compilers, hardware, mathematics, computer science.

I've worked in or adjacent to the video game industry for most of my career.

dpiponi, 1 hour ago to random

It's a curious coincidence that before the idea of the warp drive there was this definition of warp:

"move (a ship) along by hauling on a rope attached to a stationary object on shore."

Suggests an alternative sci-fi idea for the meaning of "warp drive".

pervognsen, 4 minutes ago

@dpiponi Imagine if it had been called a weft or woof drive: https://en.wikipedia.org/wiki/Warp_and_weft

dotstdy, 13 hours ago to random

I feel like the most difficult part of subgroups, and GPU programming in general, is getting all the terminology straight in your head. Sometimes it seems like it would be easier just writing RDNA asm directly. :')

pervognsen, 13 hours ago

@dotstdy You know what has never helped? Every IHV and API having their own incompatible terminology!

pervognsen, 13 hours ago

@dotstdy It's my favorite too. It used to be fashionable to deride it as marketing wank but I think it's evocative and memorable and the hierarchy of the terms makes sense once you know that warp isn't a sci-fi word.

pervognsen, 13 hours ago

@dotstdy I think the marketing wank accusation is justified when we're counting each separate lane of a SIMD unit as a CUDA core. :)

pervognsen, 1 day ago to random

I haven't done any real Vulkan programming since 1.0. Are there any good guides that skip all the legacy junk and only show the streamlined 1.3 way of doing things?

pervognsen, 1 day ago

@zeux What's the compatibility landscape like for GPUs that support 1.3 but don't support bindless? I was hoping to just require bindless.

pervognsen, 1 day ago

@zeux Oh, I just remembered you had your Niagara project. Do you recommend using that as a reference for good practices, etc?

dpiponi, 1 day ago to random

It's not like I had any chance of resisting when one of my favourite books is published in a fancy new hardback edition

pervognsen, 1 day ago

@dpiponi Where are you on Player of Games vs Use of Weapons?

shriramk, 1 day ago to random

You can't be the financial capital of the world if you can't monetize everything in the news. (Hidden Grounds cafe, NYC.)

pervognsen, 1 day ago

@shriramk I hope you put that $10 in the Kendrick jar.

pervognsen, 1 day ago to random

Nothing is new: hash consing/value numbering in 1958. On Programming of Arithmetic Operations, A. P. Ershov, https://dl.acm.org/doi/10.1145/368892.368907
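
A minimal sketch of the hash consing/value numbering idea, not Ershov's own formulation: every expression is interned in a table keyed by its operator and the value numbers of its operands, so structurally identical subexpressions get the same number. The class and method names here (`ValueNumbering`, `intern`) are hypothetical, chosen only for illustration.

```python
class ValueNumbering:
    """Hash-consing table: structurally identical expressions share one id."""

    def __init__(self):
        self.table = {}  # (op, operand ids...) -> value number
        self.nodes = []  # value number -> (op, operand ids...)

    def intern(self, op, *args):
        # The key is built from operand value numbers, so equality of
        # whole subtrees reduces to one dictionary lookup.
        key = (op, *args)
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]


vn = ValueNumbering()
a = vn.intern("var", "a")
b = vn.intern("var", "b")
# (a + b) * (a + b): both sums intern to the same value number.
s1 = vn.intern("add", a, b)
s2 = vn.intern("add", a, b)
prod = vn.intern("mul", s1, s2)
```

Because `s1` and `s2` are the same number, a code generator walking `vn.nodes` would compute the sum only once.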

pervognsen, 1 day ago (edited 1 day ago)

Ershov also independently invents open-addressed linear probing in that short paper, although Amdahl et al. had the idea a few years earlier, in 1954.
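
For readers who have not seen it, here is a minimal sketch of open addressing with linear probing in its modern textbook form (not a reconstruction of the 1958 paper). The class name `LinearProbingTable` and the fixed capacity are assumptions made for brevity; a real table would resize instead of raising.

```python
class LinearProbingTable:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot is None or (key, value)
        self.count = 0

    def _probe(self, key):
        # Start at the hashed slot, then step forward one slot at a time,
        # wrapping around, until we find the key or an empty slot.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def put(self, key, value):
        i = self._probe(key)
        if self.slots[i] is None:
            # Always keep one slot empty so that _probe terminates.
            if self.count + 1 >= len(self.slots):
                raise RuntimeError("table full; a real table would resize")
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return default if slot is None else slot[1]
```

All entries live in one flat array, and a collision is resolved simply by trying the next slot.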

pervognsen, 1 day ago

Let's also invent the Sethi-Ullman algorithm 12 years early while we're at it.
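
A minimal sketch of the labeling step usually called Sethi-Ullman numbering, in a simplified variant that assumes every leaf must be loaded into a register (the original treats some leaves as memory operands). The tuple encoding `(op, left, right)` and the function name `regs_needed` are hypothetical.

```python
def regs_needed(node):
    """Minimum registers to evaluate a binary expression tree without spills."""
    if isinstance(node, str):
        return 1  # assumption: each leaf is loaded into a register
    _, left, right = node
    l, r = regs_needed(left), regs_needed(right)
    # Evaluate the more demanding child first; its result then occupies
    # one register while the other child is evaluated. Only when both
    # children need the same number is an extra register required.
    return max(l, r) if l != r else l + 1


balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
```

Under these assumptions the balanced tree needs one more register than the left-leaning chain over the same four leaves.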

pervognsen, 1 day ago

(I'm not sure how much credit he gets for that. I've always been amused that Sethi-Ullman gets to have a fancy name attached for something so simple and relatively limited in practice. Whereas value numbering/hash consing might be a simple idea but it's extremely powerful and far reaching. But it's nice to see him attack related parts of the problem at once in such a short paper, not just value numbering but instruction scheduling and register allocation, since they all affect each other.)

pervognsen, 1 day ago to random

One of my favorite hip-hop instrumentals: https://www.youtube.com/watch?v=s6Yyb3N9IuA. I was listening to J Cole's Everybody Dies and a YouTube commenter had just written "Kenny Dope" without any further context or explanation and I immediately understood what it meant.

pervognsen, 1 day ago

If you don't get the reference, listen to the two tracks back to back: https://www.youtube.com/watch?v=-5slZHLSnow. They both sample https://en.wikipedia.org/wiki/Inside_My_Love.

lritter, 1 day ago to random

interesting problem: progressively mapping a cosmically high number of unique strings of arbitrary length to an ordered set so that we can assign an index to each string, extract a substring from each index, and filter strings not in the set.

evidently, this approach requires compression. the compressed result is functionally equivalent to a regular expression, or a schema validation system.

pervognsen, 1 day ago

@lritter You didn't define everything to the point where I'm completely sure what you're describing but maybe https://blog.burntsushi.net/transducers/ is relevant.

pervognsen, 1 day ago

@lritter Alright, I thought you were talking about strings-strings. Carry on. :)

pervognsen, 1 day ago

@lritter Everything is hash consing. https://en.wikipedia.org/wiki/Binary_decision_diagram

pervognsen, 1 day ago

@lritter Definitely one of the best simple ideas in CS.

pervognsen, 1 day ago

@lritter Yeah, that's why I said it's all hash consing. It's very general and goes at least as far back as a Russian paper in the early 60s on value numbering.

pervognsen, 1 day ago

@lritter My bad, make that late 50s. https://dl.acm.org/doi/10.1145/368892.368907. Although I remember the terminology in that paper being somewhat impenetrable and the generality not so immediately apparent.

steve, 3 days ago to random

A slight re-organization of Priest's "Efficient Scaling for Complex Division" to make it compatible with "try to divide the dumb fast way inline, then branch to rescale only if necessary" while preserving scale invariance of rounding.

Also fixes it up to work for Float16, which the original approach does not.

Further optimization possible and pretty straightforward.

https://github.com/apple/swift-numerics/pull/289

pervognsen, 3 days ago

@saagar @steve @neilhenning That's affine algebra. Linear algebra is when you're stuck at y=mx.

amonakov, 5 days ago to random

(prompted by discussion of detecting bitwise and-not earlier in GCC's optimization pipeline)

My ideal compiler IR would not have and/or/xor as distinct bitwise ops, just generic ternlog and probably the corresponding two-operand function ("bilog"?) too.
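
A minimal sketch of what a generic "ternlog" op could look like: an 8-bit immediate acts as the truth table of an arbitrary three-input boolean function, applied bitwise. The function name, the `width` parameter, and the indexing convention (bit `(a << 2) | (b << 1) | c` of the immediate) are assumptions for illustration, not a description of any particular compiler IR.

```python
def ternlog(a, b, c, imm8, width=32):
    """Bitwise 3-input function selected by the 8-entry truth table imm8."""
    mask = (1 << width) - 1
    result = 0
    for index in range(8):
        if (imm8 >> index) & 1:
            # Select the bit positions where (a, b, c) match this table row.
            ta = a if index & 4 else ~a
            tb = b if index & 2 else ~b
            tc = c if index & 1 else ~c
            result |= ta & tb & tc
    return result & mask


AND3 = 0x80    # a & b & c
XOR3 = 0x96    # a ^ b ^ c
SELECT = 0xCA  # bitwise a ? b : c
ANDNOT = 0x30  # a & ~b, ignoring c
```

With this encoding, and, or, xor, and-not and bitwise select are all the same op with different immediates, so a pattern like and-not needs no separate detection step.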

pervognsen, 13 hours ago

@amonakov For a reason I don't fully understand, this seems to be common in GPUs but not in CPUs. Even the VPTERNLOG instructions in AVX-512 were inherited from Larrabee AFAIK. Maybe GPU ISAs are less averse to many-operand instructions than CPU ISAs have traditionally been?

pervognsen, 12 hours ago

@amonakov Hmm, I just remembered Southern Islands had a metric truckload of 3 in, 1 out instructions and thought I remembered ternlog being in there. But looking through the ISA manual now I can't find it.

pervognsen, 1 hour ago

@rygorous @amonakov What about the increased RF/operand forwarding port pressure from three input operands? Don't GPU cores usually have some additional tricks they can play with RF ports due to their latency-tolerant design? Does this figure into the CPU vs GPU difference at all?
