@artificialmind@fosstodon.org avatar

artificialmind

@artificialmind@fosstodon.org

C++ library author, hobbyist programming language architect, obsessive optimizer


pervognsen, to random
@pervognsen@mastodon.social avatar

I've never fully worked out how best to articulate my dissatisfaction with the usual way people talk about pluggable allocators in systems programming. Sure, I'd like to have some standard for fallible, pluggable allocation at the lower level of a language's standard library. But the entire mindset of plugging together allocators and data structures is something I find dubious and at best it feels like a poor compromise.

artificialmind,
@artificialmind@fosstodon.org avatar

@zeux @pervognsen counter shout-out: people who write a slow & unmaintainable mess because they cargo-cult-avoid unordered_map or any kind of list data structure after seeing an ankerl benchmark once.

artificialmind,
@artificialmind@fosstodon.org avatar

@zeux @pervognsen The first thing was my main point though: you cannot "fix" std::unordered_map with a custom allocator. But that doesn't imply you can fix your code by blindly using "benchmark-leading maps". Anyone with a bit of optimization experience sees the issue with tons of small std::vectors. My issue is with blindly applying "performance tricks" (the cargo-culting part) without the necessary understanding. I feel your comment goes in the same direction.

artificialmind,
@artificialmind@fosstodon.org avatar

@zeux @pervognsen At work we were building mesh topology, and for the "vertex -> adjacent vertices" mapping I replaced an "ankerl::unordered_dense::map<int, std::vector<int>>" with an array-embedded per-vertex linked list and got something like a 4x to 10x speedup in that section.

I don't have a ready example for the maintainability part, but basically: when you semantically need a map and you're not in any kind of hot path, then std::unordered_map (and its stability guarantees) is totally fine.
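A minimal sketch of what such an array-embedded per-vertex linked list can look like (all names and the exact layout are illustrative, not the actual work code): adjacency entries live in flat arrays, each vertex stores the head of its list, and insertion does zero per-vertex allocations.

```cpp
#include <vector>

// Replaces map<int, vector<int>>: three flat arrays, no per-vertex heap
// allocations, and iteration over one vertex's neighbors is pointer-chasing
// inside a contiguous buffer.
struct Adjacency {
    std::vector<int> head;  // head[v] = index of first entry for v, or -1
    std::vector<int> to;    // to[e]   = adjacent vertex stored in entry e
    std::vector<int> next;  // next[e] = next entry for the same vertex, or -1

    explicit Adjacency(int vertex_count) : head(vertex_count, -1) {}

    void add_edge(int v, int adjacent) {
        to.push_back(adjacent);
        next.push_back(head[v]);                       // prepend to v's list
        head[v] = static_cast<int>(to.size()) - 1;
    }

    template <class F>
    void for_each_adjacent(int v, F f) const {
        for (int e = head[v]; e != -1; e = next[e]) f(to[e]);
    }
};
```

Compared to a map of vectors, everything sits in three contiguous buffers, which is where the cache-friendliness (and the speedup) comes from.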

dotstdy, to random
@dotstdy@mastodon.social avatar

CPU optimisation guide: You should try vectorizing
GPU optimisation guide: You should try scalarizing

CHOOSE A LANE

artificialmind,
@artificialmind@fosstodon.org avatar

@dotstdy @aras so what's next? running lisp on an intel CPU and running x86 asm on a lisp machine?

pervognsen, (edited ) to random
@pervognsen@mastodon.social avatar

It looks a bit funny but Rc<Arc<T>> seems like a reasonable choice in a lot of cases. Specifically, you have locally shared ownership of a remotely shared resource instead of directly sharing ownership of the remote resource (which comes with contention issues). Most of the time you probably wouldn't literally have Rc<Arc<T>> but Rc<LocalStruct> where LocalStruct (transitively) has an Arc<T>. But same thing really.

artificialmind,
@artificialmind@fosstodon.org avatar

@pervognsen @SonnyBonds Yep and it was (at least in my perception) always advertised and taught as "the modern/idiomatic way" when using smart pointers. std::make_shared also has some exception-safety benefits where a throwing ctor doesn't lead to leaking memory.

The only real "downside" with shared allocation is that weak pointers can keep the data allocation alive even if the data itself is not accessible anymore. But I haven't encountered that issue in real code yet.

artificialmind,
@artificialmind@fosstodon.org avatar

@pervognsen @SonnyBonds they are two pointers but if you use std::make_shared (which is the idiomatic way nowadays), then it only does a single allocation where control and data block are adjacent.
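A small demo of both points (note: the single combined allocation is what implementations do in practice and what the standard encourages, not something you can observe portably from within the program):

```cpp
#include <cassert>
#include <memory>

void demo() {
    // two allocations: one for the int, one for the control block
    std::shared_ptr<int> separate(new int(42));

    // one allocation: control block and data are placed adjacently
    auto combined = std::make_shared<int>(42);
    assert(*combined == 42);

    // the caveat from above: a surviving weak_ptr keeps the *combined*
    // allocation (including the object's storage) alive even after the
    // object itself has been destroyed
    std::weak_ptr<int> w = combined;
    combined.reset();     // the int is destroyed here...
    assert(w.expired());  // ...but the allocation is freed only once w dies
}
```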

cliffle, to random
@cliffle@hachyderm.io avatar

How I feel listening to programmers concerned that floating point math is non-commutative

https://www.smbc-comics.com/comic/commute-2

artificialmind,
@artificialmind@fosstodon.org avatar

@pkhuong @cliffle yeah, at first I thought it was the usual confusion: floating-point math is unexpectedly non-associative, not non-commutative. The comic is clearly about commutativity though, so now I'm not sure.
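The classic demonstration of that distinction: addition of (non-NaN) doubles is commutative, but not associative.

```cpp
#include <cassert>

void demo() {
    double a = 0.1, b = 0.2, c = 0.3;
    // commutative: swapping operands never changes the rounded result
    assert(a + b == b + a);
    // NOT associative: regrouping changes which intermediate gets rounded
    assert((a + b) + c != a + (b + c));
}
```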

foonathan, to random
@foonathan@fosstodon.org avatar

Small tip for new speakers:

Don't end your talk with "are there any questions?" because then you get an awkward silence, then people realize it's over and start applauding, and then questions.

End it with "thank you for listening", wait for applause, then ask for questions.

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan ideally the last slide shows some summary of your talk, visual if applicable. You can add a "thank you" there if you really want but it's important that the "freeze frame" slide at the end makes it easy for people to remember your whole talk at a glance, not be a blank canvas.

artificialmind, to random
@artificialmind@fosstodon.org avatar

Does anyone have a good resource for rationale and design decisions in IEEE 754 floating point? I'm preparing a blog post about tradeoffs in floating point design from an API perspective.

For example: why would you include +-Inf and not simply fold those into NaN? One important reason is interval arithmetic. But most "normal" float code I've seen doesn't handle Inf much better than NaN.

artificialmind,
@artificialmind@fosstodon.org avatar

@lesley So I'm decidedly not looking for "how do IEEE 754 floats work" but "why do they work that way". The design space is huge! So why did we end up here?

1/+0 = +Inf, 1/-0 = -Inf is nice for intervals but it breaks so many other intuitions. Like "a == b" implies "c / a == c / b".
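The broken intuition in a few asserts:

```cpp
#include <cassert>
#include <limits>

void demo() {
    double pz = +0.0, nz = -0.0;
    assert(pz == nz);  // the two zeros compare equal...
    assert(1.0 / pz == std::numeric_limits<double>::infinity());
    assert(1.0 / nz == -std::numeric_limits<double>::infinity());
    assert(1.0 / pz != 1.0 / nz);  // ...so "a == b implies c/a == c/b" fails
}
```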

fasterthanlime, to random
@fasterthanlime@hachyderm.io avatar

me: so remember when you got mad at me 14 years ago?
friend: ...I have zero memories of that happening
me: haha okay!! good thing it didn't haunt me this whole time, informing every single interaction I've had with you since!
friend: (≖_≖ )

artificialmind,
@artificialmind@fosstodon.org avatar

@fasterthanlime Sometimes I'd really like to be able to ask people questions and get their answer without them remembering that I asked the question.

Like, I dunno, side-effect free communication?

regehr, to random
@regehr@mastodon.social avatar

TIL that in IEEE FP, -0.0 + +0.0 = +0.0

https://alive2.llvm.org/ce/z/jPi8ZV

artificialmind,
@artificialmind@fosstodon.org avatar

@regehr I wrote about that a few years ago: https://artificial-mind.net/blog/2019/08/09/floating-point-optimizations it basically implies that the compiler cannot optimize "x + 0.0" to "x".

On the flip side, I use "x + 0.0" before hashing the bits of a float to make sure +0 and -0 have the same hash.
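A hypothetical helper illustrating that trick (names are mine, not from the blog post): "x + 0.0" maps -0.0 to +0.0 under the default rounding mode, so both zeros hash identically.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

std::size_t hash_double(double x) {
    x = x + 0.0;  // -0.0 + 0.0 == +0.0 in IEEE 754 (round-to-nearest)
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // type-pun via memcpy, not a cast
    return std::hash<std::uint64_t>{}(bits);
}
```

(NaNs with different payloads still hash differently; normalizing those needs an extra check.)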

jrose, to random
@jrose@belkadan.com avatar

The world before stacks is a fun bit of software history that for some reason I’m always a bit surprised people don’t know. Of course, not knowing something is the default state and all, and it’s nearly never relevant—even retrocomputing projects rarely go back that far. But it is a glimpse into the kind of thinking people come up with when using their existing tools to implement a new pattern—and that new pattern was “subroutines”. (The stack itself is a version of that too, where the new pattern is “recursion”.)

https://devblogs.microsoft.com/oldnewthing/20240401-00/?p=109599

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @jrose If you have concurrency, that memory has to be in TLS.

Now I'm wondering: your compiler could statically mark a lot of functions as "non-recursive", based on a conservative call graph. Would it make sense to statically allocate their "stack space" or is this not actually faster?

lesley, to random
@lesley@mastodon.gamedev.place avatar

Not sure if I agree with this take (I am more in Rust's "limit type inference" camp), but this feels like a blog full of gems (that I somehow missed)

https://borretti.me/article/type-inference-was-a-mistake

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @lesley ah thanks! A bit like how Voldemort types only become actually unspellable without a decltype mechanism. Still, the actual argument feels a bit brittle because you could obviously have lots of type inference but simply refuse to infer lifetimes. But at least I understand how it plays into their hand in this instance.

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @lesley I would have thought that linear types are the main reason for their memory safety, how does type inference play into that?

lesley, to random
@lesley@mastodon.gamedev.place avatar

This is a topic that I have wondered about a few times, and I am glad someone laid it out. I like the "relational" approach (also strongly related to data-oriented programming)

https://btmc.substack.com/p/how-to-store-types-after-semantic

artificialmind,
@artificialmind@fosstodon.org avatar

@lesley the last point is a minor super power: half the LSP features (completion, hover, go-to def, highlight symbols, rename, and a few others) are almost trivial to implement using this simple list.

It's emitted during compilation so it will stay in sync with the compiler. The actual LSP requests only need a few lines and do not traverse an AST. It's probably the most bang-for-buck thing in my current architecture.
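A hypothetical shape of such a token list, plus one of the "trivial" LSP features on top of it (every field name here is an assumption, not the actual compiler's schema):

```cpp
#include <cstddef>
#include <vector>

struct TypedToken {
    int pos, len;  // source range of the token
    int decl_id;   // declaration this token resolves to
    int type_id;   // type at this position
    bool is_def;   // true if this token is the declaration site itself
};

// go-to-definition: find the token under the cursor, then the definition
// site sharing its decl_id -- a linear scan over the flat list, no AST.
int go_to_def(const std::vector<TypedToken>& toks, int cursor) {
    int decl = -1;
    for (const TypedToken& t : toks)
        if (cursor >= t.pos && cursor < t.pos + t.len) decl = t.decl_id;
    for (std::size_t i = 0; i < toks.size(); ++i)
        if (toks[i].decl_id == decl && toks[i].is_def)
            return static_cast<int>(i);
    return -1;  // cursor not on a resolved token
}
```

Highlight-symbol and rename fall out the same way: collect every token with the matching decl_id.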

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @lesley Sure! (no micro-opt yet, so "owning rust" + little interning)

artificialmind,
@artificialmind@fosstodon.org avatar

@lesley my compiler ended up close to relational and typed. There is no typed AST though, because I go from AST directly to typed control flow graph (I have flow dependent typing and name resolution and that's not logic I want to duplicate. So the first typed+name-resolved IR is already a control flow graph).

I have a relational aspect in there as well: my compiler emits a flat list of "typed tokens" with IDs of their declarations and the expected type during compilation.

zwarich, to random
@zwarich@hachyderm.io avatar

@chandlerc I’m following Carbon pretty closely, and hope that it succeeds so there are fewer and fewer contexts in which I could be asked to write C++.

However, I’m a bit puzzled by the safety story. What makes you confident that you can focus on adding safety after the fact and succeed rather than building it in from the beginning?

artificialmind,
@artificialmind@fosstodon.org avatar

@chandlerc @zwarich One of the large "choices" I see in rust-style ownership is choosing the default. Rust went with move-by-default and explicit references. I'm experimenting with references-by-default and explicit moves/copies. Not sure how that plays out yet but the feel is quite different.

aras, to random
@aras@mastodon.gamedev.place avatar

Ever get a feeling like you spend a week reading upon and trying out various clever solutions to a problem, and they are all complex and messy? And then do the stupidest simple thing possible in an hour instead, and it actually works well?

Yeah, me neither. 😭

artificialmind,
@artificialmind@fosstodon.org avatar

@aras to be fair, these things are quite often in Pascal's "If I Had More Time, I Would Have Written a Shorter Letter" kind of space.

I'm pretty sure software engineer skill progression is: simple thing that doesn't work, complex thing that does work, simple thing that does work.

artificialmind,
@artificialmind@fosstodon.org avatar

@aras and the elusive Tier 4: no simple thing works so it has to be a complex solution. Much heated discussion is in Tier 2 situations where people think they are in a Tier 4 scenario. The other way also exists: folks defending a Tier 4 solution against people thinking they have to shut down a Tier 2 solution against their Tier 3 while missing their proposal is Tier 1.

Ok that got confusing fast. Good Tier 4 examples are Unicode and Timezones.

artificialmind, to random
@artificialmind@fosstodon.org avatar

traditional compiler and language design starts with:

tokenization -> parsing -> semantics -> ...

radical idea:

blockification -> tokenization -> parsing -> ...

structure source code into nested blocks first, THEN start tokenization.
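A toy sketch of the idea, assuming braces delimit blocks (everything here is illustrative): the source is grouped into a tree of nested blocks before any tokenizer runs, with each node keeping the raw text of its own level.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Block {
    std::string text;             // raw text at this level; children elided
    std::vector<Block> children;  // nested blocks, in source order
};

// Recursively group src into brace-delimited blocks. No tokenization yet:
// we only look at the block structure.
Block blockify(const std::string& src, std::size_t& i) {
    Block b;
    while (i < src.size()) {
        char c = src[i];
        if (c == '{') {
            ++i;  // consume '{' and descend
            b.children.push_back(blockify(src, i));
            b.text += "{}";  // placeholder where the child block sat
        } else if (c == '}') {
            ++i;  // consume '}' and return to the parent
            return b;
        } else {
            b.text += c;
            ++i;
        }
    }
    return b;
}
```

A real blockifier would also have to handle string literals and comments (a '{' inside either must not open a block), which is exactly the kind of decision this phase ordering forces early.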

artificialmind, to random
@artificialmind@fosstodon.org avatar

In C++, is there a way to "restrict" a span of bytes?

Consider https://godbolt.org/z/xhKr73761 : "obj const&" means the fields are reloaded every loop. "obj const& restrict" allows the compiler to move the loads outside the loop. However, I could not get my obj_view (prototypical reference type on top of a byte span) to have the same optimization. Basically I want a "this won't change anymore" annotation.

(@foonathan you don't happen to know a trick or two for this?)
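For reference, the reference-level annotation that does work looks roughly like this (a sketch: __restrict is a GCC/Clang/MSVC extension, not standard C++, and "obj" stands in for the godbolt example; the open question above is how to get the same effect for a byte-span view).

```cpp
#include <cstddef>

struct obj { int lo; int hi; };  // stand-in for the godbolt example

// Through "obj const&" the compiler must assume writes to out[] may alias
// *o, so o.lo and o.hi get reloaded every iteration.
void fill_plain(obj const& o, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = o.lo + o.hi;
}

// With __restrict the compiler may assume no aliasing and hoist the two
// loads out of the loop.
void fill_restrict(obj const& __restrict o, int* __restrict out,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = o.lo + o.hi;
}
```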

artificialmind, to random
@artificialmind@fosstodon.org avatar

struct non_comp { };
std::map<int, non_comp> vals;
if (vals == vals) { ... }

will not compile because non_comp has no op==.

but you cannot detect this using SFINAE on decltype(vals), because map always has op== and only fails to compile internally.

@foonathan you don't happen to know a way around this, do you? :D

artificialmind, to random
@artificialmind@fosstodon.org avatar

Shower thought: if you have a language with explicitly sized primitives (i32, u64, f32, etc.) and you want to provide "int" as a "reasonably sized default", then 52-bit with UB overflow might be interesting:

  • can be transparently compiled to i64 or f64 (whatever makes locally more sense)
  • can be stored in GPR or SSE registers exactly
  • is still sufficiently large to represent buffer sizes
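A quick check of the "transparently compiled to f64" claim: an f64 has a 53-bit significand, so every 52-bit integer round-trips through a double exactly, while exactness breaks just past 53 bits.

```cpp
#include <cassert>
#include <cstdint>

void demo() {
    // every 52-bit value round-trips exactly through f64
    std::int64_t max52 = (std::int64_t{1} << 52) - 1;
    assert(static_cast<std::int64_t>(static_cast<double>(max52)) == max52);

    // past 53 bits it breaks: 2^53 + 1 rounds to 2^53
    std::int64_t past = (std::int64_t{1} << 53) + 1;
    assert(static_cast<std::int64_t>(static_cast<double>(past)) != past);
}
```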