@artificialmind@fosstodon.org avatar

artificialmind

@artificialmind@fosstodon.org

C++ library author, hobbyist programming language architect, obsessive optimizer


pervognsen, to random
@pervognsen@mastodon.social avatar

I've never fully worked out how best to articulate my dissatisfaction with the usual way people talk about pluggable allocators in systems programming. Sure, I'd like to have some standard for fallible, pluggable allocation at the lower level of a language's standard library. But the entire mindset of plugging together allocators and data structures is something I find dubious and at best it feels like a poor compromise.

artificialmind,
@artificialmind@fosstodon.org avatar

@zeux @pervognsen counter shout-out: people who write a slow & unmaintainable mess because they cargo-cult-avoid unordered_map or any kind of list data structure after seeing an ankerl benchmark once.

artificialmind,
@artificialmind@fosstodon.org avatar

@zeux @pervognsen The first thing was my main point though: you cannot "fix" std::unordered_map with a custom allocator. But that doesn't imply you can fix your code by blindly using "benchmark-leading maps". Anyone with a bit of optimization experience sees the issue with tons of small std::vectors. My issue is with blindly applying "performance tricks" (the cargo-culting part) without the necessary understanding. I feel your comment goes in the same direction.

artificialmind,
@artificialmind@fosstodon.org avatar

@zeux @pervognsen At work we were building mesh topology, and for the "vertex -> adjacent vertices" mapping I replaced an "ankerl::unordered_dense::map<int, std::vector<int>>" with an array-embedded per-vertex linked list and got something like a 4x to 10x speedup in that section.

I don't have a ready example for the maintainability part, but basically: when you semantically need a map and you're not in any kind of hot path, then std::unordered_map (and its stability guarantees) is totally fine.
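A minimal sketch of what such an array-embedded per-vertex linked list can look like (all names and the exact layout are illustrative, not the actual work code): adjacency entries live in flat arrays, each vertex stores the head of its list, and insertion does zero per-vertex allocations.

```cpp
#include <vector>

// Replaces map<int, vector<int>>: three flat arrays, no per-vertex heap
// allocations, and iteration over one vertex's neighbors is pointer-chasing
// inside a contiguous buffer.
struct Adjacency {
    std::vector<int> head;  // head[v] = index of first entry for v, or -1
    std::vector<int> to;    // to[e]   = adjacent vertex stored in entry e
    std::vector<int> next;  // next[e] = next entry for the same vertex, or -1

    explicit Adjacency(int vertex_count) : head(vertex_count, -1) {}

    void add_edge(int v, int adjacent) {
        to.push_back(adjacent);
        next.push_back(head[v]);                       // prepend to v's list
        head[v] = static_cast<int>(to.size()) - 1;
    }

    template <class F>
    void for_each_adjacent(int v, F f) const {
        for (int e = head[v]; e != -1; e = next[e]) f(to[e]);
    }
};
```

Compared to a map of vectors, everything sits in three contiguous buffers, which is where the cache-friendliness (and the speedup) comes from.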

dotstdy, to random
@dotstdy@mastodon.social avatar

CPU optimisation guide: You should try vectorizing
GPU optimisation guide: You should try scalarizing

CHOOSE A LANE

artificialmind,
@artificialmind@fosstodon.org avatar

@dotstdy @aras so what's next? running lisp on an intel CPU and running x86 asm on a lisp machine?

pervognsen, (edited ) to random
@pervognsen@mastodon.social avatar

It looks a bit funny but Rc<Arc<T>> seems like a reasonable choice in a lot of cases. Specifically, you have locally shared ownership of a remotely shared resource instead of directly sharing ownership of the remote resource (which comes with contention issues). Most of the time you probably wouldn't literally have Rc<Arc<T>> but Rc<LocalStruct> where LocalStruct (transitively) has an Arc<T>. But same thing really.

artificialmind,
@artificialmind@fosstodon.org avatar

@pervognsen @SonnyBonds Yep and it was (at least in my perception) always advertised and taught as "the modern/idiomatic way" when using smart pointers. std::make_shared also has some exception-safety benefits where a throwing ctor doesn't lead to leaking memory.

The only real "downside" with shared allocation is that weak pointers can keep the data allocation alive even if the data itself is not accessible anymore. But I haven't encountered that issue in real code yet.

artificialmind,
@artificialmind@fosstodon.org avatar

@pervognsen @SonnyBonds they are two pointers but if you use std::make_shared (which is the idiomatic way nowadays), then it only does a single allocation where control and data block are adjacent.
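A small demo of both points (note: the single combined allocation is what implementations do in practice and what the standard encourages, not something you can observe portably from within the program):

```cpp
#include <cassert>
#include <memory>

void demo() {
    // two allocations: one for the int, one for the control block
    std::shared_ptr<int> separate(new int(42));

    // one allocation: control block and data are placed adjacently
    auto combined = std::make_shared<int>(42);
    assert(*combined == 42);

    // the caveat from above: a surviving weak_ptr keeps the *combined*
    // allocation (including the object's storage) alive even after the
    // object itself has been destroyed
    std::weak_ptr<int> w = combined;
    combined.reset();     // the int is destroyed here...
    assert(w.expired());  // ...but the allocation is freed only once w dies
}
```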

cliffle, to random
@cliffle@hachyderm.io avatar

How I feel listening to programmers concerned that floating point math is non-commutative

https://www.smbc-comics.com/comic/commute-2

artificialmind,
@artificialmind@fosstodon.org avatar

@pkhuong @cliffle yeah, at first I thought it was the usual confusion: floating-point math is unexpectedly non-associative, not non-commutative. The comic is clearly about commutativity though, so now I'm not sure.
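The classic demonstration of that distinction: addition of (non-NaN) doubles is commutative, but not associative.

```cpp
#include <cassert>

void demo() {
    double a = 0.1, b = 0.2, c = 0.3;
    // commutative: swapping operands never changes the rounded result
    assert(a + b == b + a);
    // NOT associative: regrouping changes which intermediate gets rounded
    assert((a + b) + c != a + (b + c));
}
```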

foonathan, to random
@foonathan@fosstodon.org avatar

Small tip for new speakers:

Don't end your talk with "are there any questions?" because then you get an awkward silence, then people realize it's over and start applauding, and then questions.

End it with "thank you for listening", wait for applause, then ask for questions.

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan ideally the last slide shows some summary of your talk, visual if applicable. You can add a "thank you" there if you really want but it's important that the "freeze frame" slide at the end makes it easy for people to remember your whole talk at a glance, not be a blank canvas.

artificialmind, to random
@artificialmind@fosstodon.org avatar

Does anyone have a good resource for rationale and design decisions in IEEE 754 floating point? I'm preparing a blog post about tradeoffs in floating point design from an API perspective.

For example: why would you include +-Inf and not simply fold those into NaN? One important reason is interval arithmetic. But most "normal" float code I've seen doesn't handle Inf much better than NaN.

artificialmind,
@artificialmind@fosstodon.org avatar

@lesley So I'm decidedly not looking for "how do IEEE 754 floats work" but "why do they work that way". The design space is huge! So why did we end up here?

1/+0 = +Inf, 1/-0 = -Inf is nice for intervals but it breaks so many other intuitions. Like "a == b" implies "c / a == c / b".
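The broken intuition in a few asserts:

```cpp
#include <cassert>
#include <limits>

void demo() {
    double pz = +0.0, nz = -0.0;
    assert(pz == nz);  // the two zeros compare equal...
    assert(1.0 / pz == std::numeric_limits<double>::infinity());
    assert(1.0 / nz == -std::numeric_limits<double>::infinity());
    assert(1.0 / pz != 1.0 / nz);  // ...so "a == b implies c/a == c/b" fails
}
```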

fasterthanlime, to random
@fasterthanlime@hachyderm.io avatar

me: so remember when you got mad at me 14 years ago?
friend: ...I have zero memories of that happening
me: haha okay!! good thing it didn't haunt me this whole time, informing every single interaction I've had with you since!
friend: (≖_≖ )

artificialmind,
@artificialmind@fosstodon.org avatar

@fasterthanlime Sometimes I'd really like to be able to ask people questions and get their answer without them remembering that I asked the question.

Like, I dunno, side-effect free communication?

regehr, to random
@regehr@mastodon.social avatar

TIL that in IEEE FP, -0.0 + +0.0 = +0.0

https://alive2.llvm.org/ce/z/jPi8ZV

artificialmind,
@artificialmind@fosstodon.org avatar

@regehr I wrote about that a few years ago: https://artificial-mind.net/blog/2019/08/09/floating-point-optimizations it basically implies that the compiler cannot optimize "x + 0.0" to "x".

On the flip side, I use "x + 0.0" before hashing the bits of a float to make sure +0 and -0 have the same hash.
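A hypothetical helper illustrating that trick (names are mine, not from the blog post): "x + 0.0" maps -0.0 to +0.0 under the default rounding mode, so both zeros hash identically.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

std::size_t hash_double(double x) {
    x = x + 0.0;  // -0.0 + 0.0 == +0.0 in IEEE 754 (round-to-nearest)
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // type-pun via memcpy, not a cast
    return std::hash<std::uint64_t>{}(bits);
}
```

(NaNs with different payloads still hash differently; normalizing those needs an extra check.)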

jrose, to random
@jrose@belkadan.com avatar

The world before stacks is a fun bit of software history that for some reason I’m always a bit surprised people don’t know. Of course, not knowing something is the default state and all, and it’s nearly never relevant—even retrocomputing projects rarely go back that far. But it is a glimpse into the kind of thinking people come up with when using their existing tools to implement a new pattern—and that new pattern was “subroutines”. (The stack itself is a version of that too, where the new pattern is “recursion”.)

https://devblogs.microsoft.com/oldnewthing/20240401-00/?p=109599

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @jrose If you have concurrency, that memory has to be in TLS.

Now I'm wondering: your compiler could statically mark a lot of functions as "non-recursive", based on a conservative call graph. Would it make sense to statically allocate their "stack space" or is this not actually faster?

lesley, to random
@lesley@mastodon.gamedev.place avatar

Not sure if I agree with this take (I am more in Rust's "limit type inference" camp), but this feels like a blog full of gems (that I somehow missed)

https://borretti.me/article/type-inference-was-a-mistake

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @lesley ah thanks! A bit like how Voldemort types only become actually unspellable without a decltype mechanism. Still, the actual argument feels a bit brittle because you could obviously have lots of type inference but simply refuse to infer lifetimes. But at least I understand how it plays into their hand in this instance.

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @lesley I would have thought that linear types are the main reason for their memory safety, how does type inference play into that?

lesley, to random
@lesley@mastodon.gamedev.place avatar

This is a topic that I have wondered about a few times, and I am glad someone laid it out. I like the "relational" approach (also strongly related to data-oriented programming)

https://btmc.substack.com/p/how-to-store-types-after-semantic

artificialmind,
@artificialmind@fosstodon.org avatar

@lesley the last point is a minor super power: half the LSP features (completion, hover, go-to def, highlight symbols, rename, and a few others) are almost trivial to implement using this simple list.

It's emitted during compilation so it will stay in sync with the compiler. The actual LSP requests only need a few lines and do not traverse an AST. It's probably the most bang-for-buck thing in my current architecture.
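A hypothetical shape of such a token list, plus one of the "trivial" LSP features on top of it (every field name here is an assumption, not the actual compiler's schema):

```cpp
#include <cstddef>
#include <vector>

struct TypedToken {
    int pos, len;  // source range of the token
    int decl_id;   // declaration this token resolves to
    int type_id;   // type at this position
    bool is_def;   // true if this token is the declaration site itself
};

// go-to-definition: find the token under the cursor, then the definition
// site sharing its decl_id -- a linear scan over the flat list, no AST.
int go_to_def(const std::vector<TypedToken>& toks, int cursor) {
    int decl = -1;
    for (const TypedToken& t : toks)
        if (cursor >= t.pos && cursor < t.pos + t.len) decl = t.decl_id;
    for (std::size_t i = 0; i < toks.size(); ++i)
        if (toks[i].decl_id == decl && toks[i].is_def)
            return static_cast<int>(i);
    return -1;  // cursor not on a resolved token
}
```

Highlight-symbol and rename fall out the same way: collect every token with the matching decl_id.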

artificialmind,
@artificialmind@fosstodon.org avatar

@foonathan @lesley Sure! (no micro-opt yet, so "owning rust" + little interning)

artificialmind,
@artificialmind@fosstodon.org avatar

@lesley my compiler ended up close to relational and typed. There is no typed AST though, because I go from AST directly to typed control flow graph (I have flow dependent typing and name resolution and that's not logic I want to duplicate. So the first typed+name-resolved IR is already a control flow graph).

I have a relational aspect in there as well: my compiler emits a flat list of "typed tokens" with IDs of their declarations and the expected type during compilation.

zwarich, to random
@zwarich@hachyderm.io avatar

@chandlerc I’m following Carbon pretty closely, and hope that it succeeds so there are fewer and fewer contexts in which I could be asked to write C++.

However, I’m a bit puzzled by the safety story. What makes you confident that you can focus on adding safety after the fact and succeed rather than building it in from the beginning?

artificialmind,
@artificialmind@fosstodon.org avatar

@chandlerc @zwarich One of the large "choices" I see in rust-style ownership is choosing the default. Rust went with move-by-default and explicit references. I'm experimenting with references-by-default and explicit moves/copies. Not sure how that plays out yet but the feel is quite different.

aras, to random
@aras@mastodon.gamedev.place avatar

Ever get a feeling like you spend a week reading upon and trying out various clever solutions to a problem, and they are all complex and messy? And then do the stupidest simple thing possible in an hour instead, and it actually works well?

Yeah, me neither. 😭

artificialmind,
@artificialmind@fosstodon.org avatar

@aras to be fair, these things are quite often in Pascal's "If I Had More Time, I Would Have Written a Shorter Letter" kind of space.

I'm pretty sure software engineer skill progression is: simple thing that doesn't work, complex thing that does work, simple thing that does work.

artificialmind,
@artificialmind@fosstodon.org avatar

@aras and the elusive Tier 4: no simple thing works so it has to be a complex solution. Much heated discussion is in Tier 2 situations where people think they are in a Tier 4 scenario. The other way also exists: folks defending a Tier 4 solution against people thinking they have to shut down a Tier 2 solution against their Tier 3 while missing their proposal is Tier 1.

Ok that got confusing fast. Good Tier 4 examples are Unicode and Timezones.

artificialmind, to random
@artificialmind@fosstodon.org avatar

traditional compiler and language design starts with:

tokenization -> parsing -> semantics -> ...

radical idea:

blockification -> tokenization -> parsing -> ...

structure source code into nested blocks first, THEN start tokenization.
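A toy sketch of the idea, assuming braces delimit blocks (everything here is illustrative): the source is grouped into a tree of nested blocks before any tokenizer runs, with each node keeping the raw text of its own level.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Block {
    std::string text;             // raw text at this level; children elided
    std::vector<Block> children;  // nested blocks, in source order
};

// Recursively group src into brace-delimited blocks. No tokenization yet:
// we only look at the block structure.
Block blockify(const std::string& src, std::size_t& i) {
    Block b;
    while (i < src.size()) {
        char c = src[i];
        if (c == '{') {
            ++i;  // consume '{' and descend
            b.children.push_back(blockify(src, i));
            b.text += "{}";  // placeholder where the child block sat
        } else if (c == '}') {
            ++i;  // consume '}' and return to the parent
            return b;
        } else {
            b.text += c;
            ++i;
        }
    }
    return b;
}
```

A real blockifier would also have to handle string literals and comments (a '{' inside either must not open a block), which is exactly the kind of decision this phase ordering forces early.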

artificialmind, to random
@artificialmind@fosstodon.org avatar

In C++, is there a way to "restrict" a span of bytes?

Consider https://godbolt.org/z/xhKr73761 : "obj const&" means the fields are reloaded every loop. "obj const& restrict" allows the compiler to move the loads outside the loop. However, I could not get my obj_view (prototypical reference type on top of a byte span) to have the same optimization. Basically I want a "this won't change anymore" annotation.

(@foonathan you don't happen to know a trick or two for this?)
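For reference, the reference-level annotation that does work looks roughly like this (a sketch: __restrict is a GCC/Clang/MSVC extension, not standard C++, and "obj" stands in for the godbolt example; the open question above is how to get the same effect for a byte-span view).

```cpp
#include <cstddef>

struct obj { int lo; int hi; };  // stand-in for the godbolt example

// Through "obj const&" the compiler must assume writes to out[] may alias
// *o, so o.lo and o.hi get reloaded every iteration.
void fill_plain(obj const& o, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = o.lo + o.hi;
}

// With __restrict the compiler may assume no aliasing and hoist the two
// loads out of the loop.
void fill_restrict(obj const& __restrict o, int* __restrict out,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = o.lo + o.hi;
}
```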

artificialmind, to random
@artificialmind@fosstodon.org avatar

struct non_comp { };
std::map<int, non_comp> vals;
if (vals == vals) { ... }

will not compile because non_comp has no op==.

but you cannot detect this using SFINAE on decltype(vals), because map always has op== and only fails to compile internally.

@foonathan you don't happen to know a way around this, do you? :D

artificialmind, to random
@artificialmind@fosstodon.org avatar

Shower thought: if you have a language with explicitly sized primitives (i32, u64, f32, etc.) and you want to provide "int" as a "reasonably sized default", then 52-bit with UB overflow might be interesting:

  • can be transparently compiled to i64 or f64 (whatever makes locally more sense)
  • can be stored in GPR or SSE registers exactly
  • is still sufficiently large to represent buffer sizes
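A quick check of the "transparently compiled to f64" claim: an f64 has a 53-bit significand, so every 52-bit integer round-trips through a double exactly, while exactness breaks just past 53 bits.

```cpp
#include <cassert>
#include <cstdint>

void demo() {
    // every 52-bit value round-trips exactly through f64
    std::int64_t max52 = (std::int64_t{1} << 52) - 1;
    assert(static_cast<std::int64_t>(static_cast<double>(max52)) == max52);

    // past 53 bits it breaks: 2^53 + 1 rounds to 2^53
    std::int64_t past = (std::int64_t{1} << 53) + 1;
    assert(static_cast<std::int64_t>(static_cast<double>(past)) != past);
}
```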