@bugaevc@floss.social avatar

bugaevc

@bugaevc@floss.social

Unix hacker. I do obscure and cursed things.

I hack on Darling, SerenityOS / Ladybird, GNU Hurd / glibc, wl-clipboard, Owl, etc.

I use GNOME, and contribute to freedesktop / GNOME projects sometimes (systemd, PipeWire, GLib, GTK, etc).

I like Rust and dislike Docker.


bugaevc, to random

Let's look into virtualization on ARM

🧵

bugaevc,

So, ARM has ELs (exception levels), somewhat like protection rings of yore.

EL0 is where the userland runs
EL1 is where the kernel runs
EL2 (if present) is where a hypervisor runs
EL3 (if present) is where a vendor backdoor / DRM enforcement crap, also called "secure monitor", runs

bugaevc,

EL2's programming model is pretty different from that of EL1. The control registers are different, the way address space layout works is different, the format of translation tables (aka page tables) is different, and so on. That's because when EL2 was introduced (under the name "hyp"), it was designed to run special-purpose type-1 hypervisors (like Xen), and not OS kernels, which kind of makes sense.

bugaevc,

But, people don't (only) want Xen, people want to run KVM and Hypervisor.framework and Hyper-V. And for that to work, the kernel (or at least some part of it) has to run at EL2.

So the original solution to this was to implement a "low-visor", a small part of the kernel that runs at EL2. Most of the kernel still runs at EL1, and next to it, as its siblings, run the VMs. The low-visor obeys commands from the host kernel, but not from the guests.

bugaevc,

But that of course wasn't very nice, or performant. So, in ARM v8.1 (and AArch64 was added in v8.0, so your "arm64" kernel cannot use 8.1 features unconditionally), ARM decided to rectify this situation, and introduced Virtualization Host Extensions, aka The Only Sane Way That Things Should Have Been From The Start (but TOSWTTSHBFTS doesn't sound as catchy as VHE, does it).

bugaevc,

What VHE does is it adds a single bit to the HCR_EL2 register, the E2H bit. When you turn E2H on, EL2 magically turns into an almost exact copy of EL1 — it has all the same registers, all the same formats, and all the same address space layout.

bugaevc,

With that, you could almost run an existing OS kernel at EL2 unmodified, if you could somehow make it access the EL2 versions of the system registers and not the EL1 ones. One way to do that would be to modify the kernel to do either a compile-time (e.g. CONFIG_EL2) or a run-time (e.g. CurrentEL() == 2) check. But nobody wants to have separate kernel builds just for this, nor would anyone like the runtime cost of all the checks (Linux would patch them out with its static branch mechanism, but still).

bugaevc,

So guess what ARM did? That's right, they made it so that when E2H is on, accessing EL1 registers from EL2 actually implicitly refers to the EL2 registers. And to actually refer to the EL1 registers, there's a new set of "_EL12" register mnemonics (SCTLR_EL12 and friends).
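
The aliasing is easier to see as a lookup. What follows is a toy sketch under loud assumptions, not how hardware or any kernel implements it: the `resolve` helper is invented for illustration, and it pretends every EL1 register has an EL2 twin (in reality only some are redirected). Only the E2H bit position (bit 34 of HCR_EL2) and the `_EL12` naming come from the architecture.

```python
HCR_EL2_E2H = 1 << 34  # position of the E2H bit in HCR_EL2


def resolve(name: str, current_el: int, hcr_el2: int) -> str:
    """Return the register that an access to `name` really hits.

    Hypothetical helper: models only the EL1/EL12 aliasing seen from EL2.
    """
    e2h = bool(hcr_el2 & HCR_EL2_E2H)
    base, _, suffix = name.rpartition("_")
    if current_el == 2 and e2h:
        if suffix == "EL1":
            return f"{base}_EL2"  # the EL1 mnemonic silently means EL2
        if suffix == "EL12":
            return f"{base}_EL1"  # the way to reach the real EL1 state
    if suffix == "EL12":
        # Assumption of this model: _EL12 names are only usable at EL2 with E2H on.
        raise ValueError(f"{name} is not accessible here")
    return name


# An unmodified kernel at EL2 with E2H set writes "SCTLR_EL1" ...
print(resolve("SCTLR_EL1", current_el=2, hcr_el2=HCR_EL2_E2H))
# ... and hypervisor code that wants the guest's register uses "SCTLR_EL12".
print(resolve("SCTLR_EL12", current_el=2, hcr_el2=HCR_EL2_E2H))
```

The point of the design is that the kernel binary stays the same: the redirection happens in the register decode, not in the code.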

(Did I say VHE was sane? Scratch that)

bugaevc,

So with that, we can indeed run an otherwise unmodified Linux (or GNU Mach 🙂) in EL2, and have it directly manage both regular processes (in EL0) and VMs (in EL1 & EL0) without any need for a low-visor. And with interrupt virtualization (which is built-in GIC functionality, though I haven't figured out the details yet) and some timers you can actually make something run in those VMs. Cool.

bugaevc,

But then, turns out people don't just want to run KVM. People want to run KVM inside KVM! For instance, people want to run QEMU in their freaking GitHub Actions, and want it to not be terribly slow.

So we need nested virtualization.

bugaevc,

So what did ARM do, did they introduce an "EL2.5" to run the outer hypervisor? Thankfully, no.

They added another two bits to HCR_EL2, NV and NV1 (for "nested virtualization"). Setting the NV bit makes it so that:

  • Attempts by EL1 to execute various EL2 operations trap to EL2, not back to EL1
  • Attempts to return from EL1 (to EL1 or EL0) trap to EL2
  • CurrentEL reads as 2 (well, 8, since it's << 2) in EL1
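
The CurrentEL part can be sketched as follows. This is a toy model with an invented `read_current_el` helper; the facts it relies on are that CurrentEL holds the exception level in bits [3:2] and that NV is bit 42 of HCR_EL2. Reads from EL0 are left out of the model.

```python
HCR_EL2_NV = 1 << 42  # position of the NV bit in HCR_EL2


def read_current_el(actual_el: int, hcr_el2: int) -> int:
    """Value a read of CurrentEL would return in this toy model.

    Hypothetical helper: `actual_el` is the level the CPU is really at.
    """
    if actual_el not in (1, 2, 3):
        raise ValueError("only EL1..EL3 are modelled")
    reported = actual_el
    if actual_el == 1 and hcr_el2 & HCR_EL2_NV:
        reported = 2  # with NV set, EL1 is told it is EL2
    return reported << 2  # the EL field lives in bits [3:2]


print(read_current_el(1, 0))           # plain EL1
print(read_current_el(1, HCR_EL2_NV))  # EL1 pretending to be EL2
```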
bugaevc,

The idea here is that you'd run the nested hypervisor in EL1 as well, alongside its nested guest. Because of the NV bit, to the nested hypervisor it would look like it's running in EL2.

bugaevc,

Whatever it tries to do that can only be done in EL2 will be trapped to the real EL2, where the outer hypervisor would emulate the effect. When the nested hypervisor attempts to return to what it thinks is EL1 (where its nested guest runs), that would be trapped to the outer hypervisor too, and it would switch the registers around and unset NV, to make EL1 look like EL1 again.
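
Roughly, the outer hypervisor ends up toggling NV depending on who currently occupies EL1. Below is a minimal sketch of that state machine, assuming a made-up `OuterHypervisor` class; the register swapping is elided, and the way back (a guest exit re-enabling NV) is my assumption about the symmetric path rather than something spelled out above.

```python
from dataclasses import dataclass

HCR_EL2_NV = 1 << 42  # position of the NV bit in HCR_EL2


@dataclass
class OuterHypervisor:
    """Hypothetical model of the code running at the real EL2."""

    hcr_el2: int = HCR_EL2_NV          # start with the nested hypervisor in EL1
    running: str = "nested-hypervisor"
    traps: int = 0

    def on_eret_trap(self) -> None:
        """The nested hypervisor tried to return to 'its' EL1."""
        self.traps += 1
        # (Swap in the nested guest's EL1 register state here; elided.)
        self.hcr_el2 &= ~HCR_EL2_NV    # make EL1 look like EL1 again
        self.running = "nested-guest"

    def on_guest_exit(self) -> None:
        """Assumed reverse path: the nested guest trapped, hand control back."""
        self.traps += 1
        # (Swap the nested hypervisor's register state back in; elided.)
        self.hcr_el2 |= HCR_EL2_NV     # EL1 pretends to be EL2 again
        self.running = "nested-hypervisor"


hyp = OuterHypervisor()
hyp.on_eret_trap()
print(hyp.running, bool(hyp.hcr_el2 & HCR_EL2_NV))
hyp.on_guest_exit()
print(hyp.running, bool(hyp.hcr_el2 & HCR_EL2_NV))
```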

bugaevc,

What NV1 does, I honestly don't understand. It just seems to enable some more traps and tweak some other things here and there. If anybody understands what NV1 is for, please let me know!

bugaevc,

But, that way of doing nested virtualization turned out to be overly slow still.

Indeed, when running in EL2-that-is-actually-secretly-EL1, the nested hypervisor, naturally, frequently accesses system registers that would control either execution at EL2 (itself) or EL1 (its guest). Both must be trapped to the real EL2 and emulated:

bugaevc,

the former ones applied to the real EL1 (so the nested hypervisor sees the effect immediately, as it expects to), and the latter ones saved for later, for when we switch from EL1-that-pretends-to-be-EL2 to EL1-that-is-just-EL1.

And that's a lot of trapping and emulating. Jesus, that's a lot of trapping and emulating! (meme)

bugaevc,

So in ARM v8.4, they introduced "Enhanced nested virtualization support", aka NV2, née NEVE.

This, you guessed it, adds a new NV2 bit to HCR_EL2. When NV2 is enabled, at EL1:

  • Access to registers that would control EL2 just gets transparently redirected to matching registers that control EL1. If you remember the opposite redirection from VHE/E2H earlier, it's likely that the nested hypervisor thought it was accessing EL1 registers in the first place, and now it really does again 🙂
bugaevc,
  • Access to registers that would control EL1 is turned into memory loads/stores (indeed, no need to trap just to load/save the value). When the nested hypervisor tries to return to EL1, the outer hypervisor loads these previously saved values to the real EL1-controlling registers.

This does a lot less trapping, and is overall quite reasonable.
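
To put a rough number on "a lot less trapping", here is a toy comparison. Everything in it is an assumption made for illustration: the `run_nested_hypervisor` function, the three operation kinds, and the simplification that under NV2 every register access is either redirected or memory-backed (real hardware still traps some). It only counts traps and tracks which values end up controlling the real EL1.

```python
def run_nested_hypervisor(ops, nv2: bool):
    """Replay the nested hypervisor's operations; return (traps, real EL1 state).

    Each op is (kind, register, value) where kind is:
      "el2"  - access to a register controlling what it believes is EL2
      "el1"  - access to a register controlling its guest's EL1
      "eret" - attempt to enter the nested guest
    """
    traps = 0
    real_el1 = {}  # registers actually controlling EL1 right now
    saved = {}     # guest EL1 state kept for later (plain memory under NV2)
    for kind, reg, value in ops:
        if kind == "eret":
            traps += 1              # still trapped in both schemes
            real_el1.update(saved)  # outer hypervisor loads the saved values
        elif kind == "el2":
            if not nv2:
                traps += 1          # NV: trap, then emulate
            real_el1[reg] = value   # takes effect on the real EL1 at once
        elif kind == "el1":
            if not nv2:
                traps += 1          # NV: trap, then stash the value
            saved[reg] = value      # NV2: just a store to memory
        else:
            raise ValueError(f"unknown op kind: {kind}")
    return traps, real_el1


ops = [
    ("el2", "SCTLR", 1),
    ("el1", "SCTLR", 2),
    ("el1", "TTBR0", 3),
    ("eret", None, None),
]
print(run_nested_hypervisor(ops, nv2=False)[0])  # traps with plain NV
print(run_nested_hypervisor(ops, nv2=True)[0])   # traps with NV2
```

Both runs finish with the same EL1 state; the only difference in this model is how many times the outer hypervisor had to get involved.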

And that's basically it, thank you for coming to my TED talk. Fin.

bugaevc, to random

Updated to F40 for a laugh. Now I'm unable to log in, since SDDM crashes on startup.

Qt upstream sounds pretty clueless about how distros work: "GCC 14 isn't officially supported by Qt yet. Are you able to build with GCC 11?"

The root cause is apparently a GCC regression https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114576

this is fine.jpg

bugaevc,

@alatiera sure

yosh, to random
@yosh@toot.yosh.is avatar

I do legitimately hope that Linux eventually stops being the most relevant open source OS out there.

It’s takes from senior project leadership like: “We ship our own bespoke memory model, for performance reasons - we don't care if no ISA vendor supports it” and: “Instead of panicking when UB would trigger, we believe code should just keep going instead.”

I'm not a kernel domain expert; but I know that this isn't a sound basis to build a system on. And it makes me wonder about everything else.

bugaevc,

@yosh not going to argue about the specific concerns ('cause I mostly agree, for one thing) — but what (existing, not hypothetical) kernel do you like better?

bugaevc, to random

It's been years since I've read (in Gernot Heiser's blog, I think?) that nanokernels and hypervisors are basically one and the same, but I kind of brushed it off and never understood the point, until now.

Now that I'm thinking of how to bring hardware VM acceleration to Mach and reading about ARM v8.1 Virtualization Host Extensions, this suddenly makes so much sense.

uliwitness, to random
@uliwitness@chaos.social avatar

Idly wondering on the walk home:

What are the best Linux laptops these days? All hardware supported, great battery time (~10 hrs), 14” 1080p, great trackpad, USB-C Power delivery support?

And is there a machine like that with a gaming GPU that’ll run current games at normal settings or better? (For dual-booting Windows; I know WINE/Proton exists, it’s not enough for me.)

bugaevc,

@uliwitness I hear good things about the Apple M line

bugaevc, to random

I need 'git commit --llm' that would ask me a bunch of questions and then write a commit message for me based on my explanations.

Like, I fixed this complicated issue, and even though the patch itself is quite small, it takes 6 paragraphs of prose to explain what the issue even was and how it could happen.

bugaevc,

@mpjgregoire on SMP, yes, this was a critical missing piece. Expect more SMP news soon :)
