What this means is that registers for worst-case paths are only allocated if those paths actually get executed. And even when those paths do run, their registers can get "tetris'ed" together with those of the other resident waves, which may be executing a different part of the program (or a different program entirely), which allows for higher overall occupancy. If that on-chip memory runs out, registers spill to L2, but there's some kind of hardware magic that monitors for this and adjusts occupancy to try to balance the cost of spilling against the benefit of running more waves.
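
To make the occupancy argument a bit more concrete, here's a toy model in Python. Everything in it is made up for illustration: the 1024-register budget, the 32 vs. 128 registers for the common and worst-case paths, and the assumption that one wave in eight is on the worst-case path at any moment. The greedy admission loop is also just a guess at how packing might behave, not a description of what the hardware actually does.

```python
def static_occupancy(reg_file_size, worst_case_regs):
    """Waves that fit when every wave reserves its worst-case count up front."""
    return reg_file_size // worst_case_regs


def dynamic_occupancy(reg_file_size, live_regs_per_wave):
    """Waves that fit when each wave only holds what it currently needs.

    Admits waves in order until the next one would overflow the register file.
    """
    used = 0
    resident = 0
    for regs in live_regs_per_wave:
        if used + regs > reg_file_size:
            break
        used += regs
        resident += 1
    return resident


# Hypothetical numbers, not taken from any real GPU.
REG_FILE = 1024    # register budget shared by all resident waves
COMMON_PATH = 32   # registers needed on the common path
WORST_PATH = 128   # registers needed on the rarely-taken worst-case path

# Assume one wave in eight is currently executing the worst-case path.
waves = [WORST_PATH if i % 8 == 0 else COMMON_PATH for i in range(64)]

static = static_occupancy(REG_FILE, WORST_PATH)
dynamic = dynamic_occupancy(REG_FILE, waves)

print(f"static worst-case allocation: {static} waves")
print(f"dynamic allocation:           {dynamic} waves")
```

With these numbers the static scheme fits 8 waves and the dynamic one fits 23, purely because most waves aren't paying for the worst-case path. The model ignores spilling entirely; once the budget is exceeded it just stops admitting waves, whereas the real hardware apparently trades some spilling for extra occupancy.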