@azonenberg@ioc.exchange
@azonenberg@ioc.exchange avatar

azonenberg

@azonenberg@ioc.exchange

Security and open source at the hardware/software interface. Embedded sec @ IOActive. Lead dev of ngscopeclient/libscopehal. GHz probe designer. Open source networking hardware. "So others may live"

Toots searchable on tootfinder.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

azonenberg, to random
@azonenberg@ioc.exchange avatar

Not sure if this is a gcc bug or some weird corner of UB or what...

But I have a packed struct containing a uint32 as the first field. I'm running on ARMv7-M so 32-bit unaligned loads are allowed (but not 64-bit).

This struct is being read directly via casting from a network RX buffer that is likely not aligned to any particular byte boundary. It's a) packed and b) has 32-bit fields in it.

So silly me assumed that gcc would generate either bytewise reads (assuming no alignment at all) or a ldr instruction (accepting that 32-bit unaligned loads are OK).

But for some reason at -O3 it generates a 64-bit read with ldrd, which promptly hard faults. I have no idea why it's doing that given that I was just __builtin_bswap32'ing a single 32-bit field.

Was able to work around the issue with memcpy, but seriously WTF? If I'm using a packed struct I'm explicitly telling the compiler not to make any assumptions about alignment because I'm directly serializing the data from somewhere. Where did it magically get the idea that my packed 32-bit field had 64-bit alignment?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark This isn't hardware it's just a SFTP "open" packet.

My real question isn't so much the packed struct as why it's generating a 64-bit load for accessing a 32-bit scalar? It should be assuming 32-bit alignment at most (and since ARMv7-m allows 32-bit loads at any alignment, this should be fine).

Why would it assume that my 32-bit variable is 64-bit aligned?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark Actually, looking more closely at the offending code it wasn't a packed struct pointer at all, it was a uint32_t* created by pointer arithmetic on the buffer (due to variable length fields).

The same thing stands though. If i give you a random uint32_t* on ARMv7-m you should not assume it's 64-bit aligned.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark https://github.com/azonenberg/staticnet/blob/master/sftp/SFTPOpenPacket.h?ts=4#L88

The __builtin_bswap32 call in line 92 is crashing (only with -O3) in a ldrd instruction. I have no idea where it got the idea issuing a 64-bit load was necessary or OK.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark The "fixed" (not crashy) version is

char* tmp = GetPathStart() + GetPathLength() + sizeof(uint32_t);
uint32_t n;
memcpy(&n, tmp, sizeof(n));
return __builtin_bswap32(n);

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark (note that the overall SFTPOpenPacket object may have any alignment since it's coming out of a network rx buffer, not allocated on the stack or anything).

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark (this is using Debian bookworm packaged arm-none-eabi-g++ (15:12.2.rel1-1) 12.2.1 20221205)

azonenberg,
@azonenberg@ioc.exchange avatar

@pkhuong @whitequark I mean, this struct was created from a larger buffer (ultimately it's an EthernetFrame which is a fixed MTU-sized static buffer, then I have a struct at the beginning for the fixed-sized fields then offset off the end of that).

Not sure how else you'd handle this?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark No explicit -march but -mcpu=cortex-m7

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark Full CFLAGS (other than macro defines and include search paths) are -g -O3 --std=c++17 -Wall -Wextra -fno-rtti -fno-exceptions --specs nano.specs -mcpu=cortex-m7

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark Interesting. There's probably some additional context required to trigger the bug? I don't have time to really troubleshoot this at the moment since I have a workaround.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark This was the actual offending code as generated

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark I didn't fully trace out what register had what values, this was optimized code so a pain to debug (a lot of the values I wanted to dump to troubleshoot were optimized out).

Got a workaround and went on with my development, now chasing a different bug that looks to be an actual application logic error of some sort.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark If it helps the actual SFTPOpenPacket comes from this

auto payload = reinterpret_cast<SFTPOpenPacket*>(pack->Payload());
payload->ByteSwap();
OnRxOpen(id, state, socket, payload);

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark where "pack" is a SFTPPacket* which itself comes from

while(IsPacketReady(state))
{
auto pack = reinterpret_cast<SFTPPacket*>(state->m_rxBuffer.Rewind());
pack->ByteSwap();
OnRxPacket(id, state, socket, pack);
state->m_rxBuffer.Pop(pack->m_length + sizeof(uint32_t));
}

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark There should be no expectation of 64-bit alignment on any of this. Ultimately "pack" in this outer code should be on at least a 32-bit boundary since state->m_rxBuffer was rewound (this is a circular FIFO but "rewind" rotates the buffer so the contents are contiguous and start at address 0 of the buffer) and that buffer is a large array so I would expect it to be aligned to some extent.

But once I start parsing out variable-length fields from within it, all bets are off.

azonenberg,
@azonenberg@ioc.exchange avatar

@pkhuong @whitequark How else can I make a member function that does this?

I can't store a pointer to the outer buffer in the object because I can't have any variables that aren't part of the packet itself.

azonenberg,
@azonenberg@ioc.exchange avatar

@steve @whitequark So, on ARMv7-M it's legal to load a uint32_t (ldr) from any alignment so even if it is unaligned, it shouldn't matter.

What's not legal is to load an unaligned uint64_t (ldrd).

The bug is that gcc took a type that should only have guaranteed 32-bit alignment and emitted an instruction that required 64-bit alignment.

azonenberg,
@azonenberg@ioc.exchange avatar

@steve @whitequark Doesn't matter. The bug isn't that it took an odd-aligned uint32 and treated it as 32-bit aligned.

The bug is that it took a 32-bit aligned value and treated it as 64-bit aligned. Nothing in the code gave gcc permission to assume this uint32_t* was on a 64-bit boundary rather than an odd 32-bit index.

azonenberg,
@azonenberg@ioc.exchange avatar

@steve @whitequark More specifically, given uint32_t* foo = 0xdeadbee4, ldr will work and ldrd will segfault.

There was no reason to issue a 64-bit load in the first place, this is a 32-bit variable.

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark @steve ... wait, what? Ok now I'm confused.

whitequark, to random
@whitequark@mastodon.social avatar

i'm sorry... why does my laptop include a schematic sheet for smokeless powder?

azonenberg,
@azonenberg@ioc.exchange avatar

@whitequark It was the cheapest implementation for ATA Secure Erase they could come up with.

azonenberg, to random
@azonenberg@ioc.exchange avatar

New thread on my big ongoing embedded project since the other one was getting too big.

To recap, this is a pilot project for a bunch of my future open hardware T&M and networking projects, validating a common platform that a lot of the future stuff is going to run on.

The primary problem it's trying to address is that I have a lot of instrumentation with trigger in/out ports, sometimes at different voltage levels, and I don't always have the same instrument sourcing the trigger every time.

So rather than moving around cables all the time and adding splitters, attenuators, amplifiers, etc. to the trigger signals I decided to make a dedicated device using an old XC7K70T-2FBG484 I had lying around.

Of course, as with any project, there was feature creep.

I'm standardizing on +48V DC for powering all of my future projects as it's high enough to move a lot of power but low enough to be mostly safe to work around live. So I needed to design and validate an intermediate bus converter to bring the 48 down to something like 12 for the rest of the system to use.

The FPGA has four 10G transceiver pairs on it. I used one for 10GbE (not that I need the bandwidth, but I was low on RJ45 ports on this bench and had some free SFP drops) and the rest are hooked up to front panel SMA ports (awaiting cables to go from PCB to panel) to generate PRBSes for instrument deskew.

Since I'm pinning out the transceivers and am planning to build a BERT eventually, I added BERT functionality to the firmware as well (still need to finish a few things but it's mostly usable now).

And since I have transceivers and access to all of the scope triggers, it would be dumb not to build a CDR trigger mode as well. That's in progress.

azonenberg,
@azonenberg@ioc.exchange avatar

Next step is to be able to DFU the FPGA.

Coming along nicely, but not done yet. I'm parsing the entire .bit structure on the fly in order to enable editing some config settings for multi-boot, although for now the data is being passed through to the write buffer unchanged (then discarded since I've implemented flash erasing but not writes).

azonenberg,
@azonenberg@ioc.exchange avatar

I'm now thoroughly confused.

I can now OTA flash the bitstream, but the image is subtly corrupted.

I tried three times with different SPI clock frequencies, and each time the exact same, byte for byte, incorrect bitstream is burned to the flash. So it seems pretty clear this is an application logic bug and not SI (which I would expect to have a more nondeterministic effect).

The bug manifests itself as two corrupted bytes, repeating on 1024 byte boundaries for a while, then something else.

For example, starting at 0x10b60, 10f60, 0x11360, etc. I have 0x60 03 overwriting the intended bitstream values. This persists up to 0x17f60.

Then at 18b44, 18f44, etc. I have 0x44 03 overwriting the intended values.

And so on. Weird 2-byte corruptions repeating at 1 kB boundaries. The flash write blocks are 512 bytes and the SFTP packets are mostly 1024 bytes, so it might have something to do with that... This is gonna be fun to debug.

azonenberg,
@azonenberg@ioc.exchange avatar

Hmm, seems like corruption in the RX circular FIFO. Making the buffer larger fixed it, but my error detection should have logged an overflow. Great, now I have something else to poke at...

  • All
  • Subscribed
  • Moderated
  • Favorites
  • megavids
  • thenastyranch
  • rosin
  • GTA5RPClips
  • osvaldo12
  • love
  • Youngstown
  • slotface
  • khanakhh
  • everett
  • kavyap
  • mdbf
  • DreamBathrooms
  • ngwrru68w68
  • provamag3
  • magazineikmin
  • InstantRegret
  • normalnudes
  • tacticalgear
  • cubers
  • ethstaker
  • modclub
  • cisconetworking
  • Durango
  • anitta
  • Leos
  • tester
  • JUstTest
  • All magazines