azonenberg

@azonenberg@ioc.exchange

Security and open source at the hardware/software interface. Embedded sec @ IOActive. Lead dev of ngscopeclient/libscopehal. GHz probe designer. Open source networking hardware. "So others may live"

Toots searchable on tootfinder.

This profile is from a federated server and may be incomplete. Browse more on the original instance.

azonenberg, 8 days ago to random

Not sure if this is a gcc bug or some weird corner of UB or what...

But I have a packed struct containing a uint32 as the first field. I'm running on ARMv7-M so 32-bit unaligned loads are allowed (but not 64-bit).

This struct is being read directly via casting from a network RX buffer that is likely not aligned to any particular byte boundary. It's a) packed and b) has 32-bit fields in it.

So silly me assumed that gcc would generate either bytewise reads (assuming no alignment at all) or a ldr instruction (accepting that 32-bit unaligned loads are OK).

But for some reason at -O3 it generates a 64-bit read with ldrd, which promptly hard faults. I have no idea why it's doing that given that I was just __builtin_bswap32'ing a single 32-bit field.

Was able to work around the issue with memcpy, but seriously WTF? If I'm using a packed struct I'm explicitly telling the compiler not to make any assumptions about alignment because I'm directly serializing the data from somewhere. Where did it magically get the idea that my packed 32-bit field had 64-bit alignment?

reply

expand (52)

collapse (52)

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark This isn't hardware it's just a SFTP "open" packet.

My real question isn't so much the packed struct as why it's generating a 64-bit load for accessing a 32-bit scalar? It should be assuming 32-bit alignment at most (and since ARMv7-m allows 32-bit loads at any alignment, this should be fine).

Why would it assume that my 32-bit variable is 64-bit aligned?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark Actually, looking more closely at the offending code it wasn't a packed struct pointer at all, it was a uint32_t* created by pointer arithmetic on the buffer (due to variable length fields).

The same thing stands though. If i give you a random uint32_t* on ARMv7-m you should not assume it's 64-bit aligned.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark https://github.com/azonenberg/staticnet/blob/master/sftp/SFTPOpenPacket.h?ts=4#L88

The __builtin_bswap32 call in line 92 is crashing (only with -O3) in a ldrd instruction. I have no idea where it got the idea issuing a 64-bit load was necessary or OK.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark The "fixed" (not crashy) version is

char* tmp = GetPathStart() + GetPathLength() + sizeof(uint32_t);
uint32_t n;
memcpy(&n, tmp, sizeof(n));
return __builtin_bswap32(n);

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark (note that the overall SFTPOpenPacket object may have any alignment since it's coming out of a network rx buffer, not allocated on the stack or anything).

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark (this is using Debian bookworm packaged arm-none-eabi-g++ (15:12.2.rel1-1) 12.2.1 20221205)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@pkhuong @whitequark I mean, this struct was created from a larger buffer (ultimately it's an EthernetFrame which is a fixed MTU-sized static buffer, then I have a struct at the beginning for the fixed-sized fields then offset off the end of that).

Not sure how else you'd handle this?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark No explicit -march but -mcpu=cortex-m7

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark Full CFLAGS (other than macro defines and include search paths) are -g -O3 --std=c++17 -Wall -Wextra -fno-rtti -fno-exceptions --specs nano.specs -mcpu=cortex-m7

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark Interesting. There's probably some additional context required to trigger the bug? I don't have time to really troubleshoot this at the moment since I have a workaround.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark This was the actual offending code as generated

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark I didn't fully trace out what register had what values, this was optimized code so a pain to debug (a lot of the values I wanted to dump to troubleshoot were optimized out).

Got a workaround and went on with my development, now chasing a different bug that looks to be an actual application logic error of some sort.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark If it helps the actual SFTPOpenPacket comes from this

auto payload = reinterpret_cast<SFTPOpenPacket*>(pack->Payload());
payload->ByteSwap();
OnRxOpen(id, state, socket, payload);

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark where "pack" is a SFTPPacket* which itself comes from

while(IsPacketReady(state))
{
auto pack = reinterpret_cast<SFTPPacket*>(state->m_rxBuffer.Rewind());
pack->ByteSwap();
OnRxPacket(id, state, socket, pack);
state->m_rxBuffer.Pop(pack->m_length + sizeof(uint32_t));
}

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark There should be no expectation of 64-bit alignment on any of this. Ultimately "pack" in this outer code should be on at least a 32-bit boundary since state->m_rxBuffer was rewound (this is a circular FIFO but "rewind" rotates the buffer so the contents are contiguous and start at address 0 of the buffer) and that buffer is a large array so I would expect it to be aligned to some extent.

But once I start parsing out variable-length fields from within it, all bets are off.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@pkhuong @whitequark How else can I make a member function that does this?

I can't store a pointer to the outer buffer in the object because I can't have any variables that aren't part of the packet itself.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@steve @whitequark So, on ARMv7-M it's legal to load a uint32_t (ldr) from any alignment so even if it is unaligned, it shouldn't matter.

What's not legal is to load an unaligned uint64_t (ldrd).

The bug is that gcc took a type that should only have guaranteed 32-bit alignment and emitted an instruction that required 64-bit alignment.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@steve @whitequark Doesn't matter. The bug isn't that it took an odd-aligned uint32 and treated it as 32-bit aligned.

The bug is that it took a 32-bit aligned value and treated it as 64-bit aligned. Nothing in the code gave gcc permission to assume this uint32_t* was on a 64-bit boundary rather than an odd 32-bit index.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@steve @whitequark More specifically, given uint32_t* foo = 0xdeadbee4, ldr will work and ldrd will segfault.

There was no reason to issue a 64-bit load in the first place, this is a 32-bit variable.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark @steve ... wait, what? Ok now I'm confused.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

whitequark, 8 days ago to random

i'm sorry... why does my laptop include a schematic sheet for smokeless powder?

reply

expand (7)

collapse (7)

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

@whitequark It was the cheapest implementation for ATA Secure Erase they could come up with.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 1 month ago to random

New thread on my big ongoing embedded project since the other one was getting too big.

To recap, this is a pilot project for a bunch of my future open hardware T&M and networking projects, validating a common platform that a lot of the future stuff is going to run on.

The primary problem it's trying to address is that I have a lot of instrumentation with trigger in/out ports, sometimes at different voltage levels, and I don't always have the same instrument sourcing the trigger every time.

So rather than moving around cables all the time and adding splitters, attenuators, amplifiers, etc. to the trigger signals I decided to make a dedicated device using an old XC7K70T-2FBG484 I had lying around.

Of course, as with any project, there was feature creep.

I'm standardizing on +48V DC for powering all of my future projects as it's high enough to move a lot of power but low enough to be mostly safe to work around live. So I needed to design and validate an intermediate bus converter to bring the 48 down to something like 12 for the rest of the system to use.

The FPGA has four 10G transceiver pairs on it. I used one for 10GbE (not that I need the bandwidth, but I was low on RJ45 ports on this bench and had some free SFP drops) and the rest are hooked up to front panel SMA ports (awaiting cables to go from PCB to panel) to generate PRBSes for instrument deskew.

Since I'm pinning out the transceivers and am planning to build a BERT eventually, I added BERT functionality to the firmware as well (still need to finish a few things but it's mostly usable now).

And since I have transceivers and access to all of the scope triggers, it would be dumb not to build a CDR trigger mode as well. That's in progress.

reply

expand (143)

collapse (143)

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

Next step is to be able to DFU the FPGA.

Coming along nicely, but not done yet. I'm parsing the entire .bit structure on the fly in order to enable editing some config settings for multi-boot, although for now the data is being passed through to the write buffer unchanged (then discarded since I've implemented flash erasing but not writes).

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

I'm now thoroughly confused.

I can now OTA flash the bitstream, but the image is subtly corrupted.

I tried three times with different SPI clock frequencies, and each time the exact same, byte for byte, incorrect bitstream is burned to the flash. So it seems pretty clear this is an application logic bug and not SI (which I would expect to have a more nondeterministic effect).

The bug manifests itself as two corrupted bytes, repeating on 1024 byte boundaries for a while, then something else.

For example, starting at 0x10b60, 10f60, 0x11360, etc. I have 0x60 03 overwriting the intended bitstream values. This persists up to 0x17f60.

Then at 18b44, 18f44, etc. I have 0x44 03 overwriting the intended values.

And so on. Weird 2-byte corruptions repeating at 1 kB boundaries. The flash write blocks are 512 bytes and the SFTP packets are mostly 1024 bytes, so it might have something to do with that... This is gonna be fun to debug.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

azonenberg, 8 days ago

Hmm, seems like corruption in the RX circular FIFO. Making the buffer larger fixed it, but my error detection should have logged an overflow. Great, now I have something else to poke at...

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...