@mattst88@fosstodon.org

mattst88

Gentoo developer, freedesktop.org contributor, software engineer at Google.

mattst88, to random

Another fun adventure, this time with a happy ending.

A unit test in libXmu failed on x86-32 (https://gitlab.freedesktop.org/xorg/lib/libxmu/-/issues/2). I looked for the typical things first like bad casts but didn't see anything wrong.

I noticed that the unit test program runs two subtests and the log indicates that it completed the first successfully before crashing, but when I ran it under gdb the backtrace showed it in the first subtest when the segfault occurred. Very strange.

mattst88,

Single stepping in gdb, I ultimately came across a longjmp() call (apparently libXt handles errors this way?), and it was this call that triggered the failure.

Turns out each of the subtests checks some exceptional case and expects a function call to fail by longjmp()'ing, but only the first subtest actually prepared for the jump with a call to setjmp().

As a result, when the second subtest triggered its own longjmp() it jumped to the first subtest's function that had already completed!
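For illustration, here is a minimal sketch of the pattern each subtest needs. It is not the libXmu test code: `call_that_fails`, `subtest_expects_failure`, and `run_both_subtests` are hypothetical names, and the failing library call is replaced by a bare longjmp().

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical jump target shared by the subtests. */
static jmp_buf recover;

/* Hypothetical stand-in for a library call whose error path never
 * returns normally but longjmp()s back to whoever armed `recover`. */
static void call_that_fails(void)
{
    longjmp(recover, 1);
}

/* Each subtest arms the jump target itself before making the failing
 * call, so the longjmp() lands in a stack frame that is still live. */
static bool subtest_expects_failure(void)
{
    if (setjmp(recover) == 0) {
        call_that_fails();
        return false;   /* not reached: the call should have jumped */
    }
    return true;        /* arrived here via longjmp() */
}

bool run_both_subtests(void)
{
    bool first = subtest_expects_failure();
    /* The second run re-arms the target instead of reusing a stale one. */
    bool second = subtest_expects_failure();
    return first && second;
}
```

Jumping to a jmp_buf whose setjmp() caller has already returned is undefined behavior in C, which fits the confusing backtrace.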

mattst88, to random

Weird debugging mystery. On an ARM system, I saw a few packages failing to build with a strange error:

> 1 {standard input}: Assembler messages:
> 1285 {standard input}:3629: Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'

I started seeing this after updating the system, which also updated gcc. I suspected a gcc regression, so I filed https://bugs.gentoo.org/923154 in Gentoo's bugzilla.

I found that the previous version of gcc didn't have this problem. Should be able to bisect...

mattst88,

I also tried and couldn't reproduce the failure in a 32-bit chroot on #Gentoo's #aarch64 development machine, so I was stuck doing all the debugging (and loooooong #gcc builds) on my very slow single-core 800MHz Solid Run #CuBox.

Diff'ing the assembly output between the working and non-working gcc versions I saw:

> - vmov.f64 d0, #6.:e+0
> + vmov.f64 d0, #7.0e+0

Naturally, binutils' assembler fails to recognize "6.:e+0" as a floating-point constant. Where is the ":" coming from?

mattst88,

My best guess is that in gcc's real_to_decimal_for_mode function (https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/real.cc;h=2c682b1809d46c6957e1ce83ec808377c27f8b65;hb=HEAD#l1812), something goes wrong.

The code does the old trick of adding the character '0' to an integer digit to produce the ASCII character representing the digit. E.g. '0' + 6 == '6'. '0' is ASCII 48, adding 6 gives ASCII 54, which is the character '6'.

But something is wrong, and what is supposed to be a single digit is 10, and '0' + 10 == 58 which is ':'.

But what is going wrong to cause this?
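The arithmetic is easy to check in isolation. This is a sketch, not gcc's code: `digit_to_char` and `is_decimal_digit` are hypothetical helpers, and an ASCII execution character set is assumed.

```c
#include <stdbool.h>

/* Hypothetical helper: map an integer to a character by offsetting
 * from '0'. Only meaningful for values 0 through 9. */
char digit_to_char(int d)
{
    return (char)('0' + d);
}

/* Hypothetical guard: true only for values that really are one digit. */
bool is_decimal_digit(int d)
{
    return d >= 0 && d <= 9;
}
```

With these, digit_to_char(6) is '6' (48 + 6 = 54), while digit_to_char(10) is ':' (48 + 10 = 58), the character right after '9' in ASCII.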

mattst88,

I don't know!

My best guess is that some floating-point imprecision is essentially resulting in some calculation producing 6 and 10/10 instead of 7.0.

I've tried a ton of things to narrow this down. I tried rebuilding gmp, mpfr, and mpc which gcc uses for multiprecision arithmetic. I've tried different versions of gcc — and I've tried bisecting gcc. None of the gccs built for bisecting reproduce the issue. I tried testing gcc's stage1 build (--disable-bootstrap) as well as stages 2 and 3.

mattst88,

I took a binary package of gcc built on my CuBox that reproduces the issue there, and I installed it in the 32-bit chroot on Gentoo's aarch64 development system. The issue doesn't reproduce.

I took a binary package of gcc built in the 32-bit chroot on Gentoo's aarch64 development system that doesn't reproduce the issue there, and I installed it on the CuBox. The issue does reproduce.

So I don't know if there's just something wrong with the Marvell Armada 510 SoC in the CuBox or what.

mattst88,

Perhaps the most ridiculous thing is the simplicity of the test case — single line of C:

> mattst88@cubox ~ $ cat wtf.c
> double wtf(void) { return 7.0; }
> mattst88@cubox ~ $ gcc -O2 -c wtf.c
> /tmp/ccYogZ0U.s: Assembler messages:
> /tmp/ccYogZ0U.s:24: Error: garbage following instruction -- `vmov.f64 d0,#6.:e+0'

So I think barring some breakthrough, the CuBox will be retired. It is painfully slow as it is.

mattst88, to random

I spend most of my week debugging code. So when I picked up my #tax documents from a preparation service and something seemed wrong, I started debugging.

Turns out they made a copy-and-paste error that meant a $2500 tax bill rather than a $1000 refund. 🤦‍♂️

I did my own #taxes for years until recently. I just want to pay someone to save myself the weekend of researching and agonizing. But I also need to be confident enough in them that I don't need to double-check their work for crap like this. 🤡

mattst88, to random

It's funny seeing someone's "About" on LinkedIn describing themselves as possessing "thought leadership, and excellent communication skills" when you know they're actually a massively toxic asshole.

mattst88, to random

I'm trying to fix a set of patches that allow #pixman's #ARM #NEON #assembly code to build with clang. They perform a lot of mechanical changes to switch to the "unified" ARM assembly syntax (.syntax unified), supported by both #gcc and #clang.

With clang the code builds but fails 3 of the tests in the test suite with what appear to be unaligned accesses. With gcc, the test suite passes before and after the patches.

I've muddled through as much debugging as I can. Any ideas? https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/78#note_2078065

jani, to random

I usually have lots of git branches with various iterations of whatever I'm working on. And I lose track of what each branch contains. git range-diff is great, but a bit tedious.

Here's a helper alias to "range compare" two branches. It finds their common ancestor, and range diffs them.

Usage: git range-compare rev1 [rev2]

~/.gitconfig:
[alias]
range-compare = "!f() { rev1=${1:?no branch to compare to}; rev2=${2:-HEAD}; git range-diff $(git merge-base $rev1 $rev2) $rev1 $rev2; }; f"

#git

mattst88,

@jani In the Mesa docs we recommend a 'git fixes' alias that could be repurposed for this:

> fixes = show -s --pretty='format:Fixes: %h ("%s")'

This seems to give a reasonably short SHA1 without the sed.

mattst88, to GNOME

After personally doing 1350 version bumps over 7 major GNOME releases in three years (3.38, 40, 41, 42, 43, 44, 45) I've decided to step down from the #GNOME team in #Gentoo.

I've had a very good contributor helping for the last year to get new versions into Gentoo quickly, and while I'm not happy about leaving him to fend for himself, it's not sustainable for me to continue maintaining such a large set of packages without help from other Gentoo developers.

mattst88,

@ebassi And thank you for all your help getting patches into GNOME! :)

mattst88, to gentoo

I like #Gentoo and I enjoy being a developer and package maintainer. The distro offers incredible flexibility to configure your system in any way you like.

But I really wish that didn't attract complete nutters who want to run Linux with mostly modern software but e.g. don't want udev on their systems.

(Similar situations arise with dbus, rust, etc., to say nothing of systemd)

mattst88,

Fine, you want to run #Gentoo built with clang and link-time optimization; linked with mold; using musl libc, libressl, slibtool; maximum hardened CFLAGS; SELinux; all on an aarch64 system running in big-endian mode. But please, just use udev.

mattst88,

Recent hilarity (from the same user, no less!)

  • Package failed to build because QtCore/qsystemdetection.h was missing. Turns out the user didn't want anything related to systemd on his system, so he was removing anything that matched systemd.
  • User wanted to be able to build Xorg with GLX support but not DRI, because it would save 0.4 MiB (12.4 MiB on-disk vs 12.4 MiB), and somehow this configuration was supposed to play Steam games.

mattst88, to random

Any day you have to git bisect skip is not a good day. #git

mattst88, to gentoo

Another day, another bizarre software discovery.

Apparently #Gentoo's sys-apps/sandbox (which ensures ebuilds don't make a mess outside of their build "sandbox") had a huge performance regression which caused webkit-gtk build times to go from 9 minutes to 1 hour.

After collecting a ton of data, applying patches, reverting patches, etc, I filed https://bugs.gentoo.org/910273 and it seems we have a fix.

But I don't know how it's fixing things!

mattst88,

The proposed patch removes the use of the faccessat() function and instead relies on fstatat64().

The system I'm testing on is a 64-core/128-thread beast, and I found that building with -j32 is actually significantly faster than with -j128 (39 minutes vs 1 hour).

So the faccessat() function must be causing some sort of serialization that essentially causes a denial of service with that many jobs?

Any guesses what is going on here?

#linux, #glibc, #gentoo
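For reference, both calls can answer the same "does this path exist?" question. The sketch below is not the sandbox patch and says nothing about why one is slower under heavy parallelism. The helper names are hypothetical, and plain fstatat() stands in for fstatat64() as an assumption, to keep the example portable.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD */
#include <stdbool.h>
#include <sys/stat.h>   /* fstatat, struct stat */
#include <unistd.h>     /* faccessat, F_OK */

/* Hypothetical helper: existence check through faccessat(). */
bool exists_via_faccessat(const char *path)
{
    return faccessat(AT_FDCWD, path, F_OK, 0) == 0;
}

/* Hypothetical helper: the same check through fstatat(), which also
 * fills in a struct stat that this caller simply ignores. */
bool exists_via_fstatat(const char *path)
{
    struct stat st;
    return fstatat(AT_FDCWD, path, &st, 0) == 0;
}
```

Both helpers follow symlinks with these flags, so they should agree on ordinary paths.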

mattst88, to random

I've been on the #Gentoo Council for the last three years. I've been nominated to stand for election this year, and I've been considering declining in order to focus on stuff that's more fun.

But yesterday, the Council members got an email from Open Collective informing us that Gentoo's application to join has been rejected because the Gentoo Foundation's Trustees never replied and provided answers to questions from Open Collective.

See https://marc.info/?l=gentoo-project&m=168727176218626&w=2 for details.

mattst88, to random

You know what's a "code smell" I've noticed?

A dirty git status after a build.

Even worse if the build modifies (or deletes!) files tracked by git.

Yes, I saw this today. And man, oh man, is the whole project smelly.

mattst88,

You know how some people that know Java don't put it on their résumé because they don't want to work in Java?

For me, that's #autotools, #autoconf, #automake

PSA: please switch your projects to #Meson

mattst88, to random

I'm the only person maintaining support for DEC Alpha. I was testing GTK-4 on Alpha for https://bugs.gentoo.org/838709 and noticed that its test suite generated a lot of unaligned accesses (e.g. a 4-byte load from an address that isn't 4-byte aligned). These are slow on Alpha because the kernel has to trap and emulate the memory operation.

For debugging, you can use the prctl utility (search for "prctl" on https://wiki.gentoo.org/wiki/Project:Alpha/Porting_guide for instructions).
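A common way to avoid this kind of trap in C is to copy the bytes with memcpy() instead of dereferencing a cast pointer. This is a generic sketch, not one of the GTK fixes, and `load_u32` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from any address, aligned or not.
 * The alternative, *(const uint32_t *)p, is the kind of load that
 * can trap when p is not 4-byte aligned. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* compiler picks a load that is safe for the target */
    return v;
}
```

Calling load_u32(buf + 1) on a byte buffer returns the four bytes starting at that odd offset, in host byte order.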

mattst88,

So in order to fix some unaligned accesses I noticed in #GTK on DEC #Alpha, I

  • used a SPARC64 system
  • discovered that gdb is broken on SPARC64
  • bisected and reported the gdb regression, which happened nearly 2 years ago
  • found a legitimate heisenbug in Mesa and reported it after fully understanding it
  • reviewed the fix and unit test, confirmed that it fixes the problem and the unit tests now pass
  • debugged a strict-aliasing violation in Mesa and made a merge request fixing a bug from 2010

mattst88,

I just reran the GTK test suite on Alpha with the patches applied, and found out that there are more. Many, many more. /o\

The saga continues...

Edit: I was wrong! I failed to apply the second patch. With it applied, the test suite passes and only a single test has unaligned accesses (which is the one I was aware of already)

Ok: 722
Expected Fail: 0
Fail: 0
Unexpected Pass: 0
Skipped: 1
Timeout: 0