ogrisel,
@ogrisel@sigmoid.social

I have been thinking a bit about how to detect supply chain attacks against popular open source projects such as scikit-learn.

If you have practical experience with https://reproducible-builds.org/, in particular in the #Python / #PyData ecosystem, I would be curious to hear any feedback on the plan I suggest for scikit-learn in the following issue.

Feel free to reply on Mastodon first if you have questions.

https://github.com/scikit-learn/scikit-learn/issues/28151

vstinner,
@vstinner@mamot.fr
sethmlarson,
@sethmlarson@fosstodon.org

@vstinner @ogrisel Thanks for the tag Victor :)

Hey Olivier, happy to help!

In general, that issue details a lot of the current sources of non-determinism in Python package builds, so you're on the right track with your thinking. The main pieces are build backends, compilers, and tools respecting SOURCE_DATE_EPOCH and being reproducible themselves.
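As a rough illustration of what "respecting SOURCE_DATE_EPOCH" can mean for a tool that writes archives, here is a minimal standard-library sketch. The `build_archive` helper and its `files` mapping are hypothetical names made up for this example, not part of any build backend; the idea is only that member order, timestamps, and permissions are pinned so that the output bytes depend on nothing but the inputs and SOURCE_DATE_EPOCH.

```python
import io
import os
import time
import zipfile


def build_archive(files, environ=None):
    """Return zip bytes that depend only on `files` and SOURCE_DATE_EPOCH.

    `files` maps archive member names to bytes (hypothetical input format).
    """
    environ = os.environ if environ is None else environ
    # SOURCE_DATE_EPOCH is a Unix timestamp. The fallback is 1980-01-01 UTC,
    # the earliest date the zip format can store (an assumption of this sketch).
    epoch = int(environ.get("SOURCE_DATE_EPOCH", "315532800"))
    date_time = time.gmtime(epoch)[:6]

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed member order
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # always report "Unix", whatever the host
            info.external_attr = 0o644 << 16  # fixed file permissions
            zf.writestr(info, files[name])
    return buf.getvalue()
```

Calling `build_archive` twice with the same files and the same SOURCE_DATE_EPOCH should give byte-identical output on a given Python and zlib version; a different compressor version can still change the bytes, which is one reason the tools themselves need to be pinned or recorded.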

One tool that I highly recommend is diffoscope (https://pypi.org/project/diffoscope/), which shows you the differences between two seemingly identical artifacts.
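To give a feel for the kind of comparison involved, here is a small standard-library sketch that compares two wheel-like zip archives member by member. It is not diffoscope and does far less (diffoscope recurses into many formats and shows the actual differences); `member_info` and `diff_archives` are hypothetical helper names used only for this example.

```python
import hashlib
import io
import zipfile


def member_info(archive_bytes):
    """Map each member name to (sha256 of its content, stored timestamp)."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {
            info.filename: (hashlib.sha256(zf.read(info)).hexdigest(), info.date_time)
            for info in zf.infolist()
        }


def diff_archives(first, second):
    """Return a sorted list of human-readable differences between two zips."""
    a, b = member_info(first), member_info(second)
    report = []
    for name in sorted(a.keys() | b.keys()):
        if name not in a:
            report.append(f"only in second: {name}")
        elif name not in b:
            report.append(f"only in first: {name}")
        elif a[name][0] != b[name][0]:
            report.append(f"content differs: {name}")
        elif a[name][1] != b[name][1]:
            report.append(f"timestamp differs: {name}")
    return report
```

An empty report only means the member names, contents, and timestamps match; two archives can still differ in other metadata or in compressed bytes, which is where a dedicated tool is more useful.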

sethmlarson,
@sethmlarson@fosstodon.org

@vstinner @ogrisel Build backends in particular are an area that I know needs work, potentially from a standards POV (such as recording the build backend name, version, checksum, and dependencies in the wheel file).

Without this information it's impossible to reproduce a given wheel from its source code/sdist, and it's not desirable to "pin" a build backend version in pyproject.toml (although I haven't thought enough about this topic yet).

ogrisel,
@ogrisel@sigmoid.social

@sethmlarson @vstinner I completely agree with everything you said. Do you plan to focus first on helping make the official CPython releases themselves automatically reproducible? Or do you plan to focus on improving wheel building tools to make PyPI-hosted artifacts reproducible?

Interesting work w.r.t. the SBOM of CPython. It would be interesting to have cibuildwheel able to dump an SBOM file, and later rebuild from one while checking the sha256 values of the dependencies.
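The "record, then re-check sha256 values" idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: the manifest here is a plain dict of file name to sha256 hex digest, not a real SBOM format such as SPDX or CycloneDX, and `record_manifest` / `verify_manifest` are hypothetical names, not cibuildwheel features.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_manifest(directory):
    """First build: record the sha256 of every file in the dependency directory."""
    return {
        p.name: sha256_file(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


def verify_manifest(directory, manifest):
    """Rebuild: list every recorded dependency that is missing or has changed."""
    problems = []
    for name, expected in sorted(manifest.items()):
        path = Path(directory) / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif sha256_file(path) != expected:
            problems.append((name, "sha256 mismatch"))
    return problems
```

A rebuild would refuse to proceed unless `verify_manifest` returns an empty list. Note that this sketch only checks the recorded files; it does not flag extra files that were not in the manifest.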

sethmlarson, (edited)
@sethmlarson@fosstodon.org

@ogrisel @vstinner Right now I'm focusing on CPython, and next I'll likely focus on Python packaging ecosystem best practices.

It is on my roadmap to improve Python packaging tooling and standards to make reproducibility possible. I would like to get to this work in 2024. Unfortunately there's only one of me right now, so all I can say with certainty is that I won't be able to start on it in early 2024. But if someone were to pick up this work earlier, I would happily review and assist!
