Fedora Reproducible Builds

This documentation tracks the effort to implement reproducible builds in Fedora.

The goal is to have “reproducible builds” for rpms in Fedora and, later, in the rest of the ecosystem. This would let our users independently verify that the rpms have not been tampered with (whether maliciously or through unreliable hardware): anyone can rebuild a package independently and confirm that they get identical binaries when building with the same versions of the compiler and other tools.

Background

The concept of "reproducible builds" was originally defined by the reproducible-builds.org initiative in the context of Debian:

A build is reproducible if given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts.
— reproducible-builds.org

For Debian, reproducible builds are crucial. Debian allows maintainers to generate source packages locally, possibly without any version control, and even to upload locally-built packages for distribution to users. For this reason, trust in the contents of both source and binary packages is low.

In Fedora, all packages that are distributed to users are built in the centralized, strongly controlled infrastructure. All source rpms are built from “dist-git”: a git repository which contains the build “recipe” and a cryptographic hash of package sources, so it is relatively easy to verify what changed between package versions, what “inputs” went into a particular source package, and in what environment the binary packages were built.
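As an illustration of the kind of input check that dist-git makes possible, here is a minimal sketch that compares a downloaded source archive against a recorded hash. The function name `source_matches` and the default of SHA-512 are assumptions made for this example; this is not the tooling Fedora actually uses.

```python
import hashlib


def source_matches(data: bytes, expected_hex: str, algorithm: str = "sha512") -> bool:
    """Return True if the digest of `data` equals the recorded hex digest.

    `algorithm` is any name accepted by hashlib.new(); "sha512" is only an
    assumed default for this sketch.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    # Normalize the recorded value so case and stray whitespace do not matter.
    return digest == expected_hex.strip().lower()


# Usage: the "recorded" hash stands in for the value stored next to the recipe.
tarball = b"example source archive contents"
recorded = hashlib.sha512(tarball).hexdigest()
print(source_matches(tarball, recorded))         # True
print(source_matches(tarball + b"x", recorded))  # False
```

A check like this only covers the inputs. Whether the binaries correspond to those inputs is the question that reproducible builds address.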

Because of this strong control over the build process, reproducible builds haven’t historically been a priority in Fedora.

Benefits

Let’s imagine what would happen if the hardware was broken: overheated memory on one builder causes bit flips, corrupting output rpms in a subtle way. If we are able to redo any build, it would be fairly easy to verify whether this is the case. If we had a process of doing “shadow” reproducible builds for all packages, we could even detect such cases before any bug reports arrive from users. Similarly, we could detect if a builder machine was compromised in a supply-chain attack that injects rogue code into the rpms. Reproducibility allows independent verification that the dist-git sources actually correspond to the binaries that are delivered. With such checks, any kind of supply-chain attack would be very hard to carry out undetected.

Reproducibility is also interesting because it aids debuggability. When builds are stable, any unexpected change in the build outputs is much easier to diagnose. We have already found and submitted a number of obvious fixes that would not have been found otherwise. Stable builds also help when working on the tools themselves: it is easy to do a rebuild with the patched tools and observe the diff. If the build is "unstable", i.e. there are various other unrelated changes, interesting differences often drown in noise.

Caveats

In the Fedora ecosystem, we cannot achieve reproducibility by the reproducible-builds.org definition. A fully identical result cannot be achieved, because rpm packages are distributed after signing, with the signature embedded in the rpm (while Debian uses detached signatures). A rebuild of a package (as distributed to users) will always differ at least by this missing signature. In principle, a rebuilder could build an identical rpm and then transplant the signature from the original rpm onto the rebuilt rpm, but that wouldn’t meet the requirements either, as the premise is that the build has to be reproduced without access to the original build artifacts. The use of detached signatures in RPM has also been proposed and rejected before.

Moreover, rpm builds inject some information about the build time and place into the outputs: BUILDTIME and BUILDHOST in the rpm header. While we could implement overrides in rpm, this information is useful and in practice it might not be desirable to lose it.

While there is ongoing discussion around this, we are leaning towards an amended definition to better meet Fedora’s requirements:

A build is reproducible if given the same source code, build environment and build instructions, and metadata from the build artifacts, any party can recreate copies of the artifacts that are identical except for the signatures and parts of metadata.

We believe this would still be useful to users, because it would allow them to verify that the build that was done in the official build system is trustworthy, by doing a comparison that ignores the short list of fields which are known to vary.
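A comparison that ignores known-varying fields could look roughly like the sketch below. It assumes the metadata of each rpm has already been extracted into a plain dictionary. BUILDTIME and BUILDHOST are the fields named above. The key SIGNATURE, the other keys in the usage example, and the function name are hypothetical placeholders, since the actual list of ignored fields is still under discussion.

```python
# Fields allowed to differ between the official build and a rebuild.
# BUILDTIME and BUILDHOST come from the text; "SIGNATURE" is a hypothetical
# stand-in for whatever signature-related fields end up on the list.
KNOWN_VARYING = frozenset({"BUILDTIME", "BUILDHOST", "SIGNATURE"})

_MISSING = object()  # sentinel: a field present in only one package is a difference


def unexpected_differences(official: dict, rebuilt: dict, ignored=KNOWN_VARYING) -> list:
    """Return the sorted names of fields that differ and are not in `ignored`."""
    fields = (set(official) | set(rebuilt)) - set(ignored)
    return sorted(
        name
        for name in fields
        if official.get(name, _MISSING) != rebuilt.get(name, _MISSING)
    )


# Usage with made-up metadata (keys and values are illustrative only).
official = {"NAME": "foo", "BUILDTIME": 1700000000, "BUILDHOST": "builder-a",
            "SIGNATURE": "abcd", "PAYLOAD_DIGEST": "1f2e"}
rebuilt = {"NAME": "foo", "BUILDTIME": 1700009999, "BUILDHOST": "laptop",
           "PAYLOAD_DIGEST": "1f2e"}
print(unexpected_differences(official, rebuilt))  # []
```

An empty result means the two packages agree on everything outside the ignore list. Any name in the result points at a field worth investigating.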

Status

This work was kicked off from an initial discussion at the RPM developer’s meetup during DevConf.CZ 2023. That led to the organization of a hackfest during Flock 2023, where we formalized goals, defined a general approach and started documenting known issues on the Pagure tracker. A recap of this event was published on Discourse and provided the starting point for this documentation. The project itself was formally announced on the Devel list in March 2024.
