Removing PGP from PyPI (2023)
blog.pypi.org | 72 points by harporoeder | 9 months ago
This is slightly old news. For those curious, PGP support on the modern PyPI (i.e. the new codebase that began to be used in 2017-18) was always vestigial, and this change merely polished off a component that was, empirically[1], doing very little to improve the security of the packaging ecosystem.
Since then, PyPI has been working to adopt PEP 740[2], which both enforces a more modern cryptographic suite and signature scheme (built on Sigstore, although the design is adaptable) and is bootstrapped on PyPI's support for Trusted Publishing[3], meaning that it doesn't have the fundamental "identity" problem that PyPI-hosted PGP signatures have.
The hard next step from there is putting verification in client hands, which is the #1 thing that makes any signature scheme actually useful.
[1]: https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI...
It's good that PyPI signs whatever is uploaded to PyPI using PyPI's key now.
GPG ASC support on PyPI was nearly as useful as uploading signatures to sigstore.
1. Is it yet possible to - with pypa/twine - sign a package uploaded to PyPI, using a key that users somehow know to trust as a release key for that package?
2. Does pip check software publisher keys at package install time? Which keys does pip trust to sign which package?
3. Is there any way to specify which keys to trust for a given package in a requirements.txt file?
4. Is there any way to specify which keys to trust for a version of a given package with different bdist releases, with Pipfile.lock, or pixi or uv?
People probably didn't GPG-sign packages on PyPI because it was neither easy nor required: uploads never had to be signed with a registered key/DID.
Anyone can upload a signature for any artifact to sigstore. Sigstore is a centralized cryptographic signature database for any file.
Why should package installers trust that a software artifact publisher key [on sigstore or the GPG keyserver] is a release key?
gpg --recv-keys downloads a public key for a given key fingerprint over HKP/HKPS (the latter being HTTPS with the same CA cert bundle as everything else).
GPG keys can be wrapped as W3C DIDs FWIU.
W3C DIDs can optionally be centrally generated (like LetsEncrypt with ACME protocol).
W3C DIDs can optionally be centrally registered.
GPG or not, each software artifact publisher key must be retrieved over a different channel than the packages.
If PyPI acts as the (package, release_signing_key) directory and/or the keyserver, is that any better than hosting .asc signatures next to the downloads?
GPG signatures and wheel signatures were and are still better than just checksums.
Why should we trust that a given key is a release signing key for that package?
Why should we trust that a release signing key used at the end of a [SLSA] CI build hasn't been compromised?
How do clients grant and revoke their trust of a package release signing key with this system?
... With GPG or [GPG] W3C DIDs or whichever key algo and signed directory service.
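For concreteness, here's roughly what that manual flow looked like back when PyPI hosted detached `.asc` signatures; a minimal sketch driving the gpg CLI from Python, where the package name and fingerprint are hypothetical placeholders:

```python
import subprocess

# Hypothetical release-key fingerprint. A user must learn this
# out-of-band (project docs, a keysigning party, ...), which is
# exactly the "different channel" problem described above.
FINGERPRINT = "0123456789ABCDEF0123456789ABCDEF01234567"

# Fetch the public key from a keyserver.
subprocess.run(
    ["gpg", "--keyserver", "hkps://keys.openpgp.org",
     "--recv-keys", FINGERPRINT],
    check=True,
)

# Verify the detached signature that sat next to the download.
subprocess.run(
    ["gpg", "--verify", "example-1.0.tar.gz.asc", "example-1.0.tar.gz"],
    check=True,
)
```

Note that `gpg --verify` only proves the signature matches *some* key in the local keyring; it says nothing about whether that key is actually the project's release key, which is the trust question posed above.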
> It's good that PyPI signs whatever is uploaded to PyPI using PyPI's key now.
You might have misunderstood -- PyPI doesn't sign anything with PEP 740. It accepts attestations during upload, which are equivalent to bare PGP signatures. The big difference between the old PGP signature support and PEP 740 is that PyPI actually verifies the attestations on uploads, meaning that everything that gets stored and re-served by PyPI goes through a "can this actually be verified" sanity check first.
I'll try to answer the others piecewise:
1. Yes. You can use twine's `--attestations` flag to upload any attestations associated with distributions. To actually generate those attestations you'll need to use GitHub Actions or another OIDC provider currently supported as a Trusted Publisher on PyPI; the shortcut for doing that is to enable `attestations: true` while uploading with `gh-action-pypi-publish`. That's the happy path that we expect most users to take.
2. Not yet; the challenge there is primarily technical (`pip` can only vendor pure Python things, and most of PyCA cryptography has native dependencies). We're working on different workarounds for this; once they're ready, `pip` will know which identity - not key - to trust based on each project's Trusted Publisher configuration (see the sketch after this list).
3. Not yet, but this is needed to make downstream verification in a TOFU setting tractable. The current plan is to use the PEP 751 lockfile format for this, once it's finished.
4. That would be up to each of those tools to implement. They can follow PEP 740 to do it if they'd like.
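To make answer 2 concrete, here's a minimal sketch of what client-side, identity-based verification could look like, assuming sigstore-python's `Verifier` API (the filenames and the Trusted Publisher identity are hypothetical placeholders, and the exact API may differ across versions):

```python
from pathlib import Path

from sigstore.models import Bundle
from sigstore.verify import Verifier
from sigstore.verify.policy import Identity

# Hypothetical files: the distribution, plus a Sigstore bundle
# derived from its PEP 740 attestation (e.g. fetched from the index).
dist = Path("example-1.0-py3-none-any.whl").read_bytes()
bundle = Bundle.from_json(Path("example-1.0.whl.sigstore.json").read_bytes())

# The policy pins an *identity* (a Trusted Publisher workflow),
# not a long-lived signing key.
policy = Identity(
    identity="https://github.com/example/example/.github/workflows/release.yml@refs/tags/v1.0",
    issuer="https://token.actions.githubusercontent.com",
)

# Raises sigstore.errors.VerificationError if the signature, the
# certificate chain, or the identity binding doesn't check out.
Verifier.production().verify_artifact(dist, bundle, policy)
```

The point of the design is visible in the policy object: the verifier asks "was this built by that publishing workflow?", not "was this signed by key X?".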
I don't really know how to respond to the rest of the comment, sorry -- I find it a little hard to parse the connections you're drawing between PGP, DIDs, etc. The bottom line for PEP 740 is that we've intentionally constrained the initial valid signing identities to ones that can be substantiated via Trusted Publishing, since those can also be turned into verifiable Sigstore identities.
So PyPI acts as keyserver, and basically a CSR signer for sub-CA wildcard package signing certs, and the package+key mapping trusted authority; and Sigstore acts as signature server; and both are centralized?
And something has to call cosign (?) to determine what value to pass to `twine --attestations`?
Blockcerts with DID identities is the W3C way to do Software Supply Chain Security like what SLSA.dev describes FWIU.
And now it's possible to upload packages and native containers to OCI container registries, which have supported artifact signatures with TUF since Docker Notary; but there's still no JSON-LD/YAML-LD that simply joins with OSV, OpenSSF, and SBOM linked data on registered (GitHub, PyPI, conda-forge, deb-src) namespace URIs.
GitHub supports requiring GPG signatures on commits.
Git commits precede what gets merged and later released.
A rough chronology of specs around these problems: {SSH, GPG, TLS w/ CA cert bundles, WebID, OpenID, HOTP/TOTP, Bitcoin, WebID-TLS, OIDC OpenID Connect (TLS, HTTP, JSON, OAuth2.0, JWT), TUF, Uptane, CT Logs, WebAuthn, W3C DID, Blockcerts, SOLID-OIDC, Shamir Backup, transferable passkeys}
For PyPI: PyPI.org, TLS, OIDC OpenID Connect, twine, pip, cosign, and Sigstore.dev.
Sigstore Rekor maintains centralized Merkle-tree logs, like google/trillian, which centrally hosts Certificate Transparency logs of X.509 certificate grants and revocations by Certificate Authorities.
W3C DID Decentralized Identifiers [did-core] > 9.8 Verification Method Revocation : https://www.w3.org/TR/did-core/#verification-method-revocati... :
> If a verification method is no longer exclusively accessible to the controller or parties trusted to act on behalf of the controller, it is expected to be revoked immediately to reduce the risk of compromises such as masquerading, theft, and fraud
> So PyPI acts as keyserver, and basically a CSR signer for sub-CA wildcard package signing certs, and the package+key mapping trusted authority; and Sigstore acts as signature server; and both are centralized?
No -- there is no keyserver per se in PEP 740's design, because PEP 740 is built around identity binding instead of long-lived signing keys. PyPI does no signing at all and has no CA or CSR components; it acts only as a store for attestations, which it verifies on upload.
PyPI signs uploaded packages with its signing key per PEP 458: https://peps.python.org/pep-0458/ :
> This [PEP 458 security] model supports verification of PyPI distributions that are signed with keys stored on PyPI
So that's deprecated by PEP 740 now?
PEP 740: https://peps.python.org/pep-0740/ :
> In addition to the current top-level `content` and `gpg_signature` fields, the index SHALL accept `attestations` as an additional multipart form field.
> The new `attestations` field SHALL be a JSON array.
> The `attestations` array SHALL have one or more items, each a JSON object representing an individual attestation.
> Each attestation object MUST be verifiable by the index. If the index fails to verify any attestation in attestations, it MUST reject the upload. The format of attestation objects is defined under Attestation objects and the process for verifying attestations is defined under Attestation verification.
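Per the PEP's "Attestation objects" section, each element of that array is shaped roughly like this; a schematic with placeholder values, not literal spec text:

```python
# Schematic PEP 740 attestation object (placeholder values).
attestation = {
    "version": 1,
    "verification_material": {
        # Base64-encoded DER certificate: the ephemeral signing
        # certificate whose identity is the Trusted Publisher.
        "certificate": "MIIC...",
        # Inclusion metadata from the transparency log (Rekor).
        "transparency_entries": [{"...": "..."}],
    },
    "envelope": {
        # Base64-encoded in-toto Statement over the distribution.
        "statement": "eyJfdHlwZSI6...",
        # Base64-encoded signature over the statement.
        "signature": "MEQCIF...",
    },
}
```

Verifying one of these on upload means checking the certificate against the Sigstore root of trust, checking the signature over the statement, and checking that the statement refers to the uploaded file.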
What is the worst case resource cost of an attestation validation required of PyPI?
blockchain-certificates/cert-verifier-js; https://westurner.github.io/hnlog/ Ctrl-F verifier-js
`attestations` and/or `gpg_signature`;
From https://news.ycombinator.com/item?id=39204722 :
> An example of GPG signatures on linked data documents: https://gpg.jsld.org/contexts/#GpgSignature2020
W3C DID self-generated keys also work with VC linked data; could a new field like the `attestations` field solve for DID signatures on the JSON metadata; or is that still centralized and not zero trust?
This feels like perfect being the enemy of good enough. There are examples where the system falls over, but that doesn't mean the benefits are completely negated.
It is very easy to get blinkered into thinking that the specific problems they're citing absolutely need to be solved, and there's quite possibly an element of using that as an excuse to reduce some maintenance overhead without understanding the benefits.
Its benefits are very much completely negated in real-world use. See https://blog.yossarian.net/2023/05/21/PGP-signatures-on-PyPI... - the data suggests that nobody is verifying these PGP signatures at all.
I stopped reading after this: "PGP is an insecure [1] and outdated [2] ecosystem that hasn't reflected cryptographic best practices in decades [3]."
The first link [1] suggests avoiding encrypted email due to potential plaintext CC issues and instead recommends Signal or (check this) WhatsApp. However, with encrypted email, I have (or can have) full control over the keys and infrastructure, a level of security that Signal or WhatsApp can't match.
The second link [2] is Moxie's rant, which I don't entirely agree with. Yes, GPG has a learning curve. But instead of teaching people how to use it, we're handed dumbed-down products like Signal (I've been using it since its early days as a simple SMS-encryption app, and I can tell you, it's gone downhill), which has a brilliant solution: it forces you to remember (or rather, to write down) a huge random hex monstrosity just to decrypt a database backup later. And no, you can't change it.
Despite the ongoing criticisms of GPG, no suitable alternative has been put forward, and the likes of Signal, Tarsnap, and others [1] simply don't cut it. Many other projects that have been running for years (with relatively good security track records, like the kernel, Debian, or CPAN) have no problem with GPG. That's my 5c.
[1] https://latacora.micro.blog/2019/07/16/the-pgp-problem.html
[2] https://moxie.org/2015/02/24/gpg-and-me.html
[3] https://blog.cryptographyengineering.com/2014/08/13/whats-ma...
Yeah, I still use PGP a lot, especially because of hardware-backed tokens (on YubiKeys and OpenPGP cards), which I use a lot for file encryption. The good thing is that there are toolchains for all desktop OSes and mobile (Android, with OpenKeychain).
I'm sure there's better options but they're not as ubiquitous. I use it for file encryption, password manager (pass) and SSH login and everything works on all my stuff, with hardware tokens. Even on my tablet where Samsung cheaped out by not including NFC I can use the USB port.
Replacements like fido2 and age fall short by not supporting all the usecases (file encryption for fido2, hardware tokens for age) or not having a complete toolchain on all platforms.
I use PKCS11 on my YubiKeys; would I gain something by using the PGP functionality instead?
Easier tooling, at least on Linux. PKCS11 requires a dynamically linked library (.so) that you pass to programs using it (like SSH), which is a bit annoying, and because it's binary linking, it's not an API you can easily debug. It tends to just crash, especially with version mismatches. The GPG agent API is easier for this: it works over a socket and can even be forwarded over SSH connections.
Usually you end up using OpenSC/OpenCT for that. Also, the tooling to manage the virtual smartcards is usually not as easy. I haven't used PIV for this (which is probably what you use on the YubiKey to get PKCS11), but it was much harder to get going than simply using GPG agent/scdaemon/pcscd and the command `gpg --card-edit` to configure the card itself.
It's still not quite easy but I found it easier than PKCS11. I used that before with javacards and SmartHSM cards. The good thing about the PIV functionality is that it integrates really well with Windows active directory, which PGP of course doesn't. So if you're on Windows, PIV (with PKCS11) is probably the way to go (or Fido2 but it's more Windows Hello for Business rather than AD functionality then, it depends whether you're a legacy or modern shop).
The big benefit of yubikeys over smartcards is that you can use the touch functionality to approve every use of the yubikey, whereas an unlocked smartcard will approve every action without a physical button press (of course because it doesn't have such a button).
The article you linked to doesn't seem to say anything about "nobody verifying PGP signatures". We would need PyPI to publish their Datadog & Google Analytics data, but I'd say the set of users who actually verify OpenPGP signatures intersects with the set of users faking/scrambling telemetry.
I wrote the blog post in question. The claim that "nobody is verifying PGP signatures (from PyPI)" comes from the fact that around 1/3rd had no discoverable public keys on what remains of the keyserver network.
Of the 2/3rd that did have discoverable keys, ~50% had no valid binding signature at the time of my audit, meaning that obtaining a living public key has worse-than-coin-toss odds for recent (>2020) PGP signatures on PyPI.
Combined, these datapoints (and a lack of public noise about signatures failing to verify) strongly suggest that nobody was attempting to verify PGP signatures from PyPI at any meaningful scale. This was more or less confirmed by the near-zero amount of feedback PyPI got once it disabled PGP uploads.
This all makes sense.
PEP 740 mentions:
> In their previously supported form on PyPI, PGP signatures satisfied considerations (1) and (3) above but not (2) (owing to the need for external keyservers and key distribution) or (4) (due to PGP signatures typically being constructed over just an input file, without any associated signed metadata).
It seems to me that the infrastructure investment in sigstore.dev vs. PGP is somewhat arbitrary. For example, on the PGP side, a PyPI keyserver and tooling to validate uploads would address (2) above, and (4) could be handled similarly to PEP 740, with signatures over provenance objects. Maybe sigstore is "just way better", but it doesn't seem so cut-and-dried a technical argument from the things discussed in these comments and the linked material.
It's perfectly reasonable to make a choice. It just seems unclear what the scope-of-work difference would be, despite a somewhat implicit suggestion across the discussions and links in these comments that it was great. Maybe that's an unreasonable level of detail to expect? But combined with what comes across as "dogging on PGP", that's what I've found disappointing about my casual brush with this particular instance of PGP coming up in the news.
(2) is addressed by Sigstore having its own infrastructure and a full-time rotation staff. PyPI doesn't need to run or operationalize anything, which is a significant relief compared to the prospect of having to operationalize a PGP keyserver with volunteer staffing.
(I'm intentionally glossing over details here, like the fact that PyPI doesn't need to perform any online operations to validate Sigstore's signatures. The bottom line is that everything about it is operationally simpler and more modern than could be shaped out of the primitives PGP offers.)
(4) could be done with PGP, but would go against the long-standing pattern of "sign the file" that most PGP tooling is ossified around. It also doesn't change the fact that PGP's signing defaults aren't great, that there's a huge tail of junk signing keys out there, and that to address those problems PyPI would need to be in the business of parsing PGP packets during package upload. That's just not a good use of anybody's time.
> having its own infrastructure
This seems like a different brand of the keyserver network?
> PyPI doesn't need to run or operationalize anything
So it's not a new operational dependency because it's index metadata? That seems more like an implementation detail (aside from the imagined PGP keyserver dependency) that could be accommodated in either system.
> like the fact that PyPI doesn't need to perform any online operations to validate Sigstore's signatures
I may be missing something subtle (or glaring), but would "online operations" be interactions with some other service, or a non-PSF service, or simply a service that's not wholly PyPI? Regardless, the index must be a "verifier" for design consideration (2) from PEP 740 to hold, which means the index must perform the verification step on the uploaded data -- and that seems inconsequentially different between an imagined PGP system (other than having to access the imagined PyPI keyserver) and sigstore/in-toto.
> ... PyPI would need to be in the business of parsing PGP packets during package upload.
But the sigstore analog is the JSON array of in-toto attestation statement objects.
> This seems like a different brand of the keyserver network?
It serves a vaguely similar purpose, if that's what you mean. That shouldn't be surprising, since this is all PKI-shaped problems under the hood.
To reiterate: the operational constraints here are (1) simplicity and reliability for PyPI and (2) secure defaults for the signing scheme itself. Running a PGP keyserver would not offer (1), and PGP as an ecosystem does not offer (2). This is even before desired properties, like strong identity binding, which PGP cannot offer in its current form.
> "online operations" would be interactions with some other service or a non-PSF service? Or simply a service not-wholly-pypi?
In the PGP setting, that means PyPI would need to pull from a keyserver. That keyserver would need to be one that PyPI doesn't control in order for the threat model to be coherent.
In the PEP 740 setting, PyPI does not need to pull any material from anywhere besides what the uploader is providing: the uploaded attestations are signed with an attached ephemeral signing certificate, which has a Trusted Publisher as its identity. That signing certificate can then be chained to an already established root of trust, in "normal" X.509 PKI fashion.
You could approximate this design with PGP, but none of the primitives currently exist (or, where they exist, they aren't operational).
> But the sigstore analog is the JSON array of in-toto attestation statement objects.
Yes. Believe it or not, a big ugly JSON blob is simpler than dealing with PGP's mess of packet versions and formats.
> This is even before desired properties, like strong identity binding, which PGP cannot offer in its current form.
If the strong identity binding is OIDC, then I disagree: it's convenient, but no more evidence of identity than being able to unlock a private key.
> ... one that PyPI doesn't control in order for the threat model to be coherent.
This doesn't make sense unless the author key material was only ever published on the PyPI keyserver.
> PyPI does not need to pull any material from anywhere besides what the uploader is providing
> in "normal" X.509 PKI fashion.
What about checking for revocation?