JIT
← Notebook
Field notes SecurityIdentityHomelab

Nobody holds an SSH key here, including me

A private CA as the single SSH trust root: short-lived certs for humans (SSO+MFA), 10-minute ones for machines, and host certs that kill known_hosts TOFU.

Same boring auditor’s question I keep coming back to, pointed at SSH this time: who can log into that box right now, and if I had to, could I take it back today? For most homelabs, mine included until recently, the honest answer is grim. There’s a public key sitting in ~/.ssh/authorized_keys on every host. It was copied there once, by hand or by Ansible, and it has been there ever since. It has no expiry. It doesn’t say who you are. And revoking it means SSHing into every box to delete a line — assuming you can still enumerate every box.

The key outlives the server, the project, sometimes the person. That is the bug. Not a weak algorithm, not a missing passphrase — the standing-ness itself. So I took the keys away. From the agents, from the automation, and from myself. Here’s the shape of it on one page, then the walk-through.

step-ca — the one trust root every host trusts it · it signs short-lived certificates A · User CA who logs in humans → SSO + MFA → ~8 h cert machines → JWK → ≤10 min cert principal = identity, not what you ask for B · Host CA which host you reached the host presents a host certificate clients trust the Host CA once — no TOFU prompt survives golden-template rebuilds C · trust, distributed Ansible, every host hosts trust the User CA TrustedUserCAKeys hosts present a Host cert clients pin the Host CA @cert-authority No standing keys anywhere. Every credential is short-lived and minted on demand — then it expires.
One private CA, three pillars: it signs who you are (User CA) and which host you reached (Host CA); Ansible distributes the trust. Nothing standing.

The standing key is the actual bug

Walk the failure modes of the humble authorized_keys line and they all trace back to one property: it never goes away on its own.

  • Sprawl. The same key clones itself onto every host, shadow-clone-jutsu style, except nobody ever dispels the clones. One stolen private key is a master key, and you don’t know how many doors it opens because you stopped counting hosts two rebuilds ago.
  • No identity. A key isn’t a person. ssh-rsa AAAA… in a log tells you a key authenticated, not who did, nor on whose behalf.
  • No expiry. It works today, next year, and after you’ve forgotten it exists. A credential that never expires isn’t a credential. It’s a liability with a comment field.
  • It survives rebuilds. Golden-template a box and you dutifully paste the key back, because the alternative is being locked out. The standing access is load-bearing.
  • Revocation is archaeology. Taking access back means visiting every host. On the client side it’s the mirror image: known_hosts trust-on-first-use, which screams “the authenticity of host can’t be established” every time a box is rebuilt, training you to type yes without reading.

Don’t hand out keys. Hand out trust — and let it expire.

One CA, three pillars

Flip the model: stop distributing keys and start distributing trust. Run a small private CA — I use step-ca (Smallstep, open source). Make it the single thing every host trusts. It does three jobs (the diagram above).

Pillar A, the User CA signs short-lived login certificates. Each cert carries a principal (who you are) and an expiry measured in minutes or hours. The host trusts the User CA via one TrustedUserCAKeys line, a valid cert logs you in, no per-user key anywhere.

Pillar B, the Host CA signs each host’s own key into a host certificate. Clients trust the Host CA once, and the TOFU prompt disappears forever, even after a rebuild, because the rebuilt host gets a fresh cert from the same CA. The single most satisfying side effect of this whole project: no more known_hosts warnings.

Pillar C, distribution is just Ansible: every host trusts the User CA and presents a host cert, every client pins the Host CA. The trust is configuration, not a pile of copied keys.

Humans: SSO + MFA, then a cert that expires by dinner

This is the part that changed my own day. I no longer keep a homelab SSH key. To get access I authenticate to the CA through Authentik, the identity provider I already run, in the browser, with MFA. The CA hands back a certificate good for a working day, dropped straight into my ssh-agent. Nothing lands on disk.

You · your laptop no SSH key on disk ssh a-host → no cert → step ssh login Authentik · your IdP browser login · SSO + MFA (TOTP) id_token · who you are + your groups step-ca · User CA template maps your group → principal issues an ~8 h certificate cert → ssh-agent · nothing on disk Target host trusts the User CA · principal == login user ✓ you trust the Host CA → no host-key prompt ✓ Expires by dinner. Revoke = disable the human in the IdP — no box-by-box cleanup.
One step ssh login per day: SSO + MFA → a few-hour cert in the agent → SSH anywhere it's trusted. Disable me in Authentik and the access is gone.

Two details make this safe rather than merely convenient. The cert’s principal comes from my identity and group membership, decided by a template on the CA. I cannot request a login I’m not entitled to. And revocation stops being archaeology: disable the account in Authentik and no new cert is minted, the one in my agent expires by dinner. Next morning I respawn: one SSO + MFA and I’m back with a fresh cert, the previous one already dead and not even worth cleaning up. Nobody visits a single host.

Machines: a ten-minute key that mints itself

Agents and automation get the same treatment without a human in the loop. Instead of a static key in some .env, each one authenticates with a provisioner credential and mints a ≤10-minute certificate per connection, uses it, and lets it expire. This cert will self-destruct in ten minutes — except nothing explodes and no Tom Cruise rappels down a building. No standing key on the caller, no authorized_keys on the target. The ten-minute cap is enforced at the CA. Ask for an hour and it says no.

From a key that never leaves → a certificate that barely stays BEFORE · standing keys one public key copied onto every host no expiry · not tied to a person survives every rebuild revoke = SSH into every box, delete a line AFTER · one CA, short-lived certs humans: SSO + MFA → ~8 h cert machines: ≤10 min cert per connection hosts: identified by a host cert revoke = it already expired
The whole project in one swap: a key that lives forever on every host becomes a certificate that exists for the next ten minutes (machines) or the rest of the day (humans).

This is the same instinct as keeping each agent’s blast radius small, the SSH-layer sibling of the confused-deputy piece, which narrows what a token can do between agents. Here we narrow what an SSH credential can do, and for how long, between machines.

What it costs you (the honest footnote)

This isn’t free, and the auditor in me won’t let the post end on the glossy part. The CA is now load-bearing. You know that xkcd, the one where all of modern infrastructure balances on a single tiny block some random person has been thanklessly maintaining somewhere? That block is me now. If it’s down, nobody mints a new cert, which is the safe direction (fail-closed beats fail-open), but you feel it. So you earn the right to delete the old keys:

  • Keep one sealed break-glass key, offline, for the day the CA and the IdP are both down. Test it.
  • Back the CA up and don’t co-locate it with something that will take it down with it.
  • Keep the CA’s root key offline; only the signing key is online.
  • Run NTP everywhere: short-lived certs are unforgiving about clock skew.

None of that is exotic. It’s the price of converting “a key, forever” into “a certificate, for the next ten minutes.”

So, back to the auditor’s question. Who can log into that box right now? Whoever holds an unexpired, CA-signed cert, and I can name them, because every issuance is logged with a principal. Can I take it back today? I don’t have to. It already took itself back. The standing SSH key had a good run, it can retire.

(The SSH-layer companion to the confused deputy comes for your AI agents, same instinct, scope and time-box every credential, applied this time to who can shell into the box.)