Weaknesses in TOTP-based 2FA for SSH and an out-of-band extension for automation
================================================================================

.. toctree::
   :maxdepth: 1
   :caption: Contents:

.. contents:: Table of Contents
   :depth: 3

.. warning:: None of the Discoverer clusters supports 2FA on its login nodes.

About
-----

Two-factor authentication (2FA) is widely recommended as a security
improvement over single-factor SSH key authentication. This document
explains weaknesses in the usual implementation — software TOTP layered
on top of SSH via PAM — including weaker guarantees than commonly
assumed, structural friction with unattended operation, and trade-offs
in hardware-based alternatives that are rarely stated plainly. A
particular concern is the use of the same FIDO2 hardware device as both
the SSH second factor and the storage medium for an ``sk-``\ type SSH
key, which collapses the independence requirement that gives
multi-factor authentication its security rationale. Each weakness is
examined in turn, followed by an out-of-band second-factor extension
intended to support both interactive sessions and automated SSH access
without reverting to single-factor exemptions for machine principals.


The automation problem
----------------------

The most immediate and practical objection to 2FA for SSH is that it is
structurally incompatible with unattended operation. Modern
infrastructure depends heavily on automated processes: CI/CD pipelines,
configuration management agents (Ansible, Puppet, Chef), scheduled
backup jobs, monitoring daemons, and container orchestration systems all
establish SSH connections without a human present to supply a second
factor. Requiring interactive 2FA input in these contexts is not merely
inconvenient — it makes automation architecturally impossible without
carving out exceptions that undermine the policy itself.

The typical workaround is to exempt service accounts from the 2FA
requirement, or to pre-provision TOTP secrets in environment variables
and configuration files accessible to the automated process. Both
approaches are security regressions: they restore single-factor
authentication for a large and persistent class of connections, often to
the most sensitive targets in the environment (build systems, deployment
hosts, internal infrastructure). The result is a two-tier system where
human-interactive sessions have 2FA and automated sessions do not — the
reverse of what a threat model would recommend, since automated accounts
frequently hold elevated or broad access.


The TOTP key storage paradox
----------------------------

Time-based one-time password systems, as defined in RFC 6238, operate on
a shared secret (the TOTP seed) generated at enrolment time. The seed
must remain available wherever codes are computed or checked: to the
client, which generates OTP values on demand, and to the verifier, which
recomputes them. On a workstation or server running software TOTP
tooling (``google-authenticator``, ``oath-toolkit``, or
``libpam-google-authenticator``), this seed is stored as a flat file,
typically ``~/.google_authenticator`` for the Google PAM module
(`google-authenticator-libpam <https://github.com/google/google-authenticator-libpam>`__),
with filesystem permissions set to ``0400``. The encoding is base32, not
encryption; the seed is effectively plaintext.
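Because base32 is only an encoding, anyone who can read the seed can
generate valid codes with nothing more than a standard library. The
following minimal sketch assumes the common defaults (HMAC-SHA1,
30-second step, 6 digits) and a seed file whose first line is the base32
secret, as in ``~/.google_authenticator``; ``read_seed`` and ``totp`` are
illustrative names, not part of any existing tool.

.. code:: python

   import base64
   import hashlib
   import hmac
   import struct
   import time
   from typing import Optional


   def read_seed(path: str) -> str:
       """Return the base32 seed, assumed to be the first line of the file."""
       with open(path, encoding="ascii") as fh:
           return fh.readline().strip()


   def totp(seed_b32: str, at: Optional[float] = None,
            step: int = 30, digits: int = 6) -> str:
       """Compute an RFC 6238 TOTP value (HMAC-SHA1) from a base32 seed."""
       # Base32 is an encoding, not encryption: decoding needs no key.
       s = seed_b32.replace(" ", "").upper()
       s += "=" * (-len(s) % 8)  # restore padding that seed files often omit
       key = base64.b32decode(s)
       now = time.time() if at is None else at
       counter = int(now // step)
       mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
       # Dynamic truncation as specified in RFC 4226, Section 5.3.
       offset = mac[-1] & 0x0F
       code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
       return str(code % 10 ** digits).zfill(digits)

Pointed at a readable seed file created with default settings, this
would yield a code the PAM module accepts for the current time step; no
credential other than read access to the file is involved.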

This creates a structural paradox. The “second factor” is co-located
with the first factor — the SSH private key — on the same machine, under
the same user account, within the same threat model. An attacker who has
gained read access to the user’s home directory, which is precisely the
scenario SSH authentication is intended to resist, can extract both
factors simultaneously. The second factor provides no additional barrier
in any scenario where the attacker already has filesystem access.

This is not a subtle or edge-case concern. It is a direct consequence of
the co-location of authenticators on a single system. NIST SP 800-63B
recognises this class of weakness implicitly in its Authenticator
Assurance Level (AAL) framework: software-based OTP generators are
classified as AAL1-capable at best when used alone, and the guidelines
note that authentication protocols requiring the verifier to store
secrets persistently are considered weaker precisely because of the risk
of compromise and secret theft (`NIST SP 800-63B, Section
4 <https://pages.nist.gov/800-63-3/sp800-63b.html>`__). The software
TOTP client on a shared filesystem is the canonical example of this
failure mode.


Hardware tokens: Genuine security at the cost of recoverability
---------------------------------------------------------------

A dedicated hardware security token, such as a standalone OATH TOTP
device or a YubiKey holding the TOTP credential in its OATH application,
genuinely resolves the storage paradox. The TOTP seed is stored inside
tamper-resistant hardware and cannot be read back out by software means. This represents a
real improvement: the second factor is now independent of the filesystem
and of any software compromise on the authenticating host.

The trade-off is irrecoverable loss. The security property that makes
the hardware token valuable — the seed is locked inside the device and
cannot leave it — is precisely the property that makes duplication
impossible. There is no backup of the seed. If the device is lost,
damaged, or destroyed, the second factor is gone permanently. Recovery
requires either a pre-established out-of-band path (backup codes
registered at enrolment, a secondary device registered in parallel) or
administrative intervention to re-enrol the user entirely.

Yubico’s own guidance acknowledges this directly, recommending that all
users register at least two independent authenticators at onboarding
specifically because losing the sole authenticator results in a hard
lockout with no self-service recovery path (`Yubico, FIDO2 Passwordless
Authentication <https://www.yubico.com/authentication-standards/fido2/>`__).
In environments with many users or with automated access requirements,
this operational fragility becomes a significant reliability concern,
and the administrative overhead of managing re-enrolment events can be
substantial.


FIDO2 device reuse: Shared fate between SSH key and second factor
-----------------------------------------------------------------

A more subtle problem arises when the same FIDO2 hardware device — a
YubiKey 5 series or equivalent — is used for two distinct purposes
simultaneously: as the storage medium for a resident ``sk-``\ type SSH
key, and as the second factor in a PAM-based SSH 2FA scheme.

Background: sk-type SSH keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenSSH 8.2, released in February 2020, introduced support for FIDO/U2F
hardware authenticators via two new key types: ``ecdsa-sk`` and
``ed25519-sk``. These key types bind the SSH private key to the hardware
token. Producing a signature requires the token, its key handle (kept in
the private key file on disk or, for resident keys, on the device), and,
by default, a physical user-presence gesture (a touch). The
private key material never leaves the device. Yubico’s documentation
describes this as providing “strong protection against malware,
phishing, and remote attacks targeting SSH credentials” (`Yubico,
Securing SSH with
FIDO2 <https://developers.yubico.com/SSH/Securing_SSH_with_FIDO2.html>`__).

When a resident key is generated (using the ``-O resident`` flag), the
key handle is stored on the device itself rather than as a file on disk.
This allows the key to be used from any host by retrieving it with
``ssh-keygen -K``, and is commonly presented as a significant security
and portability advantage.

The shared-fate problem
~~~~~~~~~~~~~~~~~~~~~~~

When the same device stores both a resident ``sk-``\ type SSH key and
the credential used as a second factor (a FIDO2 credential for WebAuthn
or PAM-based FIDO2 login, or a TOTP seed in the device’s OATH
application), the two factors
are no longer independent. Loss, theft, or destruction of the device
simultaneously eliminates both the primary SSH authenticator and the
second factor. The attacker (or the accident) needs to compromise only
one physical object to defeat both layers.

This is not merely an operational convenience concern — it is a
violation of the independence requirement that gives multi-factor
authentication its security rationale. At AAL2, NIST SP 800-63B
requires “proof of possession and control of two distinct authentication
factors”; the point of requiring two is that compromising one should not
automatically compromise the other (`NIST SP 800-63B, Section
4.2 <https://pages.nist.gov/800-63-3/sp800-63b.html>`__). When both
factors reside on a single physical token, this distinctness is nominal
rather than real.

NIST SP 800-63B does permit a single device to satisfy both factor
requirements at AAL3, where the hardware provides verifier-impersonation
resistance — but this is a deliberate carve-out for high-assurance
hardware cryptographic devices operating under a specific threat model,
not a general endorsement of collapsing both factors onto any available
FIDO2 token (`NIST SP 800-63B, Section
4.3 <https://pages.nist.gov/800-63-3/sp800-63b.html>`__).

A concrete operational consequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The shared-fate problem is not merely theoretical. The YubiKey FIDO2
application holds every FIDO2 credential on the device, including
resident ``sk-``\ type SSH keys and any FIDO2 credential used as a
second factor; OATH TOTP seeds, if configured, live in the separate OATH
application. If the FIDO2 PIN is blocked, for example after too many
failed attempts, recovery requires ``ykman fido reset``, which wipes
the entire FIDO2 application, including all FIDO2 credentials and any
resident SSH keys (`vorburger, ed25519-sk
notes <https://github.com/vorburger/vorburger.ch-Notes/blob/develop/security/ed25519-sk.md>`__).
When both factors are FIDO2 credentials, this is a single administrative
action that destroys the SSH key and the second factor simultaneously,
with no granular revocation available.

Furthermore, published configuration guidance notes that the YubiKey can be used
for SSH authentication “without requiring a dedicated YubiKey for just
SSH authentication” — sharing the device with other FIDO2/WebAuthn
services — while simultaneously recommending the configuration of two
physical YubiKeys in case one is broken or lost (`Cryptsus, Configuring
OpenSSH with
YubiKey <https://cryptsus.com/blog/how-to-configure-openssh-with-yubikey-security-keys-u2f-otp-authentication-ed25519-sk-ecdsa-sk-on-ubuntu-18.04.html>`__).
These two pieces of advice are in tension: recommending device sharing
while also recommending redundancy implicitly acknowledges the
shared-fate risk without naming it directly.

Summary of the FIDO2 reuse concern
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The concern is not that using a FIDO2 device for ``sk-``\ type SSH keys
is insecure in isolation — it is a genuine improvement over software
keys stored on disk. The concern is that adding a PAM-based 2FA step
backed by the same device does not add an independent second factor; it
adds the appearance of a second factor while leaving the actual security
boundary unchanged. An attacker who obtains the physical token — or a
user who loses it — faces the same outcome regardless of whether 2FA was
nominally enabled.


The protocol-layer problem: The second factor arrives inside an already-established session
-------------------------------------------------------------------------------------------

Beyond the storage and independence issues examined above, there is a
deeper structural problem with how 2FA is positioned within the SSH
protocol stack itself. The second factor is not verified before the SSH
session is established — it is verified *within* an already established
encrypted tunnel. Understanding why this matters requires a brief
account of how SSH authentication is layered.

The SSH protocol stack
~~~~~~~~~~~~~~~~~~~~~~

The SSH protocol is composed of three distinct layers, defined across
separate RFCs and operating strictly in sequence:

1. The Transport Layer (RFC 4253) is established first. It negotiates
   encryption algorithms, performs a key exchange (typically
   Diffie-Hellman or ECDH), authenticates the *server* to the client via
   the host key, and establishes a confidential, integrity-protected
   channel. At the end of this phase, the client has a secure tunnel to
   a server whose identity it has verified — but the client itself has
   not yet authenticated in any way (`RFC 4253,
   §1 <https://www.rfc-editor.org/rfc/rfc4253>`__).

2. The Authentication Layer (RFC 4252) runs over the transport layer. It
   authenticates the *client* to the server. RFC 4252 is explicit: “The
   SSH authentication protocol runs on top of the SSH transport layer
   protocol and provides a single authenticated tunnel for the SSH
   connection protocol.” The server may require multiple authentication
   steps in sequence — this is what
   ``AuthenticationMethods publickey,keyboard-interactive`` in
   ``sshd_config`` implements — but all of them occur within the
   already-encrypted transport channel (`RFC 4252,
   §1 <https://www.rfc-editor.org/rfc/rfc4252>`__).

3. The Connection Layer (RFC 4254) multiplexes the authenticated channel
   into logical sub-channels (shell sessions, port forwards, SFTP, etc.)
   and runs only after authentication is complete.

How the PAM-based second factor fits in
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The mechanism used to deliver a TOTP or challenge-response second factor
over SSH is the ``keyboard-interactive`` authentication method, defined
in RFC 4256 (*Generic Message Exchange Authentication for the Secure
Shell Protocol*). It works by allowing the server to send one or more
prompts to the client and receive responses, all within the SSH
authentication layer. RFC 4256 explicitly anticipates PAM as the typical
backend: “It is expected that this authentication method would typically
be backended by [PAM]” (`RFC 4256,
§3.3 <https://www.rfc-editor.org/rfc/rfc4256>`__).

The standard 2FA-over-SSH configuration combines two authentication
methods in sequence using OpenSSH’s ``AuthenticationMethods`` directive:

::

   AuthenticationMethods publickey,keyboard-interactive

This means: first complete public key authentication successfully, then
complete a keyboard-interactive exchange (which PAM resolves to a TOTP
prompt). Both steps occur within the same SSH authentication layer,
which itself runs inside the transport layer channel already established
in step 1.

What this means in practice
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sequencing has an important consequence that is rarely stated
plainly: by the time the TOTP prompt is issued, the client has already
proven possession of the private SSH key and received a partial-success
response from the server. RFC
4252 specifies this clearly: “the server MAY require additional
authentications after successful authentication,” and signals partial
success via the ``SSH_MSG_USERAUTH_FAILURE`` message with the
``partial success`` boolean set to true (`RFC 4252,
§5.1 <https://www.rfc-editor.org/rfc/rfc4252>`__).
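In wire terms, the partial-success signal is an
``SSH_MSG_USERAUTH_FAILURE`` (message 51) whose boolean is true;
``SSH_MSG_USERAUTH_SUCCESS`` (message 52) is sent only once
authentication as a whole has succeeded. The sketch below encodes and
decodes the failure payload using the RFC 4251 ``name-list`` and
``boolean`` encodings. It covers only the message payload, not packet
framing or encryption, and the function names are illustrative.

.. code:: python

   import struct
   from typing import List, Tuple

   SSH_MSG_USERAUTH_FAILURE = 51
   SSH_MSG_USERAUTH_SUCCESS = 52  # for contrast: sent only on full success


   def encode_userauth_failure(methods: List[str], partial_success: bool) -> bytes:
       """Build the payload: byte 51, name-list, boolean (RFC 4252, 5.1)."""
       names = ",".join(methods).encode("ascii")
       return (bytes([SSH_MSG_USERAUTH_FAILURE])
               + struct.pack(">I", len(names)) + names  # RFC 4251 name-list
               + bytes([1 if partial_success else 0]))  # RFC 4251 boolean


   def decode_userauth_failure(payload: bytes) -> Tuple[List[str], bool]:
       """Return (methods that can continue, partial_success)."""
       if not payload or payload[0] != SSH_MSG_USERAUTH_FAILURE:
           raise ValueError("not an SSH_MSG_USERAUTH_FAILURE payload")
       (length,) = struct.unpack(">I", payload[1:5])
       names = payload[5:5 + length].decode("ascii")
       # RFC 4251: any non-zero boolean value is interpreted as TRUE.
       partial = payload[5 + length] != 0
       return (names.split(",") if names else []), partial

Under ``AuthenticationMethods publickey,keyboard-interactive``, a
successful ``publickey`` step would typically be answered with a payload
that decodes to ``(["keyboard-interactive"], True)``.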

From the server’s perspective, the authentication is not yet complete,
and no shell or connection channel is opened until both factors succeed.
In this narrow sense, the design is correct. However, from the user’s
experience — and from the perspective of any tool or script observing
the SSH session — a prompt appears inside what already looks and feels
like an established connection. The TOTP challenge arrives as a PAM
prompt delivered through the ``keyboard-interactive`` exchange,
indistinguishable to the user from a shell prompt or any other
in-session interaction.

Security consequences of in-session prompting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This architecture creates several concrete problems beyond the
automation issue discussed earlier:

The 2FA prompt is delivered over the same channel the attacker controls.
If the transport layer has been compromised — through a rogue host key
accepted by the client, a man-in-the-middle attack during host key
verification, or a compromised SSH agent forwarding chain — the TOTP
prompt and response transit the compromised channel. The second factor
provides no out-of-band verification; it travels through the same pipe
as everything else. This is structurally analogous to the “SIM swap”
argument against SMS-based 2FA: the second factor is delivered through
the same medium whose compromise it is meant to compensate for.

The prompt is trivially mimicked. Since the ``keyboard-interactive``
prompt is just a string sent by the server, any server (including a
malicious one) can send an identical prompt. There is no cryptographic
binding between the TOTP prompt and the server’s identity beyond the
host key verification that occurred in the transport layer. A user who
has habitually accepted unknown host keys — extremely common in practice
— has no reliable way to distinguish a genuine TOTP prompt from one
issued by a machine-in-the-middle.

The CERN Computer Security team, deploying PAM-based 2FA at scale, noted
this precise concern in their own implementation: they explicitly
required that “the second factor should never be asked if there was no
first factor,” and implemented additional logic to ensure the 2FA
challenge only fires after a confirmed first-factor success —
specifically because a bare ``keyboard-interactive`` prompt could
otherwise be issued to any connecting client, enabling SMS-spam attacks
against users (`CERN-CERT,
pam_2fa <https://cern-cert.github.io/pam_2fa/>`__). This is a partial
mitigation of the prompt-mimicry problem, not a solution to it.

Comparison with proper out-of-band second factors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A second factor delivered entirely outside the SSH channel — for
example, a push notification to a separate device, a hardware token
whose output is entered before SSH is invoked, or a certificate-based
pre-authentication step — does not share this weakness. The second
factor verification occurs before or entirely independently of the
transport session, so compromise of the SSH channel does not
automatically compromise the 2FA verification path.

The PAM-over-keyboard-interactive approach collapses both factors into
the same channel. This is a protocol-level design limitation, not a
configuration error. It cannot be fixed by tuning ``sshd_config``; it is
inherent in the architecture of RFC 4252 and RFC 4256 as applied to SSH
2FA.


Weakness comparison
-------------------

The following table summarises the weaknesses identified:

+------------------------+---------------+--------------+----------------------+----------------------+
| Implementation         | Automation    | Key storage  | Recoverability       | Factor independence  |
+========================+===============+==============+======================+======================+
| Software TOTP (PAM)    | Breaks        | Plaintext    | Recoverable (seed    | Fails (co-located    |
|                        |               | on disk      | can be re-enrolled)  | with SSH key)        |
+------------------------+---------------+--------------+----------------------+----------------------+
| Hardware TOTP token    | Breaks        | Secure       | Non-recoverable      | Independent          |
|                        |               | (on-device)  |                      | (separate device)    |
+------------------------+---------------+--------------+----------------------+----------------------+
| FIDO2 device (sk-SSH   | Breaks        | Secure       | Non-recoverable      | Fails (shared        |
| and 2FA, same device)  |               | (on-device)  |                      | single device)       |
+------------------------+---------------+--------------+----------------------+----------------------+
| FIDO2 device (sk-SSH   | Breaks for    | Secure       | Non-recoverable      | N/A (single factor)  |
| only, no 2FA)          | resident keys | (on-device)  |                      |                      |
+------------------------+---------------+--------------+----------------------+----------------------+

No current 2FA-over-SSH configuration simultaneously satisfies all four
requirements. Software TOTP provides the illusion of a second factor
while failing the independence test for any attacker with filesystem
access. Hardware tokens provide genuine independence but at the cost of
recoverability and at the expense of automation. Using the same FIDO2
device for both the SSH key and the second factor provides the
appearance of independence while providing none.


Proposed protocol: Unified OOB second factor for SSH
----------------------------------------------------

The analysis in the preceding sections establishes four requirements an
adequate SSH second factor must satisfy:

1. Channel independence — it must not travel through the same SSH
   session it protects.
2. Key independence — the second-factor credential must be
   cryptographically distinct from the SSH private key.
3. Automation compatibility — it must be completable programmatically
   without human interaction.
4. Replay and token-theft resistance — credentials intercepted in one
   session must not be usable in another.

The mechanism proposed here satisfies all four. The central design
principle is a unified ``keyboard-interactive`` prompt that
simultaneously offers a TOTP path for fallback and a web API (OOB) path
for both human clients and automation. The two paths share the same
token issuance and session-binding infrastructure; only the redemption
mechanism differs. The strength of the OOB path (plain X.509 mTLS, or
full HSM attestation) is determined entirely by server-side policy: the
client agent does not choose the tier, it reads the requirement from the
challenge URL.


The unified prompt
~~~~~~~~~~~~~~~~~~

After successful first-factor (``publickey``) authentication, the PAM
module issues a single ``keyboard-interactive`` prompt containing both a
TOTP field and an OOB URL:

::

   Two-factor authentication required.

   Option 1 — Enter your TOTP code below, or
   Option 2 — Authenticate via Web API (leave this field empty):
              OOB-AUTH https://auth.example.com/v1/ssh-auth/6a3f9c1e8d2b4f7a0e5c2d9b3f1a8e4d

   TOTP code (or leave empty to use Web API):

The client’s response determines which path the PAM module pursues:

-  Non-empty response — treated as a TOTP code; validated immediately
   in-band. This is the existing TOTP path, unchanged, provided as a
   human fallback and for environments where the OOB infrastructure is
   unavailable.
-  Empty response — OOB path selected; the PAM module suspends the
   in-band exchange and waits for a side-channel signal from the auth
   service. The SSH session holds at the prompt until the token is
   redeemed or the TTL expires.

A human user reads the URL and can paste it into a browser or CLI tool
if desired. An automated client — a CI agent, an ``ssh-oob-agent``
daemon, or the Java HSM client — parses the ``OOB-AUTH`` prefix from the
prompt text, extracts the URL, submits an empty response immediately,
and in parallel redeems the token via the web API. The automation client
needs no special protocol support beyond the ability to read
``keyboard-interactive`` prompt text and submit a response.
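The parsing step on the client side is small. A minimal sketch, assuming
the prompt layout shown above (one line beginning with ``OOB-AUTH``,
possibly indented, followed by an HTTPS URL); ``parse_oob_prompt`` is a
hypothetical helper, not part of any existing agent.

.. code:: python

   import re
   from typing import Optional
   from urllib.parse import urlsplit

   # One prompt line of the form "OOB-AUTH <url>", possibly indented.
   _OOB_LINE = re.compile(r"^[ \t]*OOB-AUTH[ \t]+(\S+)[ \t\r]*$", re.MULTILINE)


   def parse_oob_prompt(prompt: str) -> Optional[str]:
       """Return the OOB URL offered in a keyboard-interactive prompt, if any."""
       match = _OOB_LINE.search(prompt)
       if match is None:
           return None  # no OOB offer: leave the prompt to a human or TOTP
       url = match.group(1)
       # Only follow HTTPS URLs, so the token never travels in clear text.
       if urlsplit(url).scheme != "https":
           raise ValueError(f"refusing non-HTTPS OOB URL: {url}")
       return url

Anchoring the match to a line that starts with ``OOB-AUTH`` avoids
picking up URLs that happen to appear elsewhere in banner or prompt
text.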

This design keeps the SSH client (``ssh`` itself) entirely unmodified:
it faithfully relays whatever the PAM module sends as a prompt and
whatever the user (or agent wrapping it) sends as a response. No SSH
client patches are required.


Server-side policy tiers
~~~~~~~~~~~~~~~~~~~~~~~~

The auth service enforces one of two policy tiers for the OOB path,
selected per user or group via the server’s policy registry. Both tiers
share identical token issuance, session binding, and single-use
mechanics. They differ only in what the redemption request must prove.

+-----------------+-----------------+-----------------+-----------------+
| Tier            | Name            | What is         | Who uses it     |
|                 |                 | verified at     |                 |
|                 |                 | redemption      |                 |
+=================+=================+=================+=================+
| Tier 1          | X.509 mTLS      | Possession of a | Automation,     |
|                 |                 | valid           | service         |
|                 |                 | site-CA-signed  | accounts,       |
|                 |                 | client          | standard users  |
|                 |                 | certificate     |                 |
+-----------------+-----------------+-----------------+-----------------+
| Tier 2          | HSM Attestation | Possession of   | Privileged      |
|                 |                 | an enrolled HSM | administrators, |
|                 |                 | device + valid  | production      |
|                 |                 | attestation     | access          |
|                 |                 | chain + user    |                 |
|                 |                 | PIN unlocked    |                 |
+-----------------+-----------------+-----------------+-----------------+

The PAM module encodes the required tier in the challenge URL as a query
parameter:

::

   OOB-AUTH https://auth.example.com/v1/ssh-auth/6a3f9c1e...?policy=tier1
   OOB-AUTH https://auth.example.com/v1/ssh-auth/6a3f9c1e...?policy=tier2

The client agent reads the ``policy`` parameter and selects the
appropriate credential and redemption procedure. A Tier 2 URL presented
to a Tier 1-only agent causes the auth service to reject the attempt —
the agent cannot satisfy HSM attestation requirements, and the session
fails cleanly. A Tier 1 URL presented to a Tier 2-capable agent is
accepted without the attestation step.
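Tier selection in the agent can be sketched as follows, assuming the two
endpoint paths given for ``ssh-auth-service`` (``/v1/ssh-auth/{T}`` for
Tier 1 and ``/v1/ssh-auth-hsm/{T}`` for Tier 2) and that the token is the
last path segment of the challenge URL. ``select_redemption`` and the
capability set are hypothetical; a Tier 2-capable agent is modelled as
one whose capabilities include both tiers.

.. code:: python

   from typing import FrozenSet, Tuple
   from urllib.parse import parse_qs, urlsplit, urlunsplit

   # Assumed redemption path prefixes per tier (see ssh-auth-service).
   ENDPOINTS = {"tier1": "/v1/ssh-auth/", "tier2": "/v1/ssh-auth-hsm/"}


   def select_redemption(oob_url: str,
                         capabilities: FrozenSet[str]) -> Tuple[str, str]:
       """Return (tier, redemption URL) for an OOB challenge URL."""
       parts = urlsplit(oob_url)
       policy = parse_qs(parts.query).get("policy")
       if not policy or policy[0] not in ENDPOINTS:
           raise ValueError("missing or unknown policy parameter")
       tier = policy[0]
       if tier not in capabilities:
           # e.g. a Tier 1-only agent facing a Tier 2 URL: fail cleanly.
           raise PermissionError(f"agent cannot satisfy {tier}")
       token = parts.path.rstrip("/").rsplit("/", 1)[-1]
       path = ENDPOINTS[tier] + token
       return tier, urlunsplit((parts.scheme, parts.netloc, path, "", ""))

Failing locally only saves a round trip; the auth service still rejects
a redemption that does not meet the tier the PAM module recorded.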


Components
~~~~~~~~~~

``pam_oob_auth`` (server-side PAM module)
   Replaces the TOTP PAM module. On invocation it generates the token
   ``T``, computes the session binding tag ``B``, looks up the user’s
   policy tier, constructs the unified prompt, and dispatches to either
   the TOTP verification path or the OOB wait path depending on the
   client’s response.

``ssh-auth-service`` (server-side HTTPS service)
   Manages the token lifecycle. Exposes two redemption endpoints:
   ``/v1/ssh-auth/{T}`` for Tier 1 (mTLS only) and
   ``/v1/ssh-auth-hsm/{T}`` for Tier 2 (HSM attestation). Signals the
   PAM module via an internal Unix socket when a token is verified. This
   service is the sole point of mTLS and attestation verification; the
   SSH daemon itself performs neither.

``ssh-oob-agent`` (client-side, Tier 1)
   A lightweight daemon or wrapper holding an X.509 client certificate
   and key. Monitors SSH session prompts for the ``OOB-AUTH`` prefix,
   submits the empty TOTP response automatically, and redeems Tier 1
   tokens via mTLS. Used by CI/CD systems and service accounts.

``ssh-hsm-client.jar`` (client-side, Tier 2)
   A Java application that loads an HSM via the ``SunPKCS11`` provider,
   prompts for the PIN, and performs Tier 2 redemption. It must keep a
   single USB-assigned token session (PC/SC reader selection, vendor USB
   library, or equivalent) so that PKCS#11 ``C_Login`` and all
   subsequent operations target one physical device. The implementation
   must read manufacturer attestation material from that token and treat
   it as a client-side gate: until attestation for the selected reader
   is present and internally consistent, the code must not call
   ``C_Sign`` for session authorisation. Only after that gate passes may
   it fetch the portal nonce and sign it with the user’s on-token key.
   Used by privileged administrators; it may run as an SSH wrapper
   around the prompt or be invoked manually.


Credential material
~~~~~~~~~~~~~~~~~~~

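Each client identity requires two independent key pairs: one for the SSH
first factor and one for the OOB second factor (a Tier 1 client
certificate or a Tier 2 HSM credential, according to the user’s policy).
They must not share a root CA, a keystore, or a hardware device.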

+-----------------+-----------------+-----------------+-----------------+
| Credential      | Type            | Purpose         | Storage         |
+=================+=================+=================+=================+
| SSH key         | Ed25519 on disk | First factor:   | ``~/.ssh/``;    |
|                 | or              | SSH identity    | FIDO2 token for |
|                 | ``ed25519-sk``  |                 | ``sk-`` keys;   |
|                 | / ``ecdsa-sk``  |                 | or PKCS#11      |
|                 | on FIDO2; or,   |                 | token after     |
|                 | for a           |                 | PKCS#12 +       |
|                 | non-``sk-`` key |                 | X.509v3 import  |
|                 | on an HSM or    |                 |                 |
|                 | smart card,     |                 |                 |
|                 | ECDSA or RSA    |                 |                 |
|                 | imported as     |                 |                 |
|                 | PKCS#12 with a  |                 |                 |
|                 | self-signed     |                 |                 |
|                 | X.509v3         |                 |                 |
|                 | certificate     |                 |                 |
|                 | (see prose      |                 |                 |
|                 | below)          |                 |                 |
+-----------------+-----------------+-----------------+-----------------+
| Client          | X.509v3,        | OOB second      | Separate        |
| certificate     | ``clientAuth``  | factor: mTLS    | keystore or     |
| (Tier 1)        | EKU, site CA    | identity        | secrets manager |
+-----------------+-----------------+-----------------+-----------------+
| HSM +           | PKCS#11 device  | OOB second      | Physical HSM    |
| attestation     | with            | factor:         | token; never    |
| chain (Tier 2)  | manufacturer    | device-bound,   | exported        |
|                 | attestation     | user-verified   |                 |
+-----------------+-----------------+-----------------+-----------------+

The site CA signs Tier 1 client certificates and must not sign SSH host
keys or any SSH-related material. The manufacturer CA (pre-loaded in the
auth service) verifies Tier 2 attestation chains independently.

If the SSH first factor must be a conventional user key (not an OpenSSH
``sk-`` type FIDO2 key) but the private key is to live on an HSM or
smart card, it must be imported onto the token rather than provisioned
like an ``sk-`` key handle. The private key is supplied as a PKCS#12
bundle together with a self-signed X.509v3 certificate, so that the
token exposes a normal certificate-bound key object to PKCS#11 and host
tooling. This requirement is orthogonal to Tier 2 attestation for the
OOB step; it applies whenever a non-``sk-`` SSH key is hosted on
hardware that only exposes keys through X.509-shaped PKCS#11 objects.

On that path Ed25519 is not a viable choice. Although OpenSSH widely
supports Ed25519 for keys on disk, typical HSM and PIV stacks offer no
broadly implemented X.509v3 and PKCS#12 packaging for Ed25519 comparable
to that for ECDSA or RSA, so the imported bundle cannot be formed in the
same way. Operators who rely on PKCS#12 plus a self-signed X.509v3
certificate for an HSM-resident SSH key must therefore use ECDSA (for
example P-256) or RSA for that key.


Protocol flow — shared steps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisites (one-time setup)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The auth service holds: its TLS certificate and key; the site CA
certificate (trust anchor for Tier 1 client certificates and Tier 2
identity certificates); the manufacturer attestation CA roots (Tier 2);
access to the LDAP directory (enrolment registry, see Appendix B); and
``server_secret``. The client holds: its SSH key; its Tier 1 or Tier 2
credential; and the auth service CA certificate for server verification.

SSH transport and first-factor authentication
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

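The SSH transport layer (RFC 4253) completes. The session identifier
``H`` is the exchange hash from the first key exchange and is unique to
this connection. The client proves possession of its SSH private key via
``publickey`` authentication (RFC 4252), and the server replies with
``SSH_MSG_USERAUTH_FAILURE`` with partial success set, listing
``keyboard-interactive`` as the method that may continue.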

Unified prompt issuance
^^^^^^^^^^^^^^^^^^^^^^^

The ``pam_oob_auth`` module:

1. Generates a 256-bit CSPRNG token ``T``.
2. Computes ``B = HMAC-SHA256(server_secret, H || username)``.
3. Looks up the user’s policy tier.
4. Stores ``(T, B, username, tier, expiry = now + 30s)`` in the token
   table.
5. Issues the unified ``keyboard-interactive`` prompt containing the OOB
   URL with ``?policy=<tier>``.
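
Steps 1, 2 and 4 can be sketched in Java as follows. This is an
illustration under stated assumptions, not the ``pam_oob_auth``
implementation: the class and record names are hypothetical, the
username is assumed to be encoded as UTF-8 and appended to ``H`` without
a separator (unambiguous because ``H`` has a fixed length for a given
key-exchange hash), and how the module obtains ``H`` from ``sshd`` is
left out.

.. code:: java

   import java.nio.charset.StandardCharsets;
   import java.security.GeneralSecurityException;
   import java.security.SecureRandom;
   import java.time.Duration;
   import java.time.Instant;
   import javax.crypto.Mac;
   import javax.crypto.spec.SecretKeySpec;

   // Hypothetical sketch of prompt-issuance steps 1, 2 and 4.
   final class OobTokenIssuer {
       static final Duration TTL = Duration.ofSeconds(30);
       private static final SecureRandom RNG = new SecureRandom();

       // Row destined for the token table: (T, B, username, tier, expiry).
       static final class TokenRecord {
           final byte[] token;
           final byte[] binding;
           final String username;
           final int tier;
           final Instant expiresAt;

           TokenRecord(byte[] token, byte[] binding, String username, int tier, Instant expiresAt) {
               this.token = token;
               this.binding = binding;
               this.username = username;
               this.tier = tier;
               this.expiresAt = expiresAt;
           }
       }

       // Step 1: 256-bit CSPRNG token T.
       static byte[] newToken() {
           byte[] t = new byte[32];
           RNG.nextBytes(t);
           return t;
       }

       // Step 2: B = HMAC-SHA256(server_secret, H || username); UTF-8 username is an assumption.
       static byte[] binding(byte[] serverSecret, byte[] sessionId, String username) {
           try {
               Mac mac = Mac.getInstance("HmacSHA256");
               mac.init(new SecretKeySpec(serverSecret, "HmacSHA256"));
               mac.update(sessionId);                                  // H
               mac.update(username.getBytes(StandardCharsets.UTF_8));  // || username
               return mac.doFinal();
           } catch (GeneralSecurityException e) {
               throw new IllegalStateException("HmacSHA256 unavailable", e);
           }
       }

       // Step 4: assemble the row with expiry = now + 30 s.
       static TokenRecord issue(byte[] serverSecret, byte[] sessionId, String username,
                                int tier, Instant now) {
           return new TokenRecord(newToken(), binding(serverSecret, sessionId, username),
                   username, tier, now.plus(TTL));
       }
   }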

Client response and path selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The client submits either a TOTP code (in-band path) or an empty string
(OOB path). The automated client detects the ``OOB-AUTH`` line in the
prompt, submits an empty response immediately, and begins OOB redemption
in parallel without waiting for user input.
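
Detecting the ``OOB-AUTH`` line could look like the sketch below. The
exact prompt layout is not specified in this document, so the format
assumed here (a line beginning with ``OOB-AUTH`` followed by an
``https://`` URL ending in ``/{T}?policy=<tier>``) and the class name are
hypothetical. Anything the parser cannot read is left to the human TOTP
path.

.. code:: java

   import java.net.URI;
   import java.util.Optional;

   // Hypothetical parser; assumes a prompt line "OOB-AUTH <url>" with <url> ending in /{T}?policy=<tier>.
   final class OobPromptParser {

       static final class OobChallenge {
           final URI url;
           final String token;
           final String policy;

           OobChallenge(URI url, String token, String policy) {
               this.url = url;
               this.token = token;
               this.policy = policy;
           }
       }

       // Empty result: no usable OOB-AUTH line, so the prompt stays with the human (TOTP path).
       static Optional<OobChallenge> parse(String prompt) {
           for (String line : prompt.split("\\R")) {
               String trimmed = line.trim();
               int start = trimmed.indexOf("https://");   // plain http is never accepted
               if (!trimmed.startsWith("OOB-AUTH") || start < 0) {
                   continue;
               }
               URI url;
               try {
                   url = URI.create(trimmed.substring(start).split("\\s+")[0]);
               } catch (IllegalArgumentException e) {
                   return Optional.empty();
               }
               String path = url.getPath();
               String token = path == null ? "" : path.substring(path.lastIndexOf('/') + 1);
               String policy = null;
               if (url.getQuery() != null) {
                   for (String kv : url.getQuery().split("&")) {
                       if (kv.startsWith("policy=")) {
                           policy = kv.substring("policy=".length());
                       }
                   }
               }
               if (token.isEmpty() || !("tier1".equals(policy) || "tier2".equals(policy))) {
                   return Optional.empty();
               }
               return Optional.of(new OobChallenge(url, token, policy));
           }
           return Optional.empty();
       }
   }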

OOB redemption (tier-specific)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

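The matching client component redeems the token at the auth service
endpoint for the tier encoded in the URL: ``ssh-oob-agent`` over mTLS at
``/v1/ssh-auth/{T}`` for Tier 1, or ``ssh-hsm-client.jar`` at
``/v1/ssh-auth-hsm/{T}`` for Tier 2.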

SSH session completes
^^^^^^^^^^^^^^^^^^^^^

The auth service signals the PAM module over the internal Unix socket.
The module returns ``PAM_SUCCESS``, the SSH daemon sends
``SSH_MSG_USERAUTH_SUCCESS``, and the client can then open channels
under the connection protocol.


Tier 1 redemption — X.509 mTLS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``ssh-oob-agent`` opens a mutual TLS connection to the auth service,
presenting its X.509 client certificate. Both sides authenticate in the
TLS handshake.

.. code:: http

   POST /v1/ssh-auth/{T} HTTP/1.1
   Host: auth.example.com
   Content-Type: application/json

   {
     "session_binding": "<B>",
     "timestamp":       "<ISO-8601 UTC>",
     "nonce":           "<128-bit random>"
   }
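
A Tier 1 client could send this request with the JDK's built-in
``java.net.http`` client and an ``SSLContext`` that carries the X.509
client certificate. The sketch below is illustrative only: the class
name, the PKCS#12 file locations, and the base64url encoding of ``B`` and
the nonce are assumptions, and ``binding`` is simply the session-binding
value the client presents; how the client obtains it is taken as given
here.

.. code:: java

   import java.io.InputStream;
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.security.KeyStore;
   import java.security.SecureRandom;
   import java.time.Instant;
   import java.util.Base64;
   import javax.net.ssl.KeyManagerFactory;
   import javax.net.ssl.SSLContext;
   import javax.net.ssl.TrustManagerFactory;

   // Hypothetical Tier 1 redeemer; paths and field encodings are assumptions.
   final class Tier1Redeemer {

       // JSON body for the request above; B and nonce as unpadded base64url (assumed).
       static String requestBody(byte[] binding, Instant timestamp, byte[] nonce) {
           Base64.Encoder b64 = Base64.getUrlEncoder().withoutPadding();
           return "{\"session_binding\":\"" + b64.encodeToString(binding)
                   + "\",\"timestamp\":\"" + timestamp          // ISO-8601 UTC, e.g. ...Z
                   + "\",\"nonce\":\"" + b64.encodeToString(nonce) + "\"}";
       }

       // Client keystore supplies the X.509 client cert; truststore pins the auth service CA.
       static SSLContext mtlsContext(Path clientP12, char[] p12Pass, Path trustP12, char[] trustPass)
               throws Exception {
           KeyStore clientKs = KeyStore.getInstance("PKCS12");
           try (InputStream in = Files.newInputStream(clientP12)) {
               clientKs.load(in, p12Pass);
           }
           KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
           kmf.init(clientKs, p12Pass);

           KeyStore trustKs = KeyStore.getInstance("PKCS12");
           try (InputStream in = Files.newInputStream(trustP12)) {
               trustKs.load(in, trustPass);
           }
           TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
           tmf.init(trustKs);

           SSLContext ctx = SSLContext.getInstance("TLS");
           ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), new SecureRandom());
           return ctx;
       }

       // POSTs to /v1/ssh-auth/{T}; the client certificate travels in the TLS handshake.
       static int redeem(SSLContext ctx, URI tokenUrl, byte[] binding) throws Exception {
           byte[] nonce = new byte[16];                         // 128-bit request nonce
           new SecureRandom().nextBytes(nonce);
           HttpClient client = HttpClient.newBuilder().sslContext(ctx).build();
           HttpRequest req = HttpRequest.newBuilder(tokenUrl)
                   .header("Content-Type", "application/json")
                   .POST(HttpRequest.BodyPublishers.ofString(requestBody(binding, Instant.now(), nonce)))
                   .build();
           return client.send(req, HttpResponse.BodyHandlers.ofString()).statusCode();
       }
   }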

The auth service verifies token existence and TTL, session binding
``B``, the client certificate chain and OCSP status, that the Subject DN
is registered for the username, and that the tier matches ``tier1``. It
then marks token ``T`` as consumed and returns the outcome to PAM.
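
On the auth service, the checks that the TLS stack does not already
perform could be ordered as in this sketch; chain and OCSP validation of
the client certificate are assumed to happen during the handshake. The
types and outcome names are hypothetical, and comparing DNs as plain
strings is a simplification; a production service would compare
normalised ``X500Principal`` values.

.. code:: java

   import java.security.MessageDigest;
   import java.time.Instant;
   import java.util.Set;

   // Hypothetical Tier 1 verification, following the order given above.
   final class Tier1Verifier {

       enum Outcome { OK, UNKNOWN_OR_EXPIRED, ALREADY_CONSUMED, BINDING_MISMATCH, DN_NOT_REGISTERED, WRONG_TIER }

       // Minimal view of a token-table row.
       static final class StoredToken {
           final byte[] binding;
           final String username;
           final int tier;
           final Instant expiresAt;
           final boolean consumed;

           StoredToken(byte[] binding, String username, int tier, Instant expiresAt, boolean consumed) {
               this.binding = binding;
               this.username = username;
               this.tier = tier;
               this.expiresAt = expiresAt;
               this.consumed = consumed;
           }
       }

       // row: looked up by T (null if absent); subjectDn: from the already-validated client cert.
       static Outcome verify(StoredToken row, byte[] presentedB, String subjectDn,
                             Set<String> registeredDns, Instant now) {
           if (row == null || !now.isBefore(row.expiresAt)) {
               return Outcome.UNKNOWN_OR_EXPIRED;
           }
           if (row.consumed) {
               return Outcome.ALREADY_CONSUMED;
           }
           // Constant-time comparison, so timing does not reveal how much of B matched.
           if (!MessageDigest.isEqual(row.binding, presentedB)) {
               return Outcome.BINDING_MISMATCH;
           }
           if (!registeredDns.contains(subjectDn)) {
               return Outcome.DN_NOT_REGISTERED;
           }
           if (row.tier != 1) {
               return Outcome.WRONG_TIER;
           }
           return Outcome.OK;   // caller then consumes T atomically and signals PAM
       }
   }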


Tier 2 redemption — HSM attestation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Manufacturer attestation of the card and its keys is a statement about
hardware and key *objects*: serial numbers, CA chains, and attributes
such as ``CKA_EXTRACTABLE``. On its own that has limited operational
value. It becomes security-relevant when the deployment uses it to
certify that the user’s operational private key *inside* the token, not
merely a public representation of that key, is the one that performs a
concrete *signing action* the verifier insists on. Here the action is
authorising release of the second factor for this SSH session: the
portal nonce, issued for token ``T`` and therefore tied to session
binding ``B``, is the payload the user’s key must sign with ``C_Sign``
while the session is live. Attestation data then explains *which*
on-token key performed that action and what the manufacturer claims
about it; the action itself is proved by the signature, not by attesting
the bare key in isolation.

Tier 2 is therefore an online protocol: possession of the enrolled
hardware is tied to an authorisation decision only when the attested
signing slot performs ``C_Sign`` over that verifier-chosen material.
Exporting or uploading attestation certificates, or reading out the
attested public key, is bookkeeping only; anyone with a file copy could
replay those blobs without holding the token. The security event is the
signature produced inside the HSM for that specific operation. On the
client, that signature must not be attempted until attestation has been
read from the same USB-bound token session (step 1 below); otherwise the
user interface cannot honestly claim “this token, then this action.”

The ``ssh-hsm-client.jar``:

1. Selects and holds the correct USB token (exclusive PKCS#11 slot /
   reader binding via PC/SC or the vendor’s USB stack) so later
   ``C_Sign`` calls cannot silently attach to a different inserted
   device.

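2. Unlocks the HSM: prompts for the PIN and calls
   ``KeyStore.load(null, pin)`` via ``SunPKCS11``, which performs
   ``C_Login``. A wrong PIN makes ``C_Login`` fail with
   ``CKR_PIN_INCORRECT``; until a login succeeds, private-key operations
   such as ``C_Sign`` fail with ``CKR_USER_NOT_LOGGED_IN``.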

3. Reads the key attestation certificate and device attestation
   certificate chain from that same session before any authorisation
   signing. If the chain is missing, malformed, or does not match the
   reader the user selected, the client aborts and must not proceed to
   ``C_Sign`` for the OOB step. This ordering enforces “attestation
   first, then signing for authentication”.

4. Fetches a sub-nonce from the portal:
   ``GET /v1/ssh-auth-hsm/{T}/challenge`` →
   ``{ "nonce": "<256-bit base64url>" }``

5. Signs that sub-nonce inside the HSM with the private key bound to the
   key attestation certificate (the attested signing slot), via
   ``Signature.getInstance(...)`` and ``C_Sign``. The private key never
   leaves hardware. That ``C_Sign`` is the cryptographic act being
   relied on: it is the user’s key inside the token performing the
   action the portal required. Without a valid signature over that
   action, the portal must reject the request regardless of what
   certificates were attached.

6. Posts the bundle to ``/v1/ssh-auth-hsm/{T}`` over HTTPS (no client
   cert required — the attested signing operation plus chain identifies
   the device):

.. code:: json

   {
     "session_binding":    "<B>",
     "nonce_signature":    "<base64url ECDSA signature over sub-nonce from the PKCS#11 signing step>",
     "key_attestation":    "<base64 DER>",
     "device_attestation": "<base64 DER>",
     "identity_cert":      "<base64 DER site-CA user certificate>",
     "timestamp":          "<ISO-8601 UTC>",
     "request_nonce":      "<128-bit random>"
   }

The portal verifies, in order:

1. token existence and TTL;
2. session binding ``B``;
3. that ``nonce_signature`` verifies under the SubjectPublicKeyInfo from
   ``key_attestation`` (the attested hardware key, not merely a public
   key copied elsewhere);
4. the attestation chain to the manufacturer root, and
   ``CKA_EXTRACTABLE = FALSE`` in the attestation extension;
5. the LDAP directory: the device serial number and key fingerprint must
   match an ``hsmTokenHolder`` entry for this username whose
   ``hsmTokenStatus`` holds ``<serialNumber>:ACTIVE`` for that device
   (see Appendix B);
6. the user identity certificate (site CA, OCSP).

It must also enforce the same-token binding described in *Binding
attestation to the SSH first factor*. It then marks token ``T`` as
consumed and returns the outcome to PAM.
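
The ``nonce_signature`` check could be written with the JDK's
``CertificateFactory`` and ``Signature`` classes, as in the sketch below.
It assumes the client sends DER-encoded ECDSA signatures (what
``SHA256withECDSA`` produces in Java) and that ``key_attestation`` has
already been base64-decoded; the class name is hypothetical, and the
manufacturer-chain and ``CKA_EXTRACTABLE`` checks are vendor-specific and
not shown.

.. code:: java

   import java.io.ByteArrayInputStream;
   import java.security.GeneralSecurityException;
   import java.security.PublicKey;
   import java.security.Signature;
   import java.security.cert.CertificateFactory;
   import java.security.cert.X509Certificate;

   // Hypothetical portal-side check that nonce_signature verifies under the attested key.
   final class AttestedSignatureCheck {

       // Parses the key attestation certificate from raw DER.
       static X509Certificate parseDer(byte[] der) throws GeneralSecurityException {
           CertificateFactory cf = CertificateFactory.getInstance("X.509");
           return (X509Certificate) cf.generateCertificate(new ByteArrayInputStream(der));
       }

       // Assumes DER-encoded ECDSA signatures; a client emitting raw r||s would
       // need "SHA256withECDSAinP1363Format" instead.
       static boolean nonceSignatureValid(PublicKey attestedKey, byte[] subNonce, byte[] signature) {
           try {
               Signature v = Signature.getInstance("SHA256withECDSA");
               v.initVerify(attestedKey);
               v.update(subNonce);
               return v.verify(signature);
           } catch (GeneralSecurityException e) {
               return false;   // malformed signature or unsupported key: treat as failure
           }
       }

       // Verifies under the SubjectPublicKeyInfo taken from key_attestation, not a key sent separately.
       static boolean verify(byte[] keyAttestationDer, byte[] subNonce, byte[] signature)
               throws GeneralSecurityException {
           X509Certificate keyAttestation = parseDer(keyAttestationDer);
           return nonceSignatureValid(keyAttestation.getPublicKey(), subNonce, signature);
       }
   }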

Binding attestation to the SSH first factor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A valid attestation and ``nonce_signature`` prove that an on-token key
performed the OOB signing action (nonce authorisation). They do not, on
their own, prove that the SSH ``publickey`` authentication which already
succeeded for this session used a private key that lives on that same
token, nor that the SSH authentication *action* and the OOB *action*
were performed by the same on-token key. Without an extra check, a user
could present a disk-backed SSH key for the first factor while redeeming
Tier 2 with a different USB token that is also enrolled to their account
(or even a colleague’s token, if enrolment were sloppy), and the
protocol would still see two “successful” cryptographic objects keyed
only by username.

Closing that gap requires an explicit binding policy implemented in
software:

-  Record the public key (or its fingerprint) that ``sshd`` actually
   accepted for the first factor for this session identifier ``H`` and
   username. That record must reach the auth service together with ``B``
   (for example ``pam_oob_auth`` passes it when minting ``T``, or the
   auth service reads it from a short-lived side table keyed by ``T``).

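-  Require equality, modulo your documented rules, between that
   first-factor fingerprint and the attested key: either the
   SubjectPublicKeyInfo inside ``key_attestation`` after successful
   ``nonce_signature`` verification, or the material stored in LDAP
   ``sshPublicKey`` for that user if and only if policy defines that
   field as the canonical first-factor key for Tier 2 users. Both sides
   must first be reduced to one representation: OpenSSH reports
   ``SHA256:`` fingerprints computed over the SSH wire-format public key
   blob, whereas ``hsmKeyFingerprint`` in Appendix B is a hex SHA-256 over
   the SubjectPublicKeyInfo DER, so the two strings never match
   directly. If the fingerprints disagree, reject redemption with
   ``403 Forbidden`` even when the attestation chain and signature over
   the nonce are valid.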

-  If the site ``identity_cert`` in the JSON bundle does not certify the
   same public key as the attested signing key, presenting the certificate is not
   evidence that the TLS or SSH layer used that certificate’s private
   key on the same hardware; treat it as naming and policy context only
   unless you also require a signature under that certificate’s key over
   the same nonce (or a digest that includes it) inside the same PKCS#11
   session on the token.

The tightest operational pattern is one physical token, one PKCS#11 slot
(or a small set of slots) for both SSH authentication and attestation,
and strict fingerprint equality between what ``sshd`` accepted and what
the attestation describes. Anything looser must be named as a residual
risk in the deployment’s threat model.
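
Because ``sshd`` identifies keys in the SSH wire format while attestation
describes a SubjectPublicKeyInfo, the comparison needs a common
representation. The sketch below converts an attested P-256 key into the
``ecdsa-sha2-nistp256`` wire-format blob defined in RFC 5656, compares it
byte for byte with the key ``sshd`` accepted, and computes an
OpenSSH-style ``SHA256:`` fingerprint for logs. The class name is
hypothetical, only P-256 is handled, and the accepted key is assumed to
be available as an ``authorized_keys``-style line; how it reaches the
auth service is outside the sketch.

.. code:: java

   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.math.BigInteger;
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;
   import java.security.interfaces.ECPublicKey;
   import java.util.Base64;

   // Hypothetical helper comparing the attested EC key with the key sshd accepted.
   final class FirstFactorBinding {

       // RFC 5656 blob: string "ecdsa-sha2-nistp256", string "nistp256", string Q (uncompressed point).
       static byte[] sshBlob(ECPublicKey key) {
           if (key.getParams().getCurve().getField().getFieldSize() != 256) {
               throw new IllegalArgumentException("only P-256 is handled in this sketch");
           }
           byte[] q = new byte[65];
           q[0] = 0x04;
           copyFixed(key.getW().getAffineX(), q, 1);
           copyFixed(key.getW().getAffineY(), q, 33);
           ByteArrayOutputStream buf = new ByteArrayOutputStream();
           try (DataOutputStream out = new DataOutputStream(buf)) {
               writeString(out, "ecdsa-sha2-nistp256".getBytes(StandardCharsets.US_ASCII));
               writeString(out, "nistp256".getBytes(StandardCharsets.US_ASCII));
               writeString(out, q);
           } catch (IOException e) {
               throw new IllegalStateException(e);   // not expected for an in-memory stream
           }
           return buf.toByteArray();
       }

       // OpenSSH-style fingerprint: "SHA256:" + unpadded base64 of SHA-256(blob).
       static String fingerprint(byte[] blob) {
           try {
               byte[] d = MessageDigest.getInstance("SHA-256").digest(blob);
               return "SHA256:" + Base64.getEncoder().withoutPadding().encodeToString(d);
           } catch (NoSuchAlgorithmException e) {
               throw new IllegalStateException(e);
           }
       }

       // acceptedLine: "ecdsa-sha2-nistp256 <base64> [comment]" for the key sshd accepted.
       static boolean sameKey(String acceptedLine, ECPublicKey attested) {
           String[] parts = acceptedLine.trim().split("\\s+");
           if (parts.length < 2 || !parts[0].equals("ecdsa-sha2-nistp256")) {
               return false;
           }
           try {
               byte[] acceptedBlob = Base64.getDecoder().decode(parts[1]);
               return MessageDigest.isEqual(acceptedBlob, sshBlob(attested));
           } catch (IllegalArgumentException e) {
               return false;   // not valid base64, or not a P-256 key
           }
       }

       private static void writeString(DataOutputStream out, byte[] b) throws IOException {
           out.writeInt(b.length);   // uint32 length prefix
           out.write(b);
       }

       // Left-pads a non-negative coordinate to 32 bytes, big-endian.
       private static void copyFixed(BigInteger v, byte[] dst, int off) {
           byte[] raw = v.toByteArray();   // may carry a leading 0x00 sign byte
           int len = Math.min(raw.length, 32);
           System.arraycopy(raw, raw.length - len, dst, off + 32 - len, len);
       }
   }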


Session binding — preventing token relay
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The tag ``B = HMAC-SHA256(server_secret, H || username)`` ties every
token to the SSH session that issued it. A MITM who relays the OOB URL
has a different ``H'``, producing ``B' ≠ B``. The auth service rejects
``B'``. ``H`` is never transmitted over the OOB channel; HMAC is one-way
so observing ``B`` reveals neither ``H`` nor ``server_secret``.


Security properties
~~~~~~~~~~~~~~~~~~~

+-----------------------------------+-----------------------------------+
| Property                          | How it is achieved                |
+===================================+===================================+
| Channel independence              | OOB redemption is a separate      |
|                                   | HTTPS connection; SSH carries     |
|                                   | only an opaque URL and a blank    |
|                                   | field                             |
+-----------------------------------+-----------------------------------+
| Key independence                  | SSH key and OOB credential are    |
|                                   | distinct key pairs on distinct    |
|                                   | systems                           |
+-----------------------------------+-----------------------------------+
| Automation compatibility          | Agent detects ``OOB-AUTH`` prefix |
|                                   | and submits empty response; no    |
|                                   | SSH client modification needed    |
+-----------------------------------+-----------------------------------+
| Human fallback                    | TOTP code path remains in the     |
|                                   | same prompt; OOB outage does not  |
|                                   | lock users out                    |
+-----------------------------------+-----------------------------------+
| Replay resistance                 | Token is single-use, consumed on  |
|                                   | first verified redemption; TTL    |
|                                   | is 30 s                           |
+-----------------------------------+-----------------------------------+
| Session binding                   | HMAC tag ``B`` ties token to SSH  |
|                                   | session ``H``; relay attacks      |
|                                   | defeated                          |
+-----------------------------------+-----------------------------------+
| Policy enforcement                | Tier enforced server-side; client |
|                                   | cannot self-upgrade Tier 1 to     |
|                                   | Tier 2                            |
+-----------------------------------+-----------------------------------+
| Tier 2: non-extractable key       | ``CKA_EXTRACTABLE = FALSE``       |
|                                   | verified in attestation;          |
|                                   | manufacturer root in trust store  |
+-----------------------------------+-----------------------------------+
| Tier 2: user verification         | PIN required each session;        |
|                                   | enforced by hardware              |
|                                   | (``CKR_USER_NOT_LOGGED_IN``)      |
+-----------------------------------+-----------------------------------+
| Revocation                        | Tier 1: OCSP at each redemption.  |
|                                   | Tier 2: OCSP on identity cert +   |
|                                   | LDAP ``hsmTokenStatus`` set to    |
|                                   | the ``REVOKED`` suffix            |
|                                   | (*Lifecycle operations* in        |
|                                   | Appendix B)                       |
+-----------------------------------+-----------------------------------+


Attack surface analysis
~~~~~~~~~~~~~~~~~~~~~~~

Attacker holds SSH private key only. Passes first factor; cannot redeem
OOB token — no X.509 cert (Tier 1) or HSM + PIN (Tier 2). Token expires
in 30 s.

Attacker holds X.509 client certificate and key only (Tier 1). Cannot
pass first factor without SSH key; cannot produce session binding ``B``
without ``H``.

Attacker intercepts the OOB URL. Cannot redeem without X.509 private key
or HSM. Session binding ``B`` also rejects any redemption from a
different SSH session.

Attacker operates a rogue SSH server (MITM). Relays the OOB URL. Their
``H' ≠ H``, so ``B' ≠ B``. Auth service rejects.

Auth service is compromised. Critical single point of failure: attacker
can forge ``B`` values and issue false ``PAM_SUCCESS``. Mitigations:
isolated deployment, loopback-only internal socket, HSM-backed auth
service key, ``server_secret`` rotation, full audit log.


Comparison with existing approaches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-------------------+-------------+-------------+------------+-----------+---------+----------+
| Approach          | Channel     | Key         | Automation | Replay    | Session | Hardware |
|                   | independent | independent |            | resistant | bound   | verified |
+===================+=============+=============+============+===========+=========+==========+
| Software TOTP     | No          | No          | No         | Partial   | No      | No       |
| (PAM)             |             |             |            |           |         |          |
+-------------------+-------------+-------------+------------+-----------+---------+----------+
| Hardware TOTP     | No          | Yes         | No         | Partial   | No      | No       |
| token             |             |             |            |           |         |          |
+-------------------+-------------+-------------+------------+-----------+---------+----------+
| FIDO2 sk-SSH +    | No          | No          | No         | Yes       | No      | Partial  |
| same-device 2FA   |             |             |            |           |         |          |
+-------------------+-------------+-------------+------------+-----------+---------+----------+
| Tier 1 (X.509     | Yes         | Yes         | Yes        | Yes       | Yes     | No       |
| mTLS)             |             |             |            |           |         |          |
+-------------------+-------------+-------------+------------+-----------+---------+----------+
| Tier 2 (HSM       | Yes         | Yes         | Human PIN  | Yes       | Yes     | Yes      |
| attestation)      |             |             |            |           |         |          |
+-------------------+-------------+-------------+------------+-----------+---------+----------+

Summary and next steps
----------------------

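The proposed unified OOB protocol addresses the weaknesses identified in
the preceding analysis. The second factor travels on a genuinely
independent channel (mTLS or HTTPS, not SSH); the Tier 1 credential is a
distinct X.509 key pair unrelated to the SSH key; automation is native to
Tier 1 via ``ssh-oob-agent``, while Tier 2 deliberately requires a human
PIN; and tokens are single-use and session-bound. The auth service
remains a critical single point of failure (see *Attack surface
analysis*) and must be hardened accordingly.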

Next steps in formalising this proposal:

-  Define the internal PAM-to-auth-service socket protocol.
-  Specify the full X.509 certificate profile (EKU, SAN, key usage
   constraints).
-  Define the client agent discovery mechanism for the ``OOB-AUTH``
   prompt prefix.
-  Address the auth service HA (high-availability) and persistence
   requirements for the token table under load.
-  Produce a reference implementation and threat model document.


References
----------

-  NIST Special Publication 800-63B, *Digital Identity Guidelines:
   Authentication and Lifecycle Management*, June 2017 (superseded by SP
   800-63B-4, August 2025).
   https://pages.nist.gov/800-63-3/sp800-63b.html

-  NIST Special Publication 800-63B-4, *Digital Identity Guidelines:
   Authentication and Lifecycle Management*, August 2025.
   https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63B-4.pdf

-  Yubico, *Securing SSH with FIDO2*.
   https://developers.yubico.com/SSH/Securing_SSH_with_FIDO2.html

-  Yubico, *FIDO2 Passwordless Authentication*.
   https://www.yubico.com/authentication-standards/fido2/

-  OpenSSH 8.2 Release Notes, *FIDO/U2F Support*.
   https://www.openssh.com/txt/release-8.2

-  M. Vorburger, *SSH Key type ed25519-sk (and ecdsa-sk)*, GitHub Notes.
   https://github.com/vorburger/vorburger.ch-Notes/blob/develop/security/ed25519-sk.md

-  Cryptsus, *How to configure OpenSSH with YubiKey Security Keys U2F
   OTP Authentication*.
   https://cryptsus.com/blog/how-to-configure-openssh-with-yubikey-security-keys-u2f-otp-authentication-ed25519-sk-ecdsa-sk-on-ubuntu-18.04.html

-  D. M'Raihi, S. Machani, M. Pei, J. Rydell, *TOTP: Time-Based One-Time
   Password Algorithm*, RFC 6238, IETF, May 2011.
   https://www.rfc-editor.org/rfc/rfc6238

-  T. Ylonen, C. Lonvick, *The Secure Shell (SSH) Authentication
   Protocol*, RFC 4252, IETF, January 2006.
   https://www.rfc-editor.org/rfc/rfc4252

-  T. Ylonen, C. Lonvick, *The Secure Shell (SSH) Transport Layer
   Protocol*, RFC 4253, IETF, January 2006.
   https://www.rfc-editor.org/rfc/rfc4253

-  F. Cusack, M. Forssen, *Generic Message Exchange Authentication for
   the Secure Shell Protocol (SSH)*, RFC 4256, IETF, January 2006.
   https://www.rfc-editor.org/rfc/rfc4256

-  CERN Computer Security Team, *pam_2fa: Two-Factor Authentication PAM
   module*. https://cern-cert.github.io/pam_2fa/

-  N. Sakimura et al., *Proof Key for Code Exchange by OAuth Public
   Clients*, RFC 7636, IETF, September 2015.
   https://www.rfc-editor.org/rfc/rfc7636

-  B. Campbell, J. Bradley, N. Sakimura, T. Lodderstedt, *OAuth 2.0
   Mutual-TLS Client Authentication and Certificate-Bound Access
   Tokens*, RFC 8705, IETF, February 2020.
   https://www.rfc-editor.org/rfc/rfc8705

-  T. Dierks, E. Rescorla, *The Transport Layer Security (TLS) Protocol
   Version 1.2*, RFC 5246, IETF, August 2008.
   https://www.rfc-editor.org/rfc/rfc5246 (see also RFC 8446 for TLS
   1.3: https://www.rfc-editor.org/rfc/rfc8446)


.. raw:: html

   <!-- appendix-canonical-order: Appendix A (deployment) is the next heading; Appendix B (HSM/LDAP) appears later in this file. -->

Appendix A — Deployment reference
---------------------------------

sshd_config: Enforcing policy tiers per group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

   # All users must complete first-factor (pubkey) then keyboard-interactive
   AuthenticationMethods publickey,keyboard-interactive
   UsePAM yes
   KbdInteractiveAuthentication yes

   # Service accounts and CI/CD — Tier 1 (X.509 mTLS agent)
   Match Group automation
       # pam_oob_auth will issue ?policy=tier1 URLs for members of this group
       # ssh-oob-agent running in the pipeline redeems automatically

   # Standard interactive users — Tier 1 or TOTP fallback
   Match Group staff
       # pam_oob_auth issues ?policy=tier1; user may use agent or TOTP code

   # Privileged administrators — Tier 2 (HSM attestation + PIN)
   Match Group privileged-admins
       # pam_oob_auth issues ?policy=tier2; only ssh-hsm-client.jar can redeem
       # TOTP fallback may be disabled for this group in pam_oob_auth config

Policy tier assignment lives in ``pam_oob_auth``\ ’s own configuration
file, not in ``sshd_config``. The ``Match Group`` blocks above are
illustrative; in practice a single ``pam_oob_auth`` instance reads a
policy table keyed on Unix group membership.
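
A policy table keyed on group membership could be resolved as in this
sketch. The rule type, the group-to-tier mapping, and the "strictest rule
wins" tie-break (higher tier first, then a rule without TOTP fallback
over one with it) are assumptions for illustration; the real
``pam_oob_auth`` configuration format is not defined in this document.

.. code:: java

   import java.util.Map;
   import java.util.Optional;
   import java.util.Set;

   // Hypothetical model of a pam_oob_auth policy table keyed on Unix groups.
   final class TierPolicy {

       static final class Rule {
           final int tier;
           final boolean totpFallback;

           Rule(int tier, boolean totpFallback) {
               this.tier = tier;
               this.totpFallback = totpFallback;
           }
       }

       private final Map<String, Rule> byGroup;

       TierPolicy(Map<String, Rule> byGroup) {
           this.byGroup = byGroup;
       }

       // Strictest matching rule wins. Empty means "no policy", which the module
       // should treat as a denial rather than a silent pass.
       Optional<Rule> resolve(Set<String> userGroups) {
           Rule best = null;
           for (String g : userGroups) {
               Rule r = byGroup.get(g);
               if (r == null) {
                   continue;
               }
               boolean stricter = best == null
                       || r.tier > best.tier
                       || (r.tier == best.tier && best.totpFallback && !r.totpFallback);
               if (stricter) {
                   best = r;
               }
           }
           return Optional.ofNullable(best);
       }
   }

With ``staff`` mapped to Tier 1 with TOTP fallback and
``privileged-admins`` to Tier 2 without it, a user in both groups
resolves to Tier 2 with no TOTP fallback.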


Java HSM client — implementation sketch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``ssh-hsm-client.jar`` is a self-contained Java 11+ application
using only the standard library and a single HTTP client dependency. The
PKCS#11 bridge uses the built-in ``SunPKCS11`` provider; no third-party
HSM SDK is required at the Java layer. Production code should open PC/SC
or a vendor USB handle first, pin the SunPKCS11 configuration to that
reader or serial, and refuse to sign until
``getCertificateChain(ATTEST_ALIAS)`` (or the vendor equivalent)
succeeds on that same session—matching the policy in *Tier 2 redemption
— HSM attestation*.

.. code:: java

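   import java.net.URI;
   import java.security.KeyStore;
   import java.security.PrivateKey;
   import java.security.Provider;
   import java.security.Security;
   import java.security.Signature;
   import java.security.cert.Certificate;
   import java.util.Arrays;

   // Placeholders, not library APIs: readPinFromConsole, promptMonitor, sshSession,
   // parsePolicy, challengeUri, redemptionUri, decodeNonce, httpGet, httpPost,
   // buildAttestationJson, loadTier1Keystore, httpPostMtls, sessionBinding (B),
   // pkcs11ConfigPath and the token-specific ATTEST_ALIAS / IDENTITY_ALIAS / SSH_KEY_ALIAS.

   // 0. Bind PKCS#11 to the intended USB token (PC/SC reader name, serial, or vendor API)
   //    so all following operations hit one device; no signing before this is stable.

   // 1. Load HSM via SunPKCS11 (Provider.configure needs Java 9+)
   Provider p = Security.getProvider("SunPKCS11")
                        .configure(pkcs11ConfigPath);   // name, library, slotListIndex
   Security.insertProviderAt(p, 1);

   KeyStore ks = KeyStore.getInstance("PKCS11", p);
   char[] pin = readPinFromConsole();                    // never stored
   try {
       ks.load(null, pin);                               // C_Login; throws on a wrong PIN
   } finally {
       Arrays.fill(pin, '\0');                           // cleared even if login fails
   }

   // 2. Retrieve OOB URL from SSH prompt (via PTY monitor or wrapper pipe)
   URI oobUrl    = promptMonitor.awaitOobAuth();         // blocks until OOB-AUTH seen
   String policy = parsePolicy(oobUrl);                  // "tier1" or "tier2"

   // 3. Submit empty TOTP response to SSH (unblocks the PAM wait-loop)
   sshSession.sendKbdInteractiveResponse("");

   if ("tier2".equals(policy)) {
       // 4a. Attestation gate: read chain BEFORE any C_Sign used for OOB authorisation
       Certificate[] chain = ks.getCertificateChain(ATTEST_ALIAS);
       if (chain == null || chain.length < 2) {
           throw new SecurityException("Attestation material missing on bound token");
       }
       byte[] keyAttestation    = chain[0].getEncoded();
       byte[] deviceAttestation = chain[1].getEncoded();
       byte[] identityCert      = ks.getCertificate(IDENTITY_ALIAS).getEncoded();

       // The signing key must be the key the attestation certifies,
       // not merely some other key on the same token.
       byte[] attestedSpki = chain[0].getPublicKey().getEncoded();
       byte[] signingSpki  = ks.getCertificate(SSH_KEY_ALIAS).getPublicKey().getEncoded();
       if (!Arrays.equals(attestedSpki, signingSpki)) {
           throw new SecurityException("Signing key is not the attested key");
       }

       // 4b. Fetch sub-nonce from portal (only after attestation read succeeds);
       //     the ?policy= query is dropped before appending /challenge
       byte[] nonce = decodeNonce(httpGet(challengeUri(oobUrl)));  // base64url -> 32 bytes

       // 4c. Sign portal sub-nonce with operational key (same PKCS#11 session as 4a)
       PrivateKey key = (PrivateKey) ks.getKey(SSH_KEY_ALIAS, null);  // opaque handle
       Signature sig  = Signature.getInstance("SHA256withECDSA", p);
       sig.initSign(key);
       sig.update(nonce);
       byte[] signature = sig.sign();  // C_Sign: live proof; attestation alone is not

       // 4d. Build and post attestation payload to /v1/ssh-auth-hsm/{T}
       String body = buildAttestationJson(
           sessionBinding, signature, keyAttestation,
           deviceAttestation, identityCert);
       httpPost(redemptionUri(oobUrl), body);            // plain HTTPS, no client cert

   } else {
       // 4e. Tier 1: mTLS redemption using separate keystore
       KeyStore tlsKs = loadTier1Keystore();             // file or secrets-manager
       httpPostMtls(redemptionUri(oobUrl), sessionBinding, tlsKs);
   }
   // SSH session unblocks upon PAM_SUCCESS signal from auth service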

Key points: the attestation certificates are read in step ``// 4a``
before step ``// 4c`` calls ``sig.sign()``; that ordering matches
“attestation present on this USB-bound token, then signing for
authentication.” The ``sig.sign()`` call is a PKCS#11 ``C_Sign`` on the
portal nonce inside the attested slot. The JVM holds only an opaque
``PrivateKey`` handle. The PIN character array is zeroed as soon as
``ks.load()`` completes. The ``sessionBinding`` value (``B``) is tied to
the SSH session identifier ``H``, which the SSH wrapper obtains at
startup; because ``B`` is keyed with ``server_secret``, which never
leaves the auth service, the client cannot compute the HMAC itself, and
how it comes to present ``B`` is left to the client protocol still to be
specified.
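
Step ``// 4b`` has to drop the ``?policy=`` query string before
appending ``/challenge``; concatenating onto the full OOB URL would
produce a path such as ``.../{T}?policy=tier2/challenge``. The helpers
below sketch that, plus decoding the
``{ "nonce": "<256-bit base64url>" }`` response. The class and method
names are hypothetical, and the regular expression stands in for a
proper JSON parser.

.. code:: java

   import java.net.URI;
   import java.util.Base64;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   // Hypothetical helpers for Tier 2 step 4; a real client would use a JSON library.
   final class ChallengeHelpers {

       private static final Pattern NONCE = Pattern.compile("\"nonce\"\\s*:\\s*\"([A-Za-z0-9_-]+)\"");

       // Token path without the ?policy= query, e.g. https://host/v1/ssh-auth-hsm/{T}
       static URI redemptionUri(URI oobUrl) {
           return URI.create(oobUrl.getScheme() + "://" + oobUrl.getRawAuthority() + oobUrl.getRawPath());
       }

       // Same path with /challenge appended after the token, not after the query.
       static URI challengeUri(URI oobUrl) {
           return URI.create(redemptionUri(oobUrl) + "/challenge");
       }

       // Extracts and decodes the base64url nonce; rejects anything that is not 256 bits.
       static byte[] decodeNonce(String json) {
           Matcher m = NONCE.matcher(json);
           if (!m.find()) {
               throw new IllegalArgumentException("no nonce in challenge response");
           }
           byte[] nonce = Base64.getUrlDecoder().decode(m.group(1));
           if (nonce.length != 32) {
               throw new IllegalArgumentException("nonce is not 256 bits");
           }
           return nonce;
       }
   }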


Auth service — token table schema (reference)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: sql

   CREATE TABLE oob_tokens (
       token_id     BYTEA        PRIMARY KEY,   -- 32-byte CSPRNG token T (raw)
       binding_tag  BYTEA        NOT NULL,       -- B = HMAC-SHA256(server_secret, H||user)
       username     TEXT         NOT NULL,
       policy_tier  SMALLINT     NOT NULL,       -- 1 or 2
       issued_at    TIMESTAMPTZ  NOT NULL DEFAULT now(),
       expires_at   TIMESTAMPTZ  NOT NULL,       -- issued_at + 30s
       consumed_at  TIMESTAMPTZ  DEFAULT NULL,   -- NULL = not yet redeemed
       redeemer_dn  TEXT         DEFAULT NULL,   -- Subject DN of client cert (Tier 1)
       device_sn    TEXT         DEFAULT NULL    -- HSM serial number (Tier 2)
   );

   CREATE INDEX ON oob_tokens (expires_at);     -- for TTL sweep job

The ``consumed_at`` column is set atomically using
``UPDATE ... WHERE token_id = ... AND consumed_at IS NULL RETURNING *``.
For a token that exists and has not expired, a zero-row result means it
was already redeemed; this is treated as a replay attempt and answered
with ``409 Conflict``. A background job purges rows where
``expires_at < now() - INTERVAL '5 minutes'`` to bound table growth.
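
The single-use update could be issued over JDBC as sketched below. It
assumes PostgreSQL (``UPDATE ... RETURNING`` read through
``executeQuery``), adds an ``expires_at > now()`` guard so an expired
token cannot be consumed, and uses a second lookup only to distinguish a
replay (``409``) from an unknown or expired token. The ``404`` for the
latter case and the class name are assumptions.

.. code:: java

   import java.sql.Connection;
   import java.sql.PreparedStatement;
   import java.sql.ResultSet;
   import java.sql.SQLException;

   // Hypothetical DAO for atomic single-use redemption against oob_tokens (PostgreSQL).
   final class TokenConsumer {

       // Marks T consumed only if unconsumed and unexpired; RETURNING yields a row on success.
       static final String CONSUME_SQL =
               "UPDATE oob_tokens SET consumed_at = now(), redeemer_dn = ?, device_sn = ? "
             + "WHERE token_id = ? AND consumed_at IS NULL AND expires_at > now() "
             + "RETURNING username, policy_tier";

       // Only consulted after a zero-row UPDATE, to tell a replay from an unknown/expired token.
       static final String STATE_SQL =
               "SELECT consumed_at IS NOT NULL FROM oob_tokens WHERE token_id = ?";

       // Maps the outcome to the HTTP status returned to the redeeming client.
       static int statusFor(boolean consumedNow, Boolean alreadyConsumed) {
           if (consumedNow) {
               return 200;
           }
           if (Boolean.TRUE.equals(alreadyConsumed)) {
               return 409;   // replay
           }
           return 404;       // unknown or expired (assumed status)
       }

       static int consume(Connection db, byte[] token, String redeemerDn, String deviceSn)
               throws SQLException {
           try (PreparedStatement ps = db.prepareStatement(CONSUME_SQL)) {
               ps.setString(1, redeemerDn);   // Tier 1 Subject DN; null for Tier 2
               ps.setString(2, deviceSn);     // Tier 2 serial; null for Tier 1
               ps.setBytes(3, token);
               try (ResultSet rs = ps.executeQuery()) {
                   if (rs.next()) {
                       return statusFor(true, null);
                   }
               }
           }
           try (PreparedStatement ps = db.prepareStatement(STATE_SQL)) {
               ps.setBytes(1, token);
               try (ResultSet rs = ps.executeQuery()) {
                   Boolean consumed = rs.next() ? rs.getBoolean(1) : null;
                   return statusFor(false, consumed);
               }
           }
       }
   }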


Supplementary references (Appendix A)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  RSA Security / OASIS, *PKCS #11: Cryptographic Token Interface
   Standard v2.40*, 2015.
   https://docs.oasis-open.org/pkcs11/pkcs11-base/v2.40/pkcs11-base-v2.40.html

-  Oracle, *Java PKCS#11 Reference Guide* (``SunPKCS11`` provider).
   https://docs.oracle.com/en/java/javase/11/security/pkcs11-reference-guide1.html

-  Yubico, *PIV Attestation*.
   https://developers.yubico.com/PIV/Introduction/PIV_attestation.html

-  Securosys, *Key Attestation — HSM Root Certificate and PKI Trust*.
   https://www.securosys.com/en/key-attestation-by-securosys

-  Connect2id, *Signing JWTs with a Smart Card or HSM*.
   https://connect2id.com/products/nimbus-jose-jwt/examples/pkcs11

-  NIST SP 800-63B-4, *Authentication and Lifecycle Management*, §3.2.5
   (Multi-Factor Cryptographic Devices), 2025.
   https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-63B-4.pdf


Appendix B — HSM card registration, LDAP schema, and enrolment lifecycle
------------------------------------------------------------------------

Design rationale
~~~~~~~~~~~~~~~~

The auth service’s enrolment registry, referenced in *Tier 2 redemption
— HSM attestation*, must be a first-class directory citizen rather than
a flat file or a bespoke database table. Two properties drive this
requirement:

The first property is centralised identity authority. In any
organisation that already operates an LDAP directory (OpenLDAP, 389-DS,
Active Directory with RFC 4519-compatible schema), the user’s DN is the
canonical identity anchor. Binding the HSM attestation to the user’s
LDAP entry ensures that existing provisioning, de-provisioning, access
control, and audit workflows cover the HSM token automatically. Revoking
a user account or suspending it in LDAP immediately stops new SSH
sessions from completing the Tier 2 OOB factor, because the auth
service’s LDAP lookup returns no enrolled token for that DN whose
``hsmTokenStatus`` value still carries the ``ACTIVE`` suffix.

The second property is multi-token support. A user may legitimately hold
more than one enrolled HSM — a primary device and a backup, or separate
tokens for different workstations. The LDAP attribute carrying the HSM
identifier must be multi-valued so that each physical device has its own
entry, independently revocable without affecting the others.


LDAP schema extension
~~~~~~~~~~~~~~~~~~~~~

The schema adds one auxiliary object class (``hsmTokenHolder``) and five
attribute types. All OIDs are under a private enterprise arc; the
placeholder PEN ``99999`` below must be replaced with the organisation’s
IANA-assigned Private Enterprise Number.

.. code:: ldif

   # ---------------------------------------------------------------
   # Attribute type definitions
   # OID arc: 1.3.6.1.4.1.99999.1.x  (replace 99999 with real PEN)
   # ---------------------------------------------------------------

   # hsmSerialNumber: device serial number, from the CN of the
   # device attestation certificate issued by the manufacturer.
   # Multi-valued: one value per enrolled device.
   # Directory String syntax (.15): caseExactMatch is defined for
   # DirectoryString-based syntaxes, not IA5String (.26).
   attributeTypes: ( 1.3.6.1.4.1.99999.1.1
     NAME 'hsmSerialNumber'
     DESC 'Manufacturer serial number of an enrolled HSM token'
     EQUALITY caseExactMatch
     SUBSTR caseExactSubstringsMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.15
     X-ORIGIN 'ssh-oob-auth schema v1' )

   # hsmKeyFingerprint: SHA-256 fingerprint (hex, lowercase) of the
   # SubjectPublicKeyInfo DER from the key attestation certificate.
   # Binds a specific non-exportable key to a specific device serial.
   # Multi-valued: one value per enrolled device.
   attributeTypes: ( 1.3.6.1.4.1.99999.1.2
     NAME 'hsmKeyFingerprint'
     DESC 'SHA-256 fingerprint of the attested SSH key SubjectPublicKeyInfo'
     EQUALITY caseExactMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.15
     X-ORIGIN 'ssh-oob-auth schema v1' )

   # hsmTokenStatus: per-token lifecycle state.
   # Structured value: "<serialNumber>:<status>" where status is one of
   # ACTIVE, SUSPENDED, REVOKED.
   # Multi-valued: one value per enrolled device, keyed by the embedded serial.
   attributeTypes: ( 1.3.6.1.4.1.99999.1.3
     NAME 'hsmTokenStatus'
     DESC 'Lifecycle state of an enrolled HSM token (serialNumber:STATUS)'
     EQUALITY caseExactMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.15
     X-ORIGIN 'ssh-oob-auth schema v1' )

   # hsmEnrolledAt: GeneralizedTime (UTC, YYYYMMDDHHMMSSZ) of initial
   # enrolment, per token.
   # Multi-valued: one value per enrolled device.
   attributeTypes: ( 1.3.6.1.4.1.99999.1.4
     NAME 'hsmEnrolledAt'
     DESC 'UTC GeneralizedTime of HSM token enrolment'
     EQUALITY generalizedTimeMatch
     ORDERING generalizedTimeOrderingMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.24
     X-ORIGIN 'ssh-oob-auth schema v1' )

   # hsmKeyAttestation: full DER of the key attestation certificate,
   # base64-encoded. Stored for offline re-verification without
   # requiring the physical device to be present.
   # Multi-valued: one value per enrolled device.
   attributeTypes: ( 1.3.6.1.4.1.99999.1.5
     NAME 'hsmKeyAttestation'
     DESC 'Base64-encoded DER of the key attestation certificate'
     EQUALITY octetStringMatch
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.40
     X-ORIGIN 'ssh-oob-auth schema v1' )

   # ---------------------------------------------------------------
   # AUXILIARY object class
   # ---------------------------------------------------------------

   objectClasses: ( 1.3.6.1.4.1.99999.2.1
     NAME 'hsmTokenHolder'
     DESC 'Auxiliary class for users with enrolled HSM tokens'
     SUP top
     AUXILIARY
     MAY ( hsmSerialNumber $ hsmKeyFingerprint $
           hsmTokenStatus $ hsmEnrolledAt $ hsmKeyAttestation )
     X-ORIGIN 'ssh-oob-auth schema v1' )

All five attributes are optional on the object class so that it can be
added to a user entry before enrolment is complete, and so that users
without an HSM token are not forced to carry empty mandatory attributes.

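The ``hsmTokenStatus`` attribute uses a composite
``<serialNumber>:<STATUS>`` value rather than a separate entry or
attribute, which keeps all token state on the user object and avoids the
need for a subordinate entry tree. Embedding the serial number is also
what ties each status to its token: LDAP treats the values of a
multi-valued attribute as an unordered set, so values held in different
attributes cannot be paired by position. The auth service composes the
lookup key (``<serial>:ACTIVE``) at query time, and the enrolment
authority enforces referential consistency when it writes.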
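A minimal Python (3.9 or later) sketch of composing and parsing the
composite value follows. It assumes serial numbers are ASCII decimal
strings and that ``ACTIVE``, ``SUSPENDED`` and ``REVOKED`` are the only
legal suffixes; the helper names are illustrative and not part of any
existing tool.

.. code:: python

   """Compose and parse composite hsmTokenStatus values ("<serial>:<STATUS>")."""

   VALID_STATUSES = frozenset({"ACTIVE", "SUSPENDED", "REVOKED"})


   def _check_serial(serial: str) -> None:
       # Assumed rule: serials are non-empty ASCII decimal strings.
       if not (serial.isascii() and serial.isdigit()):
           raise ValueError(f"serial must be ASCII decimal digits: {serial!r}")


   def compose_status(serial: str, status: str) -> str:
       """Build an hsmTokenStatus value, rejecting malformed input at write time."""
       _check_serial(serial)
       if status not in VALID_STATUSES:
           raise ValueError(f"unknown status: {status!r}")
       return f"{serial}:{status}"


   def parse_status(value: str) -> tuple[str, str]:
       """Split an hsmTokenStatus value into (serial, status)."""
       serial, sep, status = value.partition(":")
       if not sep or status not in VALID_STATUSES:
           raise ValueError(f"malformed hsmTokenStatus value: {value!r}")
       _check_serial(serial)
       return serial, status


   def active_serials(values: list[str]) -> set[str]:
       """Serials whose status carries ACTIVE; the order of values is irrelevant."""
       return {s for s, st in map(parse_status, values) if st == "ACTIVE"}


   print(sorted(active_serials(["28345820:ACTIVE", "28345819:SUSPENDED"])))
   # ['28345820']

Rejecting unknown suffixes at write time keeps the runtime lookup a
plain string comparison.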


Resulting LDAP entry structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Person entries in the directory we operate follow a fixed structural
pattern: ``inetOrgPerson``, ``organizationalPerson``, and ``person``
(with ``top``) for identity; ``posixAccount`` for Unix identifiers and
shell; and ``ldapPublicKey`` so the first-factor SSH public key is
carried on the entry itself (OpenSSH LDAP public-key object class,
commonly loaded into OpenLDAP). The example below shows that baseline
shape on a real DN layout (``ou=People``, ``dc=sofiatech,dc=bg``). The
``userPassword`` value is only illustrative (synthetic base64);
production entries carry a real password hash. The ``sshPublicKey`` line is
shortened for the page; the live attribute holds the full single-line
``authorized_keys``-style material.

After Tier 2 enrolment, the same entry also carries the auxiliary
``hsmTokenHolder`` object class and the multi-valued HSM attributes,
shown at the end of the listing. The example user has two enrolled
tokens (primary and backup), so each HSM attribute holds two values.

.. code:: ldif

   dn: uid=vkolev,ou=People,dc=sofiatech,dc=bg
   objectClass: inetOrgPerson
   objectClass: ldapPublicKey
   objectClass: organizationalPerson
   objectClass: person
   objectClass: posixAccount
   objectClass: top
   objectClass: hsmTokenHolder
   cn: Veselin Kolev
   gidNumber: 2001
   homeDirectory: /home/vkolev
   sn: Kolev
   uid: vkolev
   uidNumber: 2001
   description: vkolev01
   givenName: Veselin
   loginShell: /bin/bash
   mail: v.kolev@discoverer.bg
   userPassword:: e1BCS0RGMl9TSEEyNTZ9ZXhhbXBsZVN5bnMoZXRpY10h13NwbBJkSGFzaEAubHkK
   sshPublicKey: ecdsa-sha2-nistp384 AAAAE2VjZHNhLXNoYTItbmlzdHAzODQAAAAIbmlzdHAzODQC00BhBGExampleOnlySyntheticKeyMaterialShortened= 2021081902
   # First HSM token: primary workstation device
   hsmSerialNumber: 28345819
   # hsmKeyFingerprint holds the SHA-256 hex (64 chars); truncated here
   hsmKeyFingerprint: a3f0c1d2e4b5...
   hsmTokenStatus: 28345819:ACTIVE
   hsmEnrolledAt: 20250310091500Z
   # hsmKeyAttestation holds base64 DER; truncated here
   hsmKeyAttestation: MIICxTCCAb...
   # Second HSM token: backup device
   hsmSerialNumber: 28345820
   hsmKeyFingerprint: 91b2a7f3e0c4...
   hsmTokenStatus: 28345820:ACTIVE
   hsmEnrolledAt: 20250310093000Z
   hsmKeyAttestation: MIICxUCCAb...

The auth service looks up the user’s entry and reads all
``hsmSerialNumber``, ``hsmKeyFingerprint`` and ``hsmTokenStatus``
values. Redemption proceeds only if the serial number and key
fingerprint from the attestation proof appear among the enrolled values
and the status value for that serial carries the ``ACTIVE`` suffix. If
the suffix is ``SUSPENDED`` or ``REVOKED``, or no enrolled token
matches, redemption is rejected with ``403 Forbidden`` even if the
attestation chain is cryptographically valid.


Enrolment workflow
~~~~~~~~~~~~~~~~~~

Enrolment is a separate, one-time administrative process that must
complete before a user can use Tier 2 SSH authentication. It is not part
of the SSH login flow.

Participants
^^^^^^^^^^^^

-  User — the person whose HSM token is being enrolled.
-  Enrolment portal — a web or CLI interface where the user submits
   their attestation artefacts. May be self-service (user-initiated) or
   admin-mediated.
-  Enrolment authority — an administrator or automated service that
   validates the submission and writes to LDAP. In a self-service flow,
   the enrolment portal acts as the enrolment authority after automated
   validation.

Key generation on the HSM (user, one-time per device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The user generates the SSH key directly on the HSM so that
``CKA_EXTRACTABLE = FALSE`` is set at key creation time. This cannot be
retrofitted to an existing key: a key that was ever exportable cannot
be attested as hardware-bound. If policy instead requires loading a
conventional SSH private key (not an ``sk-`` type key) onto the token,
the key is normally imported as PKCS#12 accompanied by a self-signed
X.509v3 certificate; Ed25519 cannot be used on that import path, and
ECDSA or RSA must be chosen so the key fits the certificate profile the
token exposes. Such an imported key existed outside the token, so for
the same reason it cannot pass the attestation checks described below
and cannot be used for Tier 2.

.. code:: sh

   # Example: YubiKey PIV slot 9a via pkcs11-tool
   # (the libykcs11.so path varies by distribution)
   pkcs11-tool --module /usr/lib/libykcs11.so \
       --login --pin "$PIN" \
       --keypairgen --key-type EC:prime256v1 \
       --id 01 --label "ssh-oob-key"

Generate key attestation artefacts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Enrolment must also be an online attestation step, not an offline export
of files. The enrolment portal therefore issues a short-lived challenge;
the user (or ``ssh-hsm-client.jar --enrol``) must cause the private key
in the attested slot to sign that challenge inside the HSM. The portal
rejects enrolment if the signature does not verify against the public
key embedded in the supplied key attestation certificate. Alongside that
proof, the client collects:

-  The key attestation certificate: signed by the device’s attestation
   key, containing the attested public key and asserting
   ``CKA_EXTRACTABLE = FALSE``.
-  The device attestation certificate: the device’s own attestation
   certificate, signed by the manufacturer’s intermediate CA.

Those certificates explain *which* key was used and *what* the
manufacturer claims about it; they are not, alone, evidence that the
prover holds the token now. Their operational value is downstream: they
let the auth service interpret each future ``C_Sign`` on a portal or
enrolment challenge as having been performed by the user’s on-token key
under known hardware rules, rather than treating attestation as an end
in itself.

For YubiKey PIV:

.. code:: sh

   yubico-piv-tool --action=attest --slot=9a --key-format=DER > key_attest.der
   yubico-piv-tool --action=read-certificate --slot=f9 --key-format=DER > device_attest.der

For PKCS#11 HSMs with a vendor attestation extension, the equivalent
calls are vendor-specific; ``ssh-hsm-client.jar --enrol`` abstracts them
via the PKCS#11 bridge.

Submit to enrolment portal
^^^^^^^^^^^^^^^^^^^^^^^^^^

The user submits to the enrolment portal over authenticated HTTPS (the
user must authenticate to the portal via their existing credentials —
this is not yet 2FA). The body includes a fresh signature from the
attested slot over the portal-issued ``enrolment_challenge``, so
possession is proved at enrolment time in the same way as at each Tier 2
login (*Tier 2 redemption — HSM attestation*).

.. code:: http

   POST /v1/enrol/hsm HTTP/1.1
   Host: pki-portal.example.com
   Content-Type: application/json
   Authorization: Bearer <user-session-token>

   {
     "enrolment_challenge": "<base64url random, issued by portal>",
     "enrolment_signature": "<base64url ECDSA signature over enrolment_challenge by attested key>",
     "key_attestation":    "<base64 DER>",
     "device_attestation": "<base64 DER>",
     "label":              "Primary YubiKey 5C — office workstation"
   }
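The portal-side handling of ``enrolment_challenge`` can be sketched in
Python as a short-lived, single-use value. The sketch assumes an
in-memory store, 32 random bytes per challenge and a 120-second
lifetime; the class name and the lifetime are hypothetical, and
verifying ``enrolment_signature`` itself (which needs an ECDSA library)
is not shown.

.. code:: python

   """Issue and consume short-lived, single-use enrolment challenges."""

   import hmac
   import secrets
   import time
   from typing import Callable


   class ChallengeStore:
       """In-memory sketch; a real portal needs state shared across instances."""

       def __init__(self, lifetime_s: float = 120.0,
                    clock: Callable[[], float] = time.monotonic) -> None:
           self._lifetime = lifetime_s
           self._clock = clock
           # uid -> (challenge, expiry measured on the injected clock)
           self._pending: dict[str, tuple[str, float]] = {}

       def issue(self, uid: str) -> str:
           """Create a fresh base64url challenge for uid, replacing any pending one."""
           challenge = secrets.token_urlsafe(32)  # 32 random bytes, unpadded
           self._pending[uid] = (challenge, self._clock() + self._lifetime)
           return challenge

       def consume(self, uid: str, presented: str) -> bool:
           """True at most once per issued challenge, and only before expiry."""
           entry = self._pending.pop(uid, None)  # removed on any attempt
           if entry is None:
               return False
           expected, expiry = entry
           if self._clock() > expiry:
               return False
           return hmac.compare_digest(expected.encode("utf-8"),
                                      presented.encode("utf-8"))

Consuming the challenge before checking the signature means a failed or
replayed submission cannot be retried with the same value; the client
has to request a fresh challenge.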

Portal validation (enrolment)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The enrolment portal performs the same logical checks as the auth
service at login time, but without the SSH session token ``T``. It
treats ``enrolment_signature`` as mandatory evidence of live
attestation; certificate files alone are insufficient.

1. Verify ``enrolment_signature`` against the SubjectPublicKeyInfo in
   ``key_attestation``, using the same ``enrolment_challenge`` the
   portal just issued.
2. Verify ``key_attestation`` is signed by ``device_attestation``.
3. Verify ``device_attestation`` chains to the pre-loaded manufacturer
   root CA.
4. Extract and verify ``CKA_EXTRACTABLE = FALSE`` from the attestation
   extension.
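5. Extract the device serial number. In this model it is read from the
   ``device_attestation`` Subject CN; the location is vendor-specific,
   and YubiKey PIV carries the serial in an extension
   (``1.3.6.1.4.1.41482.3.7``) of ``key_attestation``.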
6. Extract the attested public key and compute its SHA-256 fingerprint.
7. Check that this serial number is not already enrolled for a
   *different* user (prevents a compromised device being re-enrolled
   under a second identity).
8. Check that this serial number is not already enrolled for this user
   with the ``ACTIVE`` suffix (prevents duplicate concurrent enrolment
   for the same device). If the only existing row for this serial under
   this user carries the ``REVOKED`` suffix, a new enrolment may proceed
   according to policy.
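Steps 7 and 8 reduce to a lookup over existing enrolments. A hedged
Python sketch, assuming the portal has already parsed every user’s
``hsmTokenStatus`` values into a ``{uid: {serial: status}}`` mapping;
the function and exception names are hypothetical. Following step 8,
any existing row for the serial other than ``REVOKED`` blocks the
enrolment, and re-enrolling a revoked serial is left to a policy flag.

.. code:: python

   """Enrolment conflict checks (portal validation steps 7 and 8)."""


   class EnrolmentConflict(Exception):
       """Raised when a serial number may not be enrolled for this user."""


   def check_enrolment_conflicts(uid: str, serial: str,
                                 enrolled: dict[str, dict[str, str]],
                                 allow_reenrol_revoked: bool = True) -> None:
       """Raise EnrolmentConflict if step 7 or step 8 rejects the enrolment.

       enrolled maps each uid to {serial: status}, parsed from hsmTokenStatus.
       """
       # Step 7: the serial must not be enrolled for any other user, in any state.
       for other_uid, tokens in enrolled.items():
           if other_uid != uid and serial in tokens:
               raise EnrolmentConflict(f"serial {serial} is enrolled for {other_uid}")

       # Step 8: only a REVOKED row for this user may be followed by a new
       # enrolment of the same serial, and only if policy allows it.
       current = enrolled.get(uid, {}).get(serial)
       if current is None:
           return
       if current != "REVOKED" or not allow_reenrol_revoked:
           raise EnrolmentConflict(f"serial {serial} is already {current} for {uid}")

Because the check runs against a snapshot, it does not by itself stop
two concurrent enrolments of the same serial; the serial-collision note
under *Security notes* covers that race.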

LDAP write (enrolment)
^^^^^^^^^^^^^^^^^^^^^^

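On successful validation, the enrolment authority performs an LDAP
modify. The listing shows a user’s first enrolment; for each later
token, the ``add: objectClass`` change is omitted, because adding a
value the entry already holds fails with ``attributeOrValueExists`` and
aborts the whole modify: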

.. code:: ldif

   dn: uid=vkolev,ou=People,dc=sofiatech,dc=bg
   changetype: modify
   add: objectClass
   objectClass: hsmTokenHolder
   -
   add: hsmSerialNumber
   hsmSerialNumber: 28345819
   -
   add: hsmKeyFingerprint
   hsmKeyFingerprint: a3f0c1d2e4b5...
   -
   add: hsmTokenStatus
   hsmTokenStatus: 28345819:ACTIVE
   -
   add: hsmEnrolledAt
   hsmEnrolledAt: 20250310091500Z
   -
   add: hsmKeyAttestation
   hsmKeyAttestation: MIICxTCCAb...

The LDAP write uses a service account whose write access is scoped to
the ``hsmTokenHolder`` auxiliary attributes, plus adding and removing the
``hsmTokenHolder`` value of ``objectClass``. It must not have write
access to ``uid``, ``mail``, group memberships, or any other identity
attribute.
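A Python sketch of rendering that modify, assuming the caller already
knows whether the entry carries ``hsmTokenHolder`` and that all values
are plain ASCII (LDIF would otherwise need ``::`` base64 encoding). The
function names are hypothetical. Each change group ends with ``-`` as
the RFC 2849 grammar specifies, and ``hsmEnrolledAt`` is emitted in
Generalized Time form.

.. code:: python

   """Render the enrolment LDAP modify as LDIF text."""

   import time


   def generalized_time(epoch: float) -> str:
       """Format a UTC instant as LDAP Generalized Time (YYYYMMDDHHMMSSZ)."""
       return time.strftime("%Y%m%d%H%M%SZ", time.gmtime(epoch))


   def enrolment_ldif(dn: str, serial: str, fingerprint: str,
                      attestation_b64: str, enrolled_at: float,
                      first_token: bool) -> str:
       """Return an LDIF modify record adding one token's attribute values."""
       changes = []
       if first_token:
           # Re-adding an existing objectClass value would fail the whole modify.
           changes.append(("objectClass", "hsmTokenHolder"))
       changes += [
           ("hsmSerialNumber", serial),
           ("hsmKeyFingerprint", fingerprint),
           ("hsmTokenStatus", f"{serial}:ACTIVE"),
           ("hsmEnrolledAt", generalized_time(enrolled_at)),
           ("hsmKeyAttestation", attestation_b64),
       ]
       lines = [f"dn: {dn}", "changetype: modify"]
       for attr, value in changes:
           # RFC 2849 terminates every add/delete/replace group with "-".
           lines += [f"add: {attr}", f"{attr}: {value}", "-"]
       return "\n".join(lines) + "\n"


   print(generalized_time(1741598100))  # 20250310091500Z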

Confirmation (enrolment)
^^^^^^^^^^^^^^^^^^^^^^^^

The enrolment portal notifies the user (by e-mail or in-portal message)
that their HSM token has been registered. If the auth service queries
LDAP live, it sees the new values on its next lookup and no cache
invalidation is needed; if it caches lookups, a TTL of at most 60 s
bounds the delay.
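Where the auth service does cache, the bound can be enforced with a
time-to-live wrapper around the LDAP search. This Python sketch assumes
a caller-supplied ``fetch`` function that performs the live search for
one ``uid``; the class name and the injectable clock are illustrative.

.. code:: python

   """Cache LDAP token lookups for at most ttl_s seconds."""

   import time
   from typing import Any, Callable


   class TTLCache:
       """Per-uid cache; entries older than ttl_s trigger a fresh LDAP search."""

       def __init__(self, fetch: Callable[[str], Any], ttl_s: float = 60.0,
                    clock: Callable[[], float] = time.monotonic) -> None:
           self._fetch = fetch            # hypothetical: runs the live LDAP search
           self._ttl = ttl_s
           self._clock = clock
           self._entries: dict[str, tuple[float, Any]] = {}

       def get(self, uid: str) -> Any:
           now = self._clock()
           hit = self._entries.get(uid)
           if hit is not None and now - hit[0] < self._ttl:
               return hit[1]              # still fresh: serve the cached result
           result = self._fetch(uid)      # missing or stale: query LDAP again
           self._entries[uid] = (now, result)
           return result

The same TTL also bounds how long a ``SUSPENDED`` or ``REVOKED`` status
change can go unnoticed by the auth service, which argues for keeping it
short.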


Lifecycle operations
~~~~~~~~~~~~~~~~~~~~

Suspending a token
^^^^^^^^^^^^^^^^^^

Used when the device is temporarily lost or unavailable (for example,
when the user travels without the token). A token whose
``hsmTokenStatus`` value carries the ``SUSPENDED`` suffix cannot
complete Tier 2 OOB authentication; the user falls back to Tier 1 when
policy permits, or must contact an administrator.

.. code:: ldif

   dn: uid=vkolev,ou=People,dc=sofiatech,dc=bg
   changetype: modify
   delete: hsmTokenStatus
   hsmTokenStatus: 28345819:ACTIVE
   -
   add: hsmTokenStatus
   hsmTokenStatus: 28345819:SUSPENDED

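The replace is done as a delete-then-add of the specific value so that
the other tokens’ status values are left untouched; a
``replace: hsmTokenStatus`` would overwrite all of them. Because a modify
is applied atomically and deleting a value that is not present fails
with ``noSuchAttribute``, the pair also acts as a compare-and-swap: it
succeeds only if the token is still in the expected state.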
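The delete-then-add pattern, used for both suspension and revocation,
can be generated by a small helper. This Python sketch assumes a
transition policy in which suspension is reversible and revocation is
terminal, consistent with *Revoking a token (permanent)*; the policy
table and the function name are assumptions, not part of the schema.

.. code:: python

   """Render an hsmTokenStatus transition as an atomic delete-then-add modify."""

   # Assumed policy: suspension is reversible, revocation is terminal.
   ALLOWED = {
       ("ACTIVE", "SUSPENDED"),
       ("SUSPENDED", "ACTIVE"),
       ("ACTIVE", "REVOKED"),
       ("SUSPENDED", "REVOKED"),
   }


   def transition_ldif(dn: str, serial: str, old: str, new: str) -> str:
       """Return LDIF that swaps one serial's status value and nothing else."""
       if (old, new) not in ALLOWED:
           raise ValueError(f"transition {old} -> {new} is not permitted")
       return "\n".join([
           f"dn: {dn}",
           "changetype: modify",
           "delete: hsmTokenStatus",
           f"hsmTokenStatus: {serial}:{old}",  # modify fails if value is absent
           "-",
           "add: hsmTokenStatus",
           f"hsmTokenStatus: {serial}:{new}",
           "-",
       ]) + "\n"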

Revoking a token (permanent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Used for a lost, stolen, or physically destroyed device. Revocation is
permanent; the serial number is retained in LDAP for audit but the
status prevents any future use. The ``hsmKeyAttestation`` is kept to
support forensic investigation.

.. code:: ldif

   dn: uid=vkolev,ou=People,dc=sofiatech,dc=bg
   changetype: modify
   delete: hsmTokenStatus
   hsmTokenStatus: 28345819:ACTIVE
   -
   add: hsmTokenStatus
   hsmTokenStatus: 28345819:REVOKED

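After revocation, the user must enrol a new device (the enrolment
procedure from key generation on the HSM through confirmation) before
Tier 2 authentication becomes available again, unless another enrolled
token is still ``ACTIVE``. If the token was ``SUSPENDED`` when revoked,
the ``delete`` line names the ``SUSPENDED`` value instead.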

Re-enrolment after revocation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The old serial number’s entry remains with the ``REVOKED`` suffix as an
audit trail. A new enrolment for a replacement device adds a new set of
attribute values, keyed by the new serial number, alongside the row that
still records ``REVOKED`` for the retired device.

User off-boarding
^^^^^^^^^^^^^^^^^

When a user’s account is deprovisioned, the standard deprovisioning
script removes all HSM token attribute values and then removes the
``hsmTokenHolder`` object class, which no longer carries any attributes.
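A Python sketch of the off-boarding modify, assuming the script first
reads which HSM attributes the entry currently holds and that the entry
carries ``hsmTokenHolder``; the function name is hypothetical. A
``delete`` with no values removes the whole attribute, and because a
modify is applied atomically, the object class goes in the same
operation as its attributes.

.. code:: python

   """Render the off-boarding modify that strips all HSM token state."""

   HSM_ATTRIBUTES = ("hsmSerialNumber", "hsmKeyFingerprint", "hsmTokenStatus",
                     "hsmEnrolledAt", "hsmKeyAttestation")


   def offboarding_ldif(dn: str, present: set[str]) -> str:
       """Delete every HSM attribute present on the entry, then the object class.

       Deleting an attribute the entry does not hold fails with noSuchAttribute,
       so only attributes listed in present are touched.
       """
       lines = [f"dn: {dn}", "changetype: modify"]
       for attr in HSM_ATTRIBUTES:
           if attr in present:
               lines += [f"delete: {attr}", "-"]  # no value listed: remove all
       lines += ["delete: objectClass", "objectClass: hsmTokenHolder", "-"]
       return "\n".join(lines) + "\n"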


Auth service LDAP lookup (runtime)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

During Tier 2 token redemption, once the attestation-backed OOB request
is accepted, the auth service performs:

::

   LDAP search:
     base:   ou=People,dc=sofiatech,dc=bg
     scope:  subtree
     filter: (&(objectClass=hsmTokenHolder)(uid=<username>))
     attrs:  hsmSerialNumber, hsmKeyFingerprint, hsmTokenStatus

Replace ``base`` with your own suffix when the DIT differs; the filter
assumes the user entry already includes ``objectClass: hsmTokenHolder``
alongside the baseline person / ``ldapPublicKey`` / ``posixAccount``
classes from *Resulting LDAP entry structure* (Appendix B).
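The ``<username>`` placeholder should be escaped before it is placed in
the filter; otherwise characters such as ``*`` or ``)`` change the
filter’s meaning. A Python sketch of the RFC 4515 escaping, in which
``*``, ``(``, ``)``, backslash and NUL become ``\2a``, ``\28``, ``\29``,
``\5c`` and ``\00``; the function names are hypothetical.

.. code:: python

   """Build the Tier 2 lookup filter with RFC 4515 value escaping."""


   def escape_filter_value(value: str) -> str:
       """Escape the characters RFC 4515 requires in an assertion value."""
       out = []
       for ch in value:
           if ch in "*()\\\x00":
               out.append("\\%02x" % ord(ch))  # e.g. "*" -> "\2a"
           else:
               out.append(ch)
       return "".join(out)


   def hsm_lookup_filter(username: str) -> str:
       safe = escape_filter_value(username)
       return f"(&(objectClass=hsmTokenHolder)(uid={safe}))"


   print(hsm_lookup_filter("vkolev"))
   # (&(objectClass=hsmTokenHolder)(uid=vkolev))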

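It then checks the attestation proof submitted in the redemption request
against the returned values. LDAP does not preserve the order of values
in a multi-valued attribute, so enrolled tokens are identified through
the serial prefix of each ``hsmTokenStatus`` value rather than by
position:

::

   for each value of hsmTokenStatus in the LDAP result:
       (serial, status) = split value at the first ":"
       if status != "ACTIVE"                          → skip
       if serial != attested_serial                   → skip
       if serial not in hsmSerialNumber values        → skip
       if attested_fp not in hsmKeyFingerprint values → skip
       → MATCH: proceed to verify nonce_signature
   if no match found → reject with 403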

The ``attested_serial`` is the device serial number extracted during
validation (in this model, from the Subject CN of ``device_attestation``;
the exact location is vendor-specific). The ``attested_fp`` is computed as
``SHA-256(SubjectPublicKeyInfo DER)`` from ``key_attestation``. Those
values only tell the auth service which LDAP rows are candidates for
this redemption; they are not a substitute for the live
``nonce_signature`` check in the same request. Possession is always
decided by the fresh signature over the portal nonce performed in the
attested slot during *Tier 2 redemption — HSM attestation*.
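The candidate-selection step can be sketched in Python as set-membership
tests, assuming the LDAP result has been converted to a mapping from
attribute name to its list of values and that fingerprints are stored as
hex. The function names are hypothetical. Because LDAP does not pair an
``hsmKeyFingerprint`` value with a particular serial, the sketch relies
on the already verified attestation chain to bind the attested serial
and key together.

.. code:: python

   """Select the enrolled token an attestation proof refers to."""

   import hashlib


   def spki_fingerprint(spki_der: bytes) -> str:
       """attested_fp: lowercase hex SHA-256 of the SubjectPublicKeyInfo DER."""
       return hashlib.sha256(spki_der).hexdigest()


   def find_active_match(entry: dict[str, list[str]], attested_serial: str,
                         attested_fp: str) -> bool:
       """True if an ACTIVE enrolled token matches the serial and fingerprint.

       Values are compared as sets because LDAP does not preserve value order.
       """
       serials = set(entry.get("hsmSerialNumber", []))
       fingerprints = {fp.lower() for fp in entry.get("hsmKeyFingerprint", [])}
       for value in entry.get("hsmTokenStatus", []):
           serial, _, status = value.partition(":")
           if status != "ACTIVE":
               continue
           if serial != attested_serial or serial not in serials:
               continue
           if attested_fp.lower() in fingerprints:
               return True  # candidate only: nonce_signature must still verify
       return False

A ``True`` result only selects a candidate row; the ``nonce_signature``
check in the same request still decides possession.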


Security notes
~~~~~~~~~~~~~~

Serial number collision. Two devices from the same manufacturer should
never share a serial number, but uniqueness across users is enforced by
the enrolment portal, not by the schema (portal validation step 7:
serial number not already enrolled for another user). Because competing
enrolments modify different user entries, a compare-and-swap on a single
entry does not prevent a race where two users enrol the same serial
simultaneously. The check must therefore be serialised across all
enrolments, or backed by a server-side uniqueness constraint on
``hsmSerialNumber`` such as OpenLDAP’s ``slapo-unique`` overlay.

Attestation certificate storage. LDAP retains the last successful key
attestation certificate (and related fields) so operators can audit what
was enrolled and so the auth service can bind serial number and
fingerprint to a user entry. That stored blob is not a standing proof of
possession: it cannot replace an online signature from the
attestation-backed slot. Each Tier 2 login still requires the live
``nonce_signature`` path in *Tier 2 redemption — HSM attestation*. The
certificate contains no private material, but it does carry the public
key and serial metadata, so the attribute should be access-controlled to
the auth service account and administrators only.

LDAP as a single point of trust. Because the auth service’s Tier 2
decision is gated entirely on the LDAP entry, LDAP integrity is
critical. The LDAP service must be protected with the same rigour as the
auth service itself: TLS-only connections (LDAPS or StartTLS), mutual
authentication on the service account bind, LDAP audit logging, and
replication integrity monitoring.


Supplementary references (Appendix B)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  A. Sciberras (Ed.), *Lightweight Directory Access Protocol (LDAP):
   Schema for User Applications*, RFC 4519, IETF, June 2006.
   https://www.rfc-editor.org/rfc/rfc4519

-  K. Zeilenga, *Lightweight Directory Access Protocol (LDAP): Technical
   Specification Road Map*, RFC 4510, IETF, June 2006.
   https://www.rfc-editor.org/rfc/rfc4510

-  IANA, *Private Enterprise Numbers*.
   https://www.iana.org/assignments/enterprise-numbers

-  Yubico, *PIV Attestation*.
   https://developers.yubico.com/PIV/Introduction/PIV_attestation.html