This is a sketch of a design for the secure channel that we want to have between the Cryptech HSM and the client libraries which talk to it. Work in progress, and not implemented yet because a few of the pieces are still missing.
Design goals and constraints
Basic design goals:
End-to-end between client library and HSM.
Avoid requiring yet another presentation layer if we can (so, reuse XDR if possible, unless we have some strong desire to switch to something else).
Provide end-to-end message integrity between client library and HSM.
Provide end-to-end message confidentiality between client library and HSM. We only need this for a few operations, but between PINs and private keys it would be simpler just to provide it all the time than to be selective.
Provide some form of mutual authentication between client library and HSM. This is tricky, since it requires either configuration (of the other party's authenticator) or leap-of-faith. Leap-of-faith is probably good enough for most of what we really care about (ensuring that we're talking to the same dog now as we were earlier).
Not 100% certain we need this at all, but if we're going to leave ourselves wide open to monkey-in-the-middle attacks, there's not much point in having a secure channel.
Use boring, simple crypto that we already have (or almost have) and that runs fast.
Continue to support the multiplexer. Taken together with end-to-end message confidentiality, this may mean two layers of headers: an outer set which the multiplexer is allowed to mutate, and an inner set which is protected. It would be better, though, if the multiplexer could work just by reading the outer headers, without modifying anything.
Simple enough that we can implement it easily in HSM, PKCS #11 library, and Python library.
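One way to read the multiplexer goal above: each frame carries a small outer header that the multiplexer may read but never rewrite, followed by the protected inner frame as a single opaque blob, all in XDR so we don't need a new presentation layer. The sketch below assumes a hypothetical two-field layout (a routing tag, then the sealed body); the function names are made up for illustration. It shows XDR variable-length opaque encoding (4-byte big-endian length, the data, then zero padding to a multiple of 4 bytes) and a read-only peek at the outer header.

```python
import struct


def xdr_pack_opaque(data: bytes) -> bytes:
    """XDR variable-length opaque: 4-byte big-endian length, data, zero pad to 4 bytes."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def xdr_unpack_opaque(buf: bytes, offset: int) -> tuple:
    """Decode one variable-length opaque at offset; return (data, next_offset)."""
    if offset + 4 > len(buf):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    end = start + length
    padded_end = end + (4 - length % 4) % 4
    if padded_end > len(buf):
        raise ValueError("truncated opaque data")
    return buf[start:end], padded_end


def build_frame(routing_tag: bytes, sealed_body: bytes) -> bytes:
    # Hypothetical layout: outer routing tag in the clear, then the
    # encrypted-and-authenticated inner frame as one opaque blob.
    return xdr_pack_opaque(routing_tag) + xdr_pack_opaque(sealed_body)


def peek_routing_tag(frame: bytes) -> bytes:
    # What the multiplexer would do: read the outer header, never modify the frame.
    tag, _ = xdr_unpack_opaque(frame, 0)
    return tag
```

Since the multiplexer only reads the tag, the endpoints could also include the outer header in the data they authenticate, so tampering with it in transit would be detected.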
Why not TLS?
We could, of course, Just Use TLS. We might end up doing that, if it turns out to be easier, but TLS is a complicated beast with far more options than we need, and it doesn't provide all of what we want, so a fair amount of the effort would be, not wasted exactly, but a giant step sideways. Absent sane alternatives, I'd just suck it up and use TLS with a greatly restricted ciphersuite, but I think we have a better option.
Basic design lifted from "Cryptography Engineering: Design Principles and Practical Applications" (ISBN 978-0-470-47424-2, http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470474246.html), tweaked in places to fit tools we have readily available.
As in the book, there are two layers here: the basic secure channel, moving encrypted-and-authenticated frames back and forth, and a higher level which handles setup, key agreement, and endpoint authentication.
Chapter 7 outlines a simple lower layer using AES-CTR and HMAC-SHA-256. I don't see any particular reason to change any of this; AES-CTR is easy enough. It might be worth looking into AES-CCM and AES-GCM, but they're somewhat more complicated. Section 7.5 ("Alternatives") discusses them briefly, and we also know some of the authors.
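A minimal sketch of the authentication half of such a channel, using only the Python standard library. Each frame carries a 32-bit message number; the MAC covers the message number, the length of any additional data (authenticated but not sent, such as an outer header), that data, and the payload; and the receiver rejects frames whose message number doesn't increase. This roughly follows the shape of the Chapter 7 channel but is not a transcription of it: the frame layout and class names are assumptions. As we read Chapter 7, the book computes the MAC first and then encrypts payload and tag with AES-CTR; that step is omitted here only because AES isn't in the Python standard library.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # HMAC-SHA-256 output size in bytes


def _mac(key: bytes, msg_no: int, extra: bytes, payload: bytes) -> bytes:
    # Authenticate message number || len(extra) || extra || payload.
    data = struct.pack(">II", msg_no, len(extra)) + extra + payload
    return hmac.new(key, data, hashlib.sha256).digest()


class Sender:
    def __init__(self, auth_key: bytes):
        self.auth_key = auth_key
        self.msg_no = 0

    def wrap(self, payload: bytes, extra: bytes = b"") -> bytes:
        if self.msg_no == 0xFFFFFFFF:
            raise OverflowError("message numbers exhausted; renegotiate keys")
        self.msg_no += 1
        tag = _mac(self.auth_key, self.msg_no, extra, payload)
        # In the real channel, payload + tag would be AES-CTR encrypted here.
        return struct.pack(">I", self.msg_no) + payload + tag


class Receiver:
    def __init__(self, auth_key: bytes):
        self.auth_key = auth_key
        self.last_msg_no = 0

    def unwrap(self, frame: bytes, extra: bytes = b"") -> bytes:
        if len(frame) < 4 + TAG_LEN:
            raise ValueError("frame too short")
        (msg_no,) = struct.unpack(">I", frame[:4])
        payload, tag = frame[4:-TAG_LEN], frame[-TAG_LEN:]
        if not hmac.compare_digest(tag, _mac(self.auth_key, msg_no, extra, payload)):
            raise ValueError("authentication failed")
        if msg_no <= self.last_msg_no:
            raise ValueError("replayed or reordered frame")
        self.last_msg_no = msg_no  # only advance after the frame checks out
        return payload
```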
For key agreement we probably want to use ECDH. We don't quite have that yet, but in theory it's relatively minor work to generalize our existing ECDSA code to cover it too. Again in theory, it should be possible to generalize our existing ECDSA fast base point multiplier Verilog cores into fast general point multiplier cores. The limitation of the current cores is that they only compute a scalar times the base point, not a scalar times an arbitrary point, which is fine for ECDSA but doesn't work for ECDH.
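To make the base-point versus arbitrary-point distinction concrete, here is a toy sketch using the small textbook curve y^2 = x^3 + 2x + 2 over GF(17), where the point (5, 1) generates a group of prime order 19. The curve is far too small to be secure and only stands in for whatever real curve we use; the function names are hypothetical. ECDSA key generation and signing only ever need base_mult(k), which is what the current cores do, while ECDH needs scalar_mult(d, peer_public) with a point that arrives over the wire.

```python
import secrets
from typing import Optional, Tuple

# Toy parameters (NOT secure): y^2 = x^3 + 2x + 2 over GF(17); G = (5, 1) has order 19.
P, A, B = 17, 2, 2
G = (5, 1)
N = 19

Point = Optional[Tuple[int, int]]  # None represents the point at infinity


def on_curve(pt: Point) -> bool:
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1: Point, p2: Point) -> Point:
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 == -p1 (also covers doubling a point with y == 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k: int, pt: Point) -> Point:
    """Double-and-add: k * pt for an arbitrary point (what ECDH needs)."""
    result: Point = None
    addend = pt
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def base_mult(k: int) -> Point:
    """k * G only: all that ECDSA key generation and signing require."""
    return scalar_mult(k, G)


def ecdh_shared(private: int, peer_public: Point) -> Point:
    # Reject points not on the curve before multiplying by our secret.
    if peer_public is None or not on_curve(peer_public):
        raise ValueError("invalid peer public key")
    return scalar_mult(private, peer_public)


if __name__ == "__main__":
    a = secrets.randbelow(N - 1) + 1  # client's private scalar
    b = secrets.randbelow(N - 1) + 1  # HSM's private scalar
    assert ecdh_shared(a, base_mult(b)) == ecdh_shared(b, base_mult(a))
```

The point-validity check in ecdh_shared matters more for ECDH than for ECDSA signing, because the peer chooses the point we multiply by our private scalar.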
For signatures (mutual authentication) we probably want to use ECDSA, again because we have it and it's fast. The more interesting questions are the configuration vs leap-of-faith tradeoff, figuring out under which circumstances we really care about the peer's identity, and figuring out how to store state.
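For the leap-of-faith case, the state itself is small: for each peer we've seen, a fingerprint of the public key it presented the first time. A minimal sketch, assuming peers are identified by some name we already have (hypothetical here) and that a SHA-256 fingerprint of the encoded public key is good enough. Where this state actually lives is exactly the open question, so the sketch only serializes to a JSON string.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional


class LeapOfFaithStore:
    """Hypothetical trust-on-first-use store: peer name -> public key fingerprint."""

    def __init__(self, pins: Optional[Dict[str, str]] = None):
        self.pins: Dict[str, str] = dict(pins or {})

    def check(self, peer: str, public_key: bytes) -> str:
        fingerprint = hashlib.sha256(public_key).hexdigest()
        pinned = self.pins.get(peer)
        if pinned is None:
            self.pins[peer] = fingerprint  # first contact: take the leap of faith
            return "first-use"
        if hmac.compare_digest(pinned, fingerprint):
            return "match"  # same dog as last time
        return "mismatch"  # different key: possible monkey in the middle; pin unchanged

    def dump(self) -> str:
        return json.dumps(self.pins, sort_keys=True)

    @classmethod
    def load(cls, text: str) -> "LeapOfFaithStore":
        return cls(json.loads(text))
```

Switching from leap-of-faith to configuration would amount to pre-populating the pins instead of learning them on first contact.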
Chapter 14 (key negotiation) of the same book covers the rest of the protocol; we would substitute ECDH and ECDSA for its DH and RSA, respectively. As noted in the text, we could also use a shared secret key and a MAC function instead of public-key-based authentication.
Alternatively, the Station-to-Station protocol described in 4.6.1 of "Guide to Elliptic Curve Cryptography" (ISBN 978-0-387-95273-4, https://link.springer.com/book/10.1007/b97644) appears to do what we want, straight out of the box.
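Whichever key agreement we pick, it ends with a shared secret. From that secret the channel needs separate encryption and authentication keys for each direction, so that a frame reflected back at its sender fails to verify. A minimal sketch, modeled on the book's approach of hashing the negotiated secret together with a direction-specific label. The label strings, the role names, and the use of plain SHA-256 are assumptions.

```python
import hashlib
from typing import NamedTuple

# Hypothetical direction labels.
LABELS = {
    "c2h_enc": b"Enc client to HSM",
    "h2c_enc": b"Enc HSM to client",
    "c2h_auth": b"Auth client to HSM",
    "h2c_auth": b"Auth HSM to client",
}


class ChannelKeys(NamedTuple):
    send_enc: bytes
    send_auth: bytes
    recv_enc: bytes
    recv_auth: bytes


def derive_channel_keys(shared_secret: bytes, role: str) -> ChannelKeys:
    """Derive four 256-bit keys from the negotiated secret; role is 'client' or 'hsm'."""
    k = {name: hashlib.sha256(shared_secret + label).digest()
         for name, label in LABELS.items()}
    if role == "client":
        return ChannelKeys(k["c2h_enc"], k["c2h_auth"], k["h2c_enc"], k["h2c_auth"])
    if role == "hsm":
        return ChannelKeys(k["h2c_enc"], k["h2c_auth"], k["c2h_enc"], k["c2h_auth"])
    raise ValueError("role must be 'client' or 'hsm'")
```

Each side's send keys are the other side's receive keys, and no key is shared between directions or between encryption and authentication.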
Interaction with the multiplexer is slightly interesting. The multiplexer really only cares about one thing: being able to match responses from the HSM to queries sent to the HSM, so that it can send each response back to the right client. At the moment, it does this by seizing control of the client_handle field in the RPC frame, which it can only get away with because there's no end-to-end integrity check at all (yuck). We could add an outer layer of headers for the multiplexer, but would rather not.
The obvious "real" identity for clients to use would be the public keys (ECDSA in the above discussion) they use to authenticate to the HSM, or a hash (perhaps truncated) thereof. That's good as far as it goes, and may suffice if we can assume that clients always have unique keys, but if client keys are something over which the client has any control (which includes selecting where they're stored, which we may not be able to avoid), we have to consider the possibility of multiple clients using the same key (yuck). So a candidate replacement for the client_handle for multiplexer purposes would be some combination of a public key hash and a process ID, both things the client could provide without the multiplexer needing to do anything.
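A sketch of that candidate identifier, assuming the client has its public key as bytes in some stable encoding: a truncated SHA-256 hash of the key followed by the process ID. The 8-byte truncation and the 32-bit PID field are arbitrary choices for illustration, and the function name is hypothetical.

```python
import hashlib
import os
import struct


def client_routing_id(public_key: bytes, pid: int, hash_len: int = 8) -> bytes:
    """Hypothetical multiplexer routing identifier: truncated key hash || 32-bit PID."""
    if not 1 <= hash_len <= 32:
        raise ValueError("hash_len must be between 1 and 32 bytes")
    if not 0 <= pid <= 0xFFFFFFFF:
        raise ValueError("pid does not fit in 32 bits")
    key_hash = hashlib.sha256(public_key).digest()[:hash_len]
    return key_hash + struct.pack(">I", pid)


if __name__ == "__main__":
    # Both parts come from the client itself; the multiplexer never assigns anything.
    print(client_routing_id(b"example-public-key", os.getpid()).hex())
```

Two processes sharing one key get identifiers that differ only in the trailing PID bytes, which is the case the key hash alone can't distinguish.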
The one argument in favor of leaving control of this to the multiplexer (rather than the endpoints) is that it would (sort of) protect against one client trying to masquerade as another -- but that's really just another reason why clients should have their own keys to the extent possible.
As a precaution, perhaps the multiplexer should check for duplicate identifiers, then do, um, something? if it finds duplicates. This kind of violates Steinbach's Guideline for Systems Programming ("Never test for an error condition you don't know how to handle"). The obvious answer is to break all connections, old and new, that use the duplicate identity, with minor questions about how to reset from that, whether it's worth doing at all, etc. Maybe clients just shouldn't do that.
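One possible shape for that precaution, as a sketch: the multiplexer keeps a table from routing identifier to connection, and a second registration of an identifier already in use evicts the existing entry and tells the caller to close both connections. Whether the identifier may then be reused is one of the open reset questions; this sketch simply allows it. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class RoutingTable:
    """Hypothetical multiplexer routing table with duplicate-identity handling."""

    def __init__(self) -> None:
        self.routes: Dict[bytes, object] = {}

    def register(self, routing_id: bytes, conn: object) -> List[object]:
        """Add a route; return the connections the caller must close (empty if none)."""
        existing = self.routes.pop(routing_id, None)
        if existing is not None:
            # Duplicate identity: break old and new connections alike.
            return [existing, conn]
        self.routes[routing_id] = conn
        return []

    def unregister(self, routing_id: bytes) -> None:
        # Normal teardown when a client disconnects.
        self.routes.pop(routing_id, None)

    def lookup(self, routing_id: bytes) -> Optional[object]:
        """Find the client connection an HSM response should go back to."""
        return self.routes.get(routing_id)
```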
Does the resulting design pass examination by clueful people?
Does this end up still being significantly simpler than TLS?
The Cryptography Engineering protocols include a hack to work around a length extension weakness in SHA-2 (see section 5.4.2). Do we need this? Would we be better off using SHA-3 instead? The book claims that SHA-3 was expected to fix this, but that was before NIST pissed away their reputation by getting too cosy with the NSA again. Over my head, ask somebody with more clue.
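For whoever picks this up, the candidates are cheap to line up in code. The sketch below uses only Python's hashlib and hmac to put plain SHA-256, a double-hashed SHA-256 (one common way to block length extension; whether it matches the book's exact construction should be checked against section 5.4.2), and SHA3-256 side by side. Note that HMAC-SHA-256, which the Chapter 7 channel uses for message authentication, is not subject to length extension in the first place, so the question mostly matters wherever a bare hash is applied to secret-dependent data.

```python
import hashlib
import hmac


def h_sha256(data: bytes) -> bytes:
    # Plain SHA-256: Merkle-Damgard construction, so subject to length extension.
    return hashlib.sha256(data).digest()


def h_sha256_double(data: bytes) -> bytes:
    # SHA-256 applied twice: extending the output does not yield the
    # double hash of data || padding || suffix.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def h_sha3_256(data: bytes) -> bytes:
    # SHA3-256 (Keccak sponge): not subject to length extension.
    return hashlib.sha3_256(data).digest()


def mac_hmac_sha256(key: bytes, data: bytes) -> bytes:
    # HMAC's nested construction already defeats length extension.
    return hmac.new(key, data, hashlib.sha256).digest()
```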