Development of a Cryptech ASIC Implementation

Introduction

The aim of the Cryptech project is to develop an open, free, and auditable hardware security module (HSM). The Cryptech HSM includes both software (SW) and hardware (HW) parts. In at least the first iteration of the Cryptech HSM, the HW parts are implemented using FPGA devices. However, the design anticipates that the HW parts may be implemented in a Cryptech ASIC in a future iteration. This text briefly describes what the HW part of the Cryptech HSM contains, the design style used, and what would have to change in order to implement the HW part in an ASIC.

General digital functions and internal memories

The Cryptech digital function cores, such as the SHA-256 core, are written in generic RTL (Register Transfer Level) Verilog code. The code follows a fairly conservative coding style and uses language features from IEEE 1364-2001 (aka Verilog 2001).

All RTL code is divided into modules. Each module contains one process for register updates and reset (reg_update) and one or more combinational processes for the datapath and support logic such as counters. Finally, if needed, each module has a separate process that implements the finite state machine (FSM) that controls the behaviour of the module.
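To make the structure concrete, the following is a minimal behavioural sketch written in Python rather than Verilog, so that it can be run without a simulator. It models a hypothetical module with a counter and a small FSM; the names CounterModule, ctr_logic, ctrl_fsm and the CTRL_* states are illustrative assumptions, not taken from the Cryptech sources. What it shows is that only reg_update changes register values, and that each register is updated only when its write enable is set.

```python
# Behavioural Python model (not RTL) of the module structure described above.
# All names (CounterModule, ctrl_fsm, ctr_logic, CTRL_*) are hypothetical.

CTRL_IDLE, CTRL_RUN, CTRL_DONE = 0, 1, 2


class CounterModule:
    def __init__(self, limit):
        self.limit = limit
        self.ctr_reg = 0
        self.state_reg = CTRL_IDLE

    def reg_update(self, reset_n, ctr_new, ctr_we, state_new, state_we):
        # The only place where registers change: synchronous, active-low
        # reset, otherwise each register is updated only when its write
        # enable is set.
        if not reset_n:
            self.ctr_reg = 0
            self.state_reg = CTRL_IDLE
            return
        if ctr_we:
            self.ctr_reg = ctr_new
        if state_we:
            self.state_reg = state_new

    def ctr_logic(self, ctr_rst, ctr_inc):
        # Combinational support logic: next counter value and write enable.
        if ctr_rst:
            return 0, True
        if ctr_inc:
            return self.ctr_reg + 1, True
        return self.ctr_reg, False

    def ctrl_fsm(self, start):
        # Finite state machine: next state plus control signals for the counter.
        ctr_rst = ctr_inc = False
        state_new, state_we = self.state_reg, False
        if self.state_reg in (CTRL_IDLE, CTRL_DONE):
            if start:
                ctr_rst = True
                state_new, state_we = CTRL_RUN, True
        elif self.state_reg == CTRL_RUN:
            ctr_inc = True
            if self.ctr_reg == self.limit - 1:
                state_new, state_we = CTRL_DONE, True
        return state_new, state_we, ctr_rst, ctr_inc

    def clock(self, reset_n=True, start=False):
        # One rising clock edge: evaluate the combinational processes on the
        # current register values, then commit the results in reg_update.
        state_new, state_we, ctr_rst, ctr_inc = self.ctrl_fsm(start)
        ctr_new, ctr_we = self.ctr_logic(ctr_rst, ctr_inc)
        self.reg_update(reset_n, ctr_new, ctr_we, state_new, state_we)
```

With limit=3, a reset, a start pulse and then three clock cycles leave ctr_reg at 3 and the FSM in CTRL_DONE, where it stays until the next start or reset.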

Each core is divided into a core module, for example sha256_core.v, and a number of submodules that the core instantiates. The core provides raw, wide ports (for example a 256-bit wide key for AES) that are not suitable for use in a standalone system. Instead, each core comes with a top-level wrapper, for example sha256.v. This wrapper contains all the registers and logic needed to provide the full functionality of the core via a simple 32-bit memory-like interface. If the core is going to be used as a tightly integrated submodule, the wrapper can be discarded. Similarly, if the core is going to be used in a bus system that uses a specific bus standard such as AMBA AHB, CoreConnect, or WISHBONE, only the top-level wrapper needs to be replaced or modified to match that bus standard.
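The split between a core with wide ports and a wrapper with a 32-bit interface can be sketched as follows. This is a hypothetical Python model, not the actual sha256.v or AES wrapper: the address map (ADDR_CTRL, ADDR_KEY0 to ADDR_KEY7) and the write-only key registers are assumptions chosen for illustration. The wrapper collects eight 32-bit writes into the 256-bit key port that the core sees.

```python
# Hypothetical address map; the real Cryptech wrappers define their own.
ADDR_CTRL = 0x08
ADDR_KEY0 = 0x10
ADDR_KEY7 = 0x17


class KeyWrapper:
    """32-bit memory-like front end for a core with a 256-bit key port."""

    def __init__(self):
        self.key_words = [0] * 8  # eight 32-bit words make up the 256-bit key
        self.ctrl_reg = 0

    def write(self, address, data):
        data &= 0xFFFFFFFF  # the bus is 32 bits wide
        if ADDR_KEY0 <= address <= ADDR_KEY7:
            self.key_words[address - ADDR_KEY0] = data
        elif address == ADDR_CTRL:
            self.ctrl_reg = data & 0x1

    def read(self, address):
        # Key words are treated as write-only in this sketch, so reading
        # them (or any unmapped address) returns 0.
        if address == ADDR_CTRL:
            return self.ctrl_reg
        return 0

    def core_key(self):
        # The wide port seen by the core: word 0 ends up most significant.
        key = 0
        for word in self.key_words:
            key = (key << 32) | word
        return key
```

In this picture, moving to a different bus standard only changes how write and read are driven; core_key, the interface seen by the core, stays the same.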

The RTL code does not explicitly instantiate any hard macros such as memories or multipliers. Instead, all such functions are left for the synthesis tool to infer from the code. All memories are placed in separate modules to make the design easy to modify. In an ASIC implementation, any memory that the synthesis tool does not map automatically would be replaced by an explicit instantiation of a technology-specific memory macro.

Some of the memories in the design have combinational read, i.e. the read data is not captured by an output register (which would add one cycle of read latency). For some FPGA technologies, such memories are not compatible with the available physical memory blocks, so the synthesis tools implement them using separate registers rather than a memory instance. In an ASIC implementation these memories would likely be mapped to real memory macros, allowing a faster and more compact implementation.
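The difference between the two read styles can be illustrated with a small cycle-level Python model. The class names are hypothetical, and the read-before-write behaviour of the registered port is an assumption; real memory macros may behave differently when reading and writing the same address in one cycle.

```python
class CombReadMem:
    """Memory with combinational read: data is available in the same cycle."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def read(self, addr):
        return self.mem[addr]

    def clock(self, we=False, addr=0, wdata=0):
        if we:
            self.mem[addr] = wdata


class RegReadMem:
    """Memory with a registered read port: one cycle of read latency."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.rdata_reg = 0

    def read(self):
        # Returns the data captured at the previous clock edge.
        return self.rdata_reg

    def clock(self, raddr, we=False, waddr=0, wdata=0):
        # The read register samples the old contents before the write
        # (read-before-write behaviour when raddr == waddr).
        self.rdata_reg = self.mem[raddr]
        if we:
            self.mem[waddr] = wdata
```

A design written against CombReadMem expects the data in the same cycle as the address; mapping such a memory onto a block with a registered read port would therefore change the timing seen by the surrounding logic.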

Interfaces

External interfaces such as GPIO, Ethernet GMII, UART, etc. will always require some modification when the Cryptech design is implemented in a given technology, whether a specific FPGA type or an ASIC. The important point is that the Cryptech design does not use technology-specific macros to implement these interfaces. However, pin assignments, timing, and electrical requirements will always need adjustment.

Clocking and reset

The design style used in the Cryptech Verilog code currently follows the guidelines from the FPGA vendors Altera and Xilinx, which means that registers use synchronous reset. This will also work in an ASIC implementation, even though asynchronous reset is far more common in ASIC designs. Changing to asynchronous reset is not a large undertaking, however, since register reset and update logic are confined to easily identifiable processes (reg_update) in each module.
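In Verilog, the change essentially amounts to editing the sensitivity list of each reg_update process, from always @(posedge clk) to always @(posedge clk or negedge reset_n) for an active-low reset. The Python sketch below models the behavioural difference; the Register class and the reset_n name are assumptions for illustration. With synchronous reset the register is cleared only at the next clock edge, while with asynchronous reset it is cleared as soon as reset is asserted.

```python
class Register:
    """One register with an active-low reset, synchronous or asynchronous."""

    def __init__(self, async_reset):
        self.async_reset = async_reset
        self.reset_n = True
        self.q = None  # power-up value is undefined until reset

    def set_reset_n(self, level):
        self.reset_n = level
        if self.async_reset and not level:
            # Asynchronous reset: cleared immediately, no clock edge needed.
            self.q = 0

    def clock_edge(self, d, we):
        if not self.reset_n:
            # While reset is asserted, both styles hold the reset value.
            self.q = 0
        elif we:
            self.q = d
```

Because the reset handling lives only in reg_update, switching styles stays local to one process per module.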

Most, if not all, registers in the Cryptech Verilog code have a defined reset state, and most registers also have a write enable signal that controls when they are updated. This corresponds well to the registers available in Altera and Xilinx FPGAs and to the design strategy these vendors recommend. It is also in line with common, good ASIC design practice, which allows for compact code and low-power implementations. The design does not currently use any clock gating. Clock gating might be added in future revisions if power consumption needs to be reduced, provided that it does not introduce side-channel issues.

External memories

The Cryptech hardware design will use external persistent memories for protected key storage as well as external SRAM for protected master key storage. In an ASIC implementation the master key memory would probably be integrated on-chip to further enhance security.

Just like other external interfaces (see above), the interfaces for the external memories do not use any explicitly instantiated hard macros in the FPGAs.

Entropy sources

The current Cryptech design contains two separate physical entropy sources.

1: An avalanche noise based entropy source placed outside the FPGA. The entropy source signal is sampled by the FPGA using a flank (edge) detection mechanism.

An ASIC implementation would be able to use the external entropy source in the same way as the FPGA. Furthermore, depending on the process options, it might be possible to build an internal avalanche diode based on the ESD structures commonly used in I/O pin implementations. In a process with power management capabilities, circuitry found in step-up converters might also be usable as an internal avalanche noise source.

Note that integrating the avalanche noise source does not mean that an off-chip noise source is excluded. The Cryptech RNG is modular and having both an internal and an external avalanche noise source is quite possible.

2: A ring oscillator based entropy source placed inside the FPGA. The ring oscillator used in the FPGA is based on carry chain feedback through adders. An ASIC implementation of this ring oscillator should work and produce noise with similar characteristics. However, the specific circuit will have to be characterized with an explicit layout and qualified for the given process.
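For the external avalanche source (item 1 above), one way to turn a sampled noise signal into bits using flank detection is sketched below in Python. It is an assumption for illustration, not necessarily the exact Cryptech mechanism: a rising flank is detected when the previous sample was 0 and the current one is 1, and the least significant bit of a free-running cycle counter at that moment is taken as an output bit. The input is assumed to be already synchronized to the FPGA clock; real hardware would also need a synchronizer stage in front of the detector.

```python
def bits_from_flanks(samples):
    """Derive bits from a sampled noise signal via rising-flank detection.

    samples: sequence of 0/1 values, one per clock cycle (assumed to be
    already synchronized). Returns one bit per detected rising flank: the
    LSB of a free-running cycle counter when the flank is seen.
    """
    bits = []
    prev = None  # no flank can be detected on the very first sample
    for cycle_ctr, s in enumerate(samples):
        # Rising flank: the previous sample was 0 and this one is 1.
        if prev == 0 and s == 1:
            bits.append(cycle_ctr & 1)
        prev = s
    return bits
```

Since the time between flanks of the avalanche signal is random, the counter LSB at each flank varies; any real design would still need health tests and post-processing before the bits are used.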

Toolchain

Cryptech currently uses Verilog simulators for functional verification and commercial FPGA tools for implementation, including timing analysis.

An ASIC implementation will require several new tools, including tools for synthesis, place and route, and static timing analysis, where the timing analysis tool must be accepted by the chip process vendor as a sign-off tool.

Conclusions

The HW designed for the first iteration of Cryptech is not specifically designed for FPGA implementation, but is in fact designed in a generic way to allow for easy implementation using different technologies such as ASICs.

There are, however, parts of the design that will have to be updated or modified in order to create a good ASIC implementation. The Cryptech project is confident that it knows which parts these are and what modifying them would entail.

Developing an ASIC will, however, require new tools, which will incur additional costs.