One Engine, Many Championships

TL;DR: Zengo is known for maintaining one of the world’s largest open-source MPC cryptographic libraries. In recent months, our Zengo X Research Team has upgraded part of our MPC architecture (the Gotham City cryptographic libraries), working through several Rust programming language intricacies to decouple the core cryptographic functions from environment-specific concerns such as databases and authorization policies.

Note: This is an advanced Zengo X blogpost. Join our MPC cryptography telegram group to continue the conversation here.

Introduction

At Zengo wallet we innovate in all directions. Since our product is delivered as software, we also strive for the best software practices and continuous code improvement. Zengo’s cryptographic libraries are wrapped in the gotham-city project, a standalone server and client app that demonstrates a two-party ECDSA (Elliptic Curve Digital Signature Algorithm) signing protocol. Up until a few months ago, that repository was used in the production environment as well: we had a monorepo that anyone could clone to run a two-party signing protocol with one server and one client, while Zengo used the same code in production as the server cosigner for hundreds of thousands of users. The result was less elegant code, since one monorepo served two environments with different needs: a public one open to anyone who wanted to experiment, and a production one with concrete authorization policies and a different DB management system. To support both, each environment set its own variables, and the source branched on them with ugly switch cases.

After an excellent collaboration with the Fireblocks team, who helped us build robustness into our implementation (with respect to aborts that were not happening in specific cases), we started working on a patch in an internal repo cloned from the public Gotham City repo. We did not work directly on the public repo because we wanted the patch fixed and deployed before the Fireblocks team shared their public update. This created a slight difference between what is public and what runs in production, even though the core cryptographic protocol remained the same. Before diving into the details of the new architecture, it is worth looking at our existing software stack and how it works in the public Gotham City.

Monolithic Gotham-City Architecture

Zengo’s software stack is shown in the diagram below. Starting from the bottom, we have our core cryptography crate, curve, which implements the arithmetic and underlying curve operations needed by our production Lindell17 two-party ECDSA threshold signing protocol. It is used by the gadgets exposed in the two-party-ECDSA crate (which uses the minimum necessary parts of the old multi-party-ECDSA crate). The gadgets include Paillier encryption, the zero-knowledge proofs needed by the threshold signing protocol, commitments, and DH exchange protocols. On top of the gadgets sits a wrapper crate, kms, which exposes functions tailored to each round of the threshold signing protocol for key generation and signing.

Finally, the round functions exposed by kms are served by the Rocket HTTP server as endpoint routes. To serve many users, the state between protocol rounds is stored in a DB. That DB differs between production and public, as does the extra authorization and authentication of the users who hit the Gotham server.

Monolithic old Gotham City server exposed in public and in prod

As time progressed, we patched the abort issue discovered by Fireblocks in the private cloned repo. Afterwards we had to decide between two options. Either we clean up our codebase at the DB and authorization level and keep two different repos, one for public and one for production; or we keep a single monorepo with switch cases that select the DB and authorization management depending on the environment the server lives in. We chose the first option: two different repos that both depend on the same kms, two-party-ECDSA and curve crates. Any change to those three core crates is therefore reflected in both repos. However, we had to duplicate all the code for the route endpoints of the Rocket HTTP server, which means we had two repos implementing exactly the same protocol for keygen and sign (yellow boxes).

Maintaining two repos for Gotham City server: Public and Production

Soon, we realized the downsides of this approach:

  • Inefficient procedures: Double code maintenance for the same protocol slows production efficiency and engineering pipelines.
  • Duplication: Possible protocol changes at the keygen or sign level need to be reimplemented in both repos.
  • Obscurity: At Zengo we believe open source should not only be exposed as an isolated repo link but also technically enforced. With two repos, even if both (public and production) use the open-source crates curve, two-party-ECDSA and kms, what actually runs in production is a differently configured version of the public gotham-city server.
  • Low Interoperability: Dual repo maintenance creates low interoperability if someone decides to change the peripherals: the DB, the authorization, or any other component that needs to be plugged on top of the core keygen and signing protocol.

New Architecture

Source: https://en.wikipedia.org/wiki/V6_engine#/media/File:IC_engine.JPG

Stepping back and observing the situation: having two repos with the same protocol implementation is like building several car models that share an engine, and manufacturing that same engine from scratch for each chassis. A more productive approach is to source the engine from one manufacturer and spend time only on the chassis and the parts specific to each car. This not only increases productivity but also gives a clearer separation between the modules to be maintained.

This is what we aim to do with Gotham City: separate out a Gotham engine containing all the components necessary for two-party ECDSA, and expose it via an HTTP server through another crate that implements all the peripheral components. It is like Red Bull winning two consecutive F1 championships with a Honda engine.

You guessed right! We now have a common Gotham-engine crate which implements all the rounds for keygen and signing but abstracted in such a way that the DB API and authorization API are implemented by the users of gotham-engine: Gotham-public and Gotham-private.

The responsibility of the users in contrast with the previous version is to just bootstrap a rocket server, mount the endpoints for keygen and sign from gotham-engine and pass their implementation for DB management and authorization. In that way, there is a clean architecture with zero duplication, better code maintenance, minimum workload from new users of gotham-engine, high interoperability which can be easily extended at the gotham-engine level and transparency for open source visibility.
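To make the split concrete, here is a minimal sketch of the idea using only the standard library. It is not the actual gotham-engine API: the trait names `Db` and `Authorizer`, the function `keygen_first`, and the types `PublicDb`, `AllowAll` and `AllowList` are hypothetical, and the "round" simply stores a marker instead of running Lindell17. The point is that the engine logic is written once against two traits, and each deployment only supplies its own implementations.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface the engine depends on.
pub trait Db {
    fn insert(&mut self, key: &str, value: String);
    fn get(&self, key: &str) -> Option<String>;
}

/// Hypothetical authorization interface the engine depends on.
pub trait Authorizer {
    fn is_allowed(&self, user: &str) -> bool;
}

/// Engine-side logic, written once against the two traits.
/// A real round would run the protocol; this one only records a marker.
pub fn keygen_first(
    db: &mut dyn Db,
    auth: &dyn Authorizer,
    user: &str,
) -> Result<String, String> {
    if !auth.is_allowed(user) {
        return Err(format!("user {} is not authorized", user));
    }
    let key = format!("{}/keygen/first", user);
    db.insert(&key, "round-1-state".to_string());
    Ok(key)
}

/// "Public" style deployment: in-memory map, everyone is allowed.
#[derive(Default)]
pub struct PublicDb(HashMap<String, String>);

impl Db for PublicDb {
    fn insert(&mut self, key: &str, value: String) {
        self.0.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}

pub struct AllowAll;

impl Authorizer for AllowAll {
    fn is_allowed(&self, _user: &str) -> bool {
        true
    }
}

/// "Private" style deployment: only users on an allow-list pass.
pub struct AllowList(pub Vec<String>);

impl Authorizer for AllowList {
    fn is_allowed(&self, user: &str) -> bool {
        self.0.iter().any(|u| u == user)
    }
}
```

Swapping `AllowAll` for `AllowList`, or `PublicDb` for another `Db` implementation, changes the behavior of `keygen_first` without touching the engine function itself.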

To wrap up, the benefits of the new architecture are summarized as follows:

  • Zero code duplication
  • Efficient engineering maintenance/pipelines
  • Transparency
  • Interoperability

Rustacean Implementation

If you’ve followed us until now and you wear a Rust hat, now comes the interesting part. We had to solve quite a few challenges to be able to enforce the aforementioned architecture with Rust: single rocket server endpoint implementations for keygen and sign which are parameterized at run time with different DB API and authorization policies, coming from their implementers.

Our ultimate goal is for a single change at the protocol level to be reflected in both the public and the private server, each with its own implementation of DB and authorization. Somehow the private and the public Gotham servers should link their keygen and sign endpoints to the engine functions that implement the logic, parameterized by DB and authorization.

Wrapped Endpoints

We started with a DB trait with typical insert and get functions, to be implemented by each deployment’s DB type. For keygen and sign, since we do not want implementers to write duplicate code for those protocols, we decided to have two traits, KeyGen and Sign, with default implementations, and to mount them directly from Gotham-private and Gotham-public as HTTP route endpoints. However, Rocket does not allow a trait’s default function to be mounted as an endpoint. We worked around that with thin wrapper functions that call the default trait implementation for each endpoint.
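The wrapper pattern can be sketched without Rocket. In the snippet below, a plain table of function pointers stands in for a router that only accepts free functions; the names `backend_name`, `wrap_keygen_first`, `Handler`, `routes` and `PublicGotham`, as well as the path string, are assumptions made for illustration and are not taken from gotham-engine.

```rust
/// Hypothetical engine trait: the protocol step is a default method,
/// so implementers inherit it without writing any protocol code.
pub trait KeyGen {
    /// The only thing an implementer supplies in this sketch.
    fn backend_name(&self) -> String;

    /// Default implementation shared by every implementer.
    fn first(&self, party_id: &str) -> String {
        format!("keygen-first for {} via {}", party_id, self.backend_name())
    }
}

/// Thin free-function wrapper. A router that only accepts free functions
/// can register this; it forwards straight to the trait's default method.
pub fn wrap_keygen_first(state: &dyn KeyGen, party_id: &str) -> String {
    state.first(party_id)
}

/// Stand-in for a route table: a path paired with a plain function pointer.
pub type Handler = fn(&dyn KeyGen, &str) -> String;

pub fn routes() -> Vec<(&'static str, Handler)> {
    vec![("/ecdsa/keygen/first", wrap_keygen_first as Handler)]
}

/// A deployment only names itself; `first` comes from the default method.
pub struct PublicGotham;

impl KeyGen for PublicGotham {
    fn backend_name(&self) -> String {
        "public".to_string()
    }
}
```

A server built this way would look up the handler for a path and call it with its own state, for example `handler(&PublicGotham, "alice")`, so the protocol code lives in one place while the state differs per deployment.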

Dyn Traits

Once the first obstacle was addressed, the second emerged. For every kind of DB access, getters and setters alike, the old Gotham-city used generics constrained by Serialize and Deserialize trait bounds. That gave clean code for any type (struct) that the keygen and sign protocols inserted into or fetched from the DB. In our case, however, it made the KeyGen and Sign traits non-object-safe under Rust’s rules. That was a no-go, since there was no way to mount the Gotham-engine default functions from the implementers without trait objects and dynamic dispatch. For a trait to be used as a trait object it must be object safe, and one of the object-safety rules forbids generic methods. This was a big headache: we had to remove the generic types from the old insert/get DB functions, along with the deserialization trait bounds, which are generic as well and which the Rust compiler rejects. We had to find a way to serialize/deserialize all the types that communicate with the database without default serde. Before doing that from scratch, we realized that typetag is a very useful crate that does the job for us with a few annotations.
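The object-safety rule is easiest to see side by side. The sketch below keeps the generic version in a comment, because it would not compile once used as `dyn`, and shows a non-generic variant that can be used as a trait object. Raw bytes stand in for stored values here purely for illustration; the names `GenericDb`, `Db`, `MemoryDb` and `store_round_state` are hypothetical.

```rust
use std::collections::HashMap;

// Not object safe: a generic method cannot be dispatched through a vtable,
// so using `Box<dyn GenericDb>` is rejected by the compiler (error E0038).
//
// pub trait GenericDb {
//     fn insert<T: serde::Serialize>(&mut self, key: &str, value: &T);
// }

/// Object-safe variant: no method carries a type parameter.
pub trait Db {
    fn insert(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

#[derive(Default)]
pub struct MemoryDb {
    rows: HashMap<String, Vec<u8>>,
}

impl Db for MemoryDb {
    fn insert(&mut self, key: &str, value: Vec<u8>) {
        self.rows.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.rows.get(key).cloned()
    }
}

/// Accepts any backend through dynamic dispatch.
pub fn store_round_state(db: &mut dyn Db, user: &str, state: &[u8]) {
    db.insert(user, state.to_vec());
}
```

Because `Db` has no generic methods, a value of type `Box<dyn Db>` is legal, and the same `store_round_state` works for whichever backend a deployment plugs in.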

First we implemented a wrapper trait, Value, for inserting values into and getting values from the DB, and annotated it with typetag. Then we implemented that trait for every struct that we insert into or get from the DB. The result is a trait function for inserts/gets that is type agnostic and accepts or returns a dyn trait object of the Value trait. That allows the default implementations of KeyGen and Sign to use it for any type that implements the Value trait. Values fetched from the DB as dyn trait objects can then be downcast to their concrete types with downcast_ref.
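A rough, standard-library-only sketch of this shape follows. The typetag annotations are deliberately left out so the snippet builds without external crates, which means serialization is not shown at all; only the type-agnostic insert/get and the `downcast_ref` step are. The `as_any` helper, the `get_as` function, and the structs `PartyOneShare` and `SessionId` are assumptions for illustration, not the real gotham-engine types.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical wrapper trait for anything stored in the DB.
/// In the real crate this is where a typetag annotation would go.
pub trait Value: Any + Send + Sync {
    /// Exposes the value as `Any` so callers can downcast it.
    fn as_any(&self) -> &dyn Any;
}

#[derive(Debug, PartialEq)]
pub struct PartyOneShare {
    pub index: u32,
}

#[derive(Debug, PartialEq)]
pub struct SessionId(pub String);

impl Value for PartyOneShare {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Value for SessionId {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Type-agnostic DB API: values go in and come out as trait objects.
pub trait Db {
    fn insert(&mut self, key: &str, value: Box<dyn Value>);
    fn get(&self, key: &str) -> Option<&dyn Value>;
}

#[derive(Default)]
pub struct MemoryDb {
    rows: HashMap<String, Box<dyn Value>>,
}

impl Db for MemoryDb {
    fn insert(&mut self, key: &str, value: Box<dyn Value>) {
        self.rows.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<&dyn Value> {
        self.rows.get(key).map(|boxed| &**boxed)
    }
}

/// Fetches a value and downcasts it to the concrete type the caller expects.
/// Returns None if the key is missing or the stored type is different.
pub fn get_as<'a, T: Any>(db: &'a dyn Db, key: &str) -> Option<&'a T> {
    db.get(key)?.as_any().downcast_ref::<T>()
}
```

Note that the generic parameter lives on the free function `get_as`, not on the trait, so `Db` stays usable as a trait object while callers still get a typed result.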

Gotham-Public

Implementers of the DB trait need minimal effort to bootstrap a Gotham server. Take the already published gotham-public repo as an example. After implementing the DB trait API, the type has to provide empty implementations of the KeyGen and Sign traits, since their functions are default-implemented in Gotham-engine. The endpoints mounted on the Rocket server are taken directly from Gotham-engine, and the last step is to pass a dyn trait object to the Rocket server’s keygen and sign endpoints as part of the state. That is done with a Mutex around a typecasted reference to the trait object. Because this state must be thread safe, the traits carry Send and Sync trait bounds.
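Why the Send and Sync bounds matter can be shown with plain threads standing in for concurrent requests. This is a sketch under assumptions, not the gotham-public code: it uses a boxed trait object inside `Arc<Mutex<...>>`, and the names `Db`, `MemoryDb`, `SharedDb`, `new_state` and `handle_concurrently` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical DB trait. The Send + Sync supertraits let the trait object
/// be shared across threads, as a multi-threaded server requires.
pub trait Db: Send + Sync {
    fn insert(&mut self, key: String, value: String);
    fn row_count(&self) -> usize;
}

#[derive(Default)]
pub struct MemoryDb(HashMap<String, String>);

impl Db for MemoryDb {
    fn insert(&mut self, key: String, value: String) {
        self.0.insert(key, value);
    }
    fn row_count(&self) -> usize {
        self.0.len()
    }
}

/// Shared state handed to every request handler.
pub type SharedDb = Arc<Mutex<Box<dyn Db>>>;

pub fn new_state(db: impl Db + 'static) -> SharedDb {
    let boxed: Box<dyn Db> = Box::new(db);
    Arc::new(Mutex::new(boxed))
}

/// Simulates `workers` concurrent requests, each writing one row,
/// and returns the number of rows stored afterwards.
pub fn handle_concurrently(state: &SharedDb, workers: usize) -> usize {
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            let state = Arc::clone(state);
            thread::spawn(move || {
                let mut db = state.lock().expect("db mutex poisoned");
                db.insert(format!("user-{}", i), "round-state".to_string());
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let rows = state.lock().expect("db mutex poisoned").row_count();
    rows
}
```

Without the `Send + Sync` supertraits on `Db`, the `thread::spawn` call above would not compile, because the closure would capture a trait object that the compiler cannot prove is safe to move between threads.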

Looking Forward

In the next installment of this series we plan to provide benchmarking results for the new architecture compared to the old one, covering computational overhead, memory, and throughput of concurrent client requests. We are also continuously optimizing our codebase and will look for ways to minimize total heap allocations.

To continue the conversation, explore our Github, or join our MPC Cryptography Telegram group, start here.