HybridDKG is one of the latest distributed key generation schemes and can operate in an asynchronous environment, making it the most viable DKG scheme to use over the internet.
Introduction
In a previous post, we discussed important VSS and DKG constructs and presented the Aggregatable DKG. Now that the reader better understands DKG, this article elaborates on a different DKG scheme called HybridDKG, as published in Distributed Key Generation in the Wild.
HybridDKG is a DKG that operates under the asynchronous communication model which is a much more realistic model than the synchronous model that most other DKG schemes rely on. Its underlying VSS, called HybridVSS, is based on AVSS, which is the component of HybridDKG that allows it to operate asynchronously. Since HybridDKG is operating under the asynchronous model, it requires both its VSS and DKG protocols to use consensus mechanisms in order for the nodes to agree on the secrets.
Assumptions
As with every cryptographic contruct, HybridDKG relies on certain assumptions in order to prove its security properties.
Communication model
HybridDKG uses the asynchronous model of communication, in which clocks of nodes may not be accurate (meaning they can be out of sync) and messages can be delayed for arbitrary period of time. In this model, it is assumed that the adversary manages the communication channels and can delay messages as it wishes. HybridDKG assumes that a real-world adversary cannot control communication channels for all the honest nodes and, thus, that the adversary (eventually) delivers all the messages between two uncrashed nodes.
A stricter assumption, named weak synchrony, is used only in order to prove the protocol's liveness. Under the weak synchrony assumption, the time difference delay(T) between the time a message was sent (T) and the time it is received doesn't grow indefinitely over time.
Weak synchrony is not necessary for the protocol's correctness.
It is also assumed that all communication is authenticated and the adversary is unable to spoof messages.
Faults
HybridDKG can tolerate non-byzantine node crashes, non-byzantine broken communication links and byzantine faults. It groups non-byzantine crashes and non-byzantine single broken communication links under the same fault type. The byzantine adversary is considered static, meaning that it cannot change its choice of nodes as time progresses.
The protocol can tolerate:
Up to t byzantine nodes
Up to f non-byzantine crashed nodes and non-byzantine failed connections at any given instant in time
d(κ) maximum number of crashes an adversary is allowed to perform during its lifetime
The number of the participants is:
n≥3t+2f+1
Adversary
An assumption introduced by the specific polynomial commitment scheme used in this HybridVSS is that the adversary is computationally bounded on the DLP with a security parameter κ.
Prerequisites
We first list some mathematical properties that will be useful later on:
To reconstruct a univariate polynomial of degree t we need t+1 points on that polynomial
Having a bivariate polynomial ϕ(x,y) of degree 2t, we can generate univariate polynomials by fixing the value of x (or y):
ax=x′(y)=ϕ(x′,y)
If the bivariate polynomial is symmetric, it holds that ϕ(x,y)=ϕ(y,x)
From the above we can derive that for a symmetric ϕ it holds ax′(y)=ay(x′)
HybridVSS
In the asynchronous model, each node follows two assumptions: that enough nodes will get their shares and there will be enough nodes participating in reconstruction. In order to remove the need to make assumptions, HybridVSS takes it a step further and provides a mechanism for nodes to keep track of how many other nodes are participating and their states instead of relying on the above assumptions.
We will now define the sharing, reconstruction, and recovery phases of HybridVSS. All the messages described below are tagged with a unique session ID (Pd,τ), with Pd being the session's dealer and τ a unique number.
Sharing
Initialization
The dealer Pd:
produces a random symmetric bivariate polynomial ϕ(x,y)=∑i,j=0tϕijxiyj with the secret encoded in as the constant factor ϕ(0,0)=ϕ00.
calculates a homomorphic commitment C to the polynomial which is a matrix with Cij=gϕij.
C can be used by nodes to verify that:
a given univariate polynomial ai(x)=l=0∑tai,lxl belongs to ϕ by checking:
gai,l=?j=0∏t(Cjl)ij,∀l∈[0,t]
a given value α corresponds to ϕ(m,i) by checking:
gα=?j,l=0∏t(Cjl)mjil
Dealing
Pd then deals to each node Pi a univariate polynomial ai(x)=ϕ(x,i) by sending a message called send.
This message also delivers C.
C is also sent with all other type of messages between nodes so that any two participants can verify the dealer’s commitment and prevent a malicious dealer from segregating the nodes by sharing different commitments.
Each Pi that receives a send, verifies ai(x) using C.
This polynomial provides a way for Pi to generate a point that belongs to the polynomial aj(x) of another node Pj since ai(j)=aj(i) due to ϕ's symmetry.
Gossip-based validation and consensus
Each Pi that receives a valid send message, sends a message called echo to each node Pj to inform them of the fact that the dealer got in touch.
echo carries ai(j) as a proof that it was dealt a valid ai(x).
After sending echo to all nodes, Pi enters a listening state, where it listens for two kinds of messages, echo and ready.
For every received echo from a node Pj the receiver Pi:
verifies aj(i) against C and if verified, it stores the point (j,aj(i)) in a set Aci(j,aj(i))j
if enough (⌈2n+t+1⌉) valid echo messages with the same commitment C are received to convince Pi that enough nodes were reached by the dealer, it uses the points in Aci to interpolate a polynomial aˉi(x)
sends a ready to all other nodes carrying a point aˉi(j) as a proof of the fact that it received enough echos
For every received and verified ready, Pi:
stores the received point aˉj(i) in the same set Aci
if enough (t+1) ready messages with the same commitment C are received to be able to interpolate a polynomial of degree t, Pi interpolates aˉi(x)
sends a ready that carries aˉi(j) to each node Pj
if enough ready messages (n−t−f) with the same commitment C are received for Pi to be convinced that enough nodes know that enough nodes are participating, it calculates its share si=aˉi(0) and stops.
Additional notes
A node terminates the sharing process if it has received (n−t−f)ready messages. Here, the node makes sure that there is a sufficient amount of honest nodes with shares. In other words, it makes sure that the participating honest nodes form a majority in the presence of up to t byzantine adversaries and up to ff crashed nodes.
If a node received a ready from another node, it implies that the other node has sent an echo and it was never received. I.e., a ready implies an echo. It is possible for a node to finish the sharing phase by only receiving enough ready messages. In this case, it needs to interpolate aˉi(x) in order to calculate its share, since the interpolation in echo processing didn't happen. This is the reason behind the interpolation step in processing a ready message.
An important note is that a node will only process the first send, echo and share messages it received from a specific node Pi during a session (Pd,τ). This is to make sure that:
nodes' echo and ready counters cannot be manipulated by an adversary by repeating messages
any recovery messages in circulation will not modify the systems state (this will become clear in the recovery subsection below)
Reconstruction
Let's first explain how the shares relate to the secret.
The share of Pi is si=aˉi(0)=aˉ0(i), which means that given t+1 shares we can interpolate aˉ0(x)=ϕ(0,x).
We can then evaluate this polynomial to 0 to get the secret s=aˉ0(0)=ϕ(0,0).
To reconstruct the secret, each node sends its share si to all other nodes using the message reconstructshare. It also listens for reconstructshare messages. For each received reconstructshare from a node Pj, it will first verify aj(0) against the commitment C:
gsi=?j=0∏t(Cj0)mj
and if the point is verified, it will store it.
When the node has collected t+1 points, it is ready to interpolate aˉ0(x) and then obtain the secret aˉ0(0).
Recovery
We denote d(κ) (κ is the system's security parameter) the maximum number of crashes an adversary is allowed to perform during its lifetime.
During recovery, the previously crashed node will ask the help of other nodes in replaying its previous communication. Each node keeps a record of all outgoing messages along with their intended recipients in a set B, and Bl indicates the subset of B intended for the node Pl. Additionally, each node maintains two counters cnt and cntl. The former tracks the number of overall (system-wide) help requests and the latter the number of help requests sent by each node Pl.
A crashed node that just recovered will first send a help message to all nodes. It will also replay messages in B. When a node receives the help message it will first make sure that the help limits are not exceeded: cnt≤(t+1)d(κ) and cntl≤d(κ) and if these checks pass, it will replay all messages in Bl, thus helping to replay the message history that is relevant to the recovered node.
With the above in mind we can see that, if recovery messages were to be processed, many, if not all, packet counters would count a recovered node twice.
HybridDKG
As with other DKG constructions, HybridDKG makes all nodes HybridVSS dealers. Each node generates a collection of shares from several HybridVSS instances and its final share will be the sum of those shares. Unlike DKGs operating in the synchronous setting, the asynchronous model introduces the requirement of agreement on at least (t+1) successful HybridVSS instances for a set of honest nodes. An additional complexity of the asynchronous setting is the the need to differentiate between crashed and slow nodes.
HybridDKG introduces a role called leader (L), which is responsible for making a proposal of a set of finished HybridVSS instances to be used. Since the leader might be malicious or crash, a leader-change mechanism is introduced. Nodes use timeouts to decide if a leader should be considered crashed.
HybridVSS is extended with a message, shared which is sent by every node that finishes the HybridVSS sharing phase. It carries all the ready messages for the session (Pd,τ) and signatures over them, which helps the leader to provide a validity proof of its proposal.
HybridDKG uses two phases, the optimistic and the pessimistic phase: the former is responsible for the key generation and is where crash recoveries take place; the latter is responsible for executing potential leader-change operations.
Optimistic phase
HybridVSS execution
The optimistic phase starts by each node Pd becoming a dealer of a HybridVSS session for a secret sd. At the end of a HybridVSS session, each node Pi broadcasts a (Pd,τ,shared,Cd,si,d,Rd) message. The set Rd includes n−t−f signed ready messages for session (Pd,τ), si,d is the share of node Pi and Cd is its commitment. All nodes collect Pd and Rd of received shared messages:
Q^Q^∪PdR^R^∪Rd
VSS proposal
Once L has processed t+1shared messages, it will broadcast a send message, proposing the set of VSS instances that these shared messages correspond to. This message is HybridDKG specific and not to be confused with the send message of HybridVSS.
send carries Q^ which constitutes L's proposal on the set of HybridVSS instances to use. It also carries R^ as a proof that the sessions referred by Q^ are valid.
Leader-change condition
Any node, apart from L, that processed t+1shared messages will start a local timer that stops after delay(T). If a node's timer stops before the node has reached the end of the protocol, the node assumes that the leader has either crashed or is malicious and requests a leader change. A leader-change request is performed by broadcasting a leadch message. We defer explaining the leader-change logic until we finish the optimistic phase.
Gossip-based validation and consensus
All nodes, apart from L, will start listening for send, echo and ready messages. Again, these messages are HybridDKG specific and not to be confused with their synonymous HybridVSS. Conceptually though, these messages serve the same purposes as in HybridVSS.
An echo message is send upon receipt of a valid send. The purpose of echo is to inform the receiver that the sender was reached by L with a valid proposal. echo carries the proposed set Q.
A ready message is send by nodes that have received enough (⌈2n+t+1⌉) echo messages for the same set Q or enough (t+1) ready messages for the same set Q. At this point, the node knows that the protocol can finish successfully, since it knows enough participating nodes and the ready message informs the receivers of that fact. The node will also update its Q^ set to Q, since at this point the set is agreed, and keep the relevant signed echo or ready messages as a proof. This update of the set serves as a note upon which VSS instances were agreed and is used during a leader change (more on that at the section explaining the pessimistic phase).
Shares construction
Finally, when a node receives enough (n−t−f) ready messages, it can finish the sharing protocol by calculating its share:
si=Pd∈Q∑si,d
and its commitment matrix, generated by the commitment matrices of each HybridVSS instance:
C=Cp,q=∏Pd∈Q(Cd)p,q
Pessimistic phase
A node enters this phase when it becomes unsatisfied with the current leader. The reasons for this dissatisfaction can be due to a time out or the receipt of an invalid message from the leader.
A predefined cyclic permutation of nodes is assumed to be agreed upfront. For simplicity, we will assume that leaders are sorted in a simple ascending order.
When entering this phase, the node will broadcast a message, called leadch, informing other nodes that it proposes a leader change. The messages carries the proposed leader Lˉ to which the nodes want to change. A dissatisfied node proposes the node that is next (in the ordering) of its current leader.
Two things need to be agreed between nodes: if the leader change is happening and, if so, which node will be the new leader.
A node that receives t+f+1leadch messages will be convinced that a leader change is happening and will broadcast a leadch proposing the smallest proposed leader it received. If the node receives t+f+1leadch messages proposing the same leader, it will accept that leader since consensus on the leader is reached.
Then, if the node is not the next leader, it will restart its counter and enter the optimistic mode.
In the case the node is the next leader, it will use the previously stored Q^ set as the VSS set to use. It will broadcast a send message with this set and the accompanying signed packets (shared, echo or ready) as a proof and then enter the optimistic mode.
Crash recovery
For recovery of a crashed node, the exact same approach with HybridVSS is used.
Reconstruction
Reconstruction of a HybridDKG secret is performed the same way as in HybridVSS. Each node broadcasts its share and collects shares from other nodes. When a node has collected enough shares, it will interpolate the corresponding polynomial and evaluate it at 0 to calculate the secret.
Performance
HybridVSS
Assuming no crashes and thus no recovery messages:
The size of HybridVSS messages is dominated by the size of the commitment C which has 2t(t+1) group elements and thus the protocol's bit complexity is O(κn4). An improvement exists that reduces the bit complexity down to O(κn3) by using a collision-resistant hash function (AVSS, Sec. 3.4).
In the following calculations, we assume that this improvement is used. The message complexity is O(n2).
Taking into account the recovery messages, the message complexity becomes O(tdn2) and the the bit complexity O(κtdn3) were d=d(κ).
HybridDKG
With up to n VSS instances being able to complete and with no pessimistic phases taking place we have O(tdn3) message complexity and O(κtdn4) bit complexity.
Counting in pessimistic phases, we have at most O(d) leader changing operations. Each leader change operation has O(tdn2) message and O(κtdn3) bit complexity. Thus the overall overhead of the pessimistic phase is O(td2n2) message and O(κtd2n3) bit complexity.
Adding the pessimistic overhead to HybridDKG complexities, we end up with O(tdn2(n+d)) message and O(κtdn3(n+d)) bit complexities.
Written by George Gkitsas, previously a zero-knowledge cryptography researcher & protocol developer at Heliax, the team building the Anoma Protocol.
If you're interested in zero-knowledge cryptography and cutting-edge cryptographic protocols engineering positions in Rust, check out the open positions at Heliax.