The Simplex Consensus Protocol #
In this post, we present a new consensus protocol called “Simplex”, that is faster than the state-of-the-art, and (in our eyes) even easier to understand. It is suitable for both teaching (in a distributed algorithms or blockchain class) and for potential implementation in practice. We hope that Simplex makes byzantine consensus more accessible to a wider audience. In particular, it has the easiest liveness proof that we know of.
For the full paper, see eprint.iacr.org/2023/463 or the proceedings version in TCC 2023 (link). For slides, which also present the liveness proof, see (link). For questions, email byc at cs.cornell.edu.
What is Simplex Consensus? #
Simplex is a consensus protocol that is faster than the state-of-the-art (on paper), yet easier to understand (in our eyes). It solves Byzantine Fault Tolerant (BFT) Consensus, also known as the blockchain problem: there are
As an example, we can imagine a blockchain system with
Paxos, PBFT, and Blockchains
Simplex is designed for the same network and fault-tolerance setting as for the seminal PBFT protocol, as well as for protocols such as Algorand Agreement, Hotstuff, Tendermint, and Streamlet. The famous Paxos protocol, as well as Raft, is for a similar setting, but where nodes can only crash and do not deviate from the protocol (i.e. “crash faults” vs “byzantine faults”). Paxos in particular has found widespread deployment in Internet infrastructure ranging from Apache Zookeeper to Google’s Chubby Lock Service.
In the last decade, byzantine consensus has found its way into the heart of most “Proof-of-Stake” blockchains, where byzantine fault tolerance (BFT) is necessary because validators may be malicious in nature. Protocols such as Ethereum and Algorand use a byzantine consensus protocol under the hood to reach agreement on transactions. Unlike Bitcoin, the consensus protocols used tolerate extended periods of network failure without sacrificing security, do not require Proof-of-Work, and offer much faster finality. Unlike PBFT, the underlying consensus protocols typically require each block to be proposed by a different player, for fairness. Despite these widespread applications, the underlying consensus protocols remain complex. We are primarily interested in designing a good consensus protocol for this blockchain setting.
Not only is Simplex potentially easier to understand than the state-of-the-art, it also provides faster transaction confirmation time than essentially all competing protocols designed for the blockchain setting1, under a theoretical analysis2. This makes the protocol an ideal candidate for both teaching and for potential deployment.
The Fastest Protocol Yet
For the sake of demonstration, imagine a situation where
of block proposers are missing or slow, and where every message takes exactly 1 second to be delivered. Suppose that we are considering a blockchain setting, and that a transaction is submitted right before the next opportunity to mine or propose a new block. How long (in expectation) does it take the transaction to be confirmed or finalized?
Algorithm Expected Confirmation time Simplex (this post) 5 sec Algorand Agreement 6 sec ICC 7 sec PaLa 15 sec Pipelined Fast-Hotstuff/
Jolteon20 sec Chained Tendermint/
Casper FFG23 sec Hotstuff 31 sec Streamlet 36 sec Note that we can think of the base Algorand agreement protocol and Fast-Hotstuff as versions of PBFT adapted to the blockchain setting. (For Jolteon, Fast-Hotstuff, and Hotstuff, which don’t specify concrete leader selection mechanisms, we assume that the leader is chosen randomly. For Chained Tendermint and Casper FFG, we consider the version specified in this post.)
Simplex also has great performance in optimistic conditions without faults, when analyzed in our model. See the paper for a more thorough analysis for both the pessimistic and optimistic case, as well as for a deep-dive comparison of popular protocols.
Note that our protocol has no relation to the famous Simplex Method for linear programming.
Problem Statement #
The Permissioned Blockchain problem. (Also known as BFT Consensus, or State Machine Replication, in classical distributed systems. It is called “permissioned” because the set of players is known ahead of time.3) We want a protocol that works as follows:
- Suppose that there are
players, of whom are malicious. - Over time, players receive transactions
from the environment, and want to order them into a single log (an ordered sequence) of transactions. Each player outputs an updated version of their log periodically. If a player outputs a transaction in its log, we say that the transaction is finalized, or confirmed, by that player. - Consistency (i.e. no forks). Suppose two honest players output
and respectively. Then either , or is a prefix of , or is a prefix of . This is the key property that ensures that players reach consensus on a single log of transactions. - Confirmation Time. Suppose a transaction
arrives at some time . How long after time does it take for the transaction to be finalized by every honest player?
For scalability, players typically batch transactions together into “blocks”. The log is then comprised of blocks, instead of individual transactions, hence the name “blockchain”. It is then easy to extract an ordered log of transactions from a blockchain.
The Partially Synchronous network setting. In a real-world network, such as the internet, messages may sometimes be delayed, reordered, or dropped entirely. Entire partitions of the network may lose connectivity due to an implementation bug, or an internet outage. It is essential to prevent a consistency violation (e.g. a double spend) in such a situation. We assume a network that behaves as follows:4
- Good network conditions. Under good network conditions, every message is delivered within
seconds (e.g. we can pick seconds), where is known ahead of time. - Bad network conditions. Occasionally, the network may be faulty. In these bad network conditions, there is no guarantee on message delivery.
We require the protocol to confirm transactions under good network conditions, and to preserve consistency even under bad network conditions.5
Why is this problem hard?
The challenge of faulty players. In our setting, faulty players can deviate arbitrarily from the protocol (i.e. they are “byzantine faults”). Such a fault model has been studied since the 1980s, and is essential for capturing real-world settings such as cryptocurrencies where some fraction of players cannot be trusted. (After all, it seems downright natural that some participants may try to subvert the protocol for their own benefit.) In particular, a faulty player may decide to stay offline, and can also send different (but conflicting) messages to different people, lying about their state. When combined with a unreliable network, this faulty behavior is difficult to detect and challenging to deal with. (Even crash faults are non-trivial to deal with, despite being more benign; see the famous Paxos protocol.)
A toy example. For example, suppose that there are
players, choosing between two block proposals and . Suppose that each honest player “votes” for a single block proposal of its choice. Naïvely, one may think that if a player sees 50 votes for , then no one else will ever see 50 votes for . However, this breaks down even if there is only one faulty player. 49 honest players could vote for , another 49 honest players could vote for , and the faulty player could vote for both and . If the network is unreliable, some player may see 50 votes for and no votes at all for , and proceed even though some other player saw 50 votes for . As a consequence, we generally require
votes on block proposals to make progress (instead of a simple majority), and additionally assume that . This is is the same approach taken by most prior work (i.e. see PBFT), and is the best we can do on an unreliable network. And like most solutions to the problem in the literature, we will also need to utilize public-key cryptography, to prevent adversaries from forging honest messages.
Protocol Description #
We start with some preliminaries and data structures that will set the stage for the main body of the protocol.
Digital Signatures #
Our protocol uses digital signatures, which requires a basic setup ahead of time.
- Public key infrastructure. Every player has a public key, and a secret key (which it keeps secret), generated for the digital signature scheme. Each player’s public key is known to every other player at the beginning of the protocol execution.
For any string or message
Data Structures #
Simplex uses a number of different data structures, many of which should be familiar to a blockchain enthusiast.
-
Blocks and block heights. The protocol operates on “blocks” of transactions. A block is a tuple
, where is the height of the block, is a cryptographic hash of the parent blockchain, and is a set of transactions included in the block. Thanks to the parent hash, each block essentially “extends” and commits to a parent blockchain, which represents a history of transactions. -
Genesis block. The genesis block is the default starting block
of height 0. Here, is the empty string ""
, representing some empty default value. The genesis block is always the first block in the blockchain. -
Dummy blocks. In Simplex, sometimes we may mine a “dummy block” during periods of time when the network is faulty, or when there is too much malicious activity. A dummy block of height
is simply the tuple . Note that the dummy block does not extend a specific parent chain. -
Blockchains. A blockchain of height (or length)
is a sequence of blocks where is the genesis block, and for each of other bocks, either is the dummy block of height , or for some choice of transactions. Here, is a publicly known collision-resistant hash function. -
Notarized blocks and blockchains. A “notarization” for a block
(which may be the dummy block) is a set of signed messages of the form from unique players . A “notarized block” is a block augmented by a notarization for that block. A “notarized blockchain” is a blockchain, where every block has a notarization.
High Level Structure #
The protocol runs in iterations
- Random leaders. For each iteration
, its leader is chosen by hashing the iteration number using some public hash function. In other words, , where is a predetermined hash function.6
As a preview, in each iteration
The Full Protocol Description #
We are now ready to describe the protocol in its entirety.
Starting iteration and protocol state. At the beginning of the execution, every player immediately enters iteration
Execution. Each player
-
Backup step in case of faulty network or faulty leader. Player
starts a new timer set to fire locally after seconds if player still hasn’t finished iteration by then. If the timer fires, and player is still in iteration , player sends to everyone a vote for the dummy block Here, is just the dummy block. -
Leader proposal. If player
is the leader, namely if , then player sends to everyone a block proposal of the form where is player ’s choice of a new block extending some blockchain that is notarized in player ’s view. Note that the parent blockchain has length . Player should include in every transaction that it has seen that is not already in the parent blockchain. -
Voting step. Player
waits to see the leader’s block proposal. On seeing the first (and only the first) such proposal of the form , player first checks that the block is well-formed (i.e. that for some set of transactions) and that is indeed a notarized blockchain of length in its view. If these checks succeed, then player sends to everyone a vote (Recall that on seeing votes for any given block , a player considers that block to be notarized.) -
Entering the next iteration. At any point during iteration
, if player sees a notarized blockchain of length , it moves on to iteration . At this point, if its timer has not fired yet: player cancels the timer, and sends everyone a message Simultaneously, regardless of whether the player sent a finalize message, multicasts its current view of the notarized blockchain to everyone else. (This ensures that other players will also enter iteration within one message delay).
Finally, at any point during the protocol execution, no matter the current iteration number:
- Finalizing transactions. At any point, if player
sees messages for some iteration number , each signed by a different player , player deems that iteration to be finalized. If an iteration is finalized, whenever player sees a notarized blockchain of length , they can finalize all the transactions contained within (and output them to the client).
Correctness of the Protocol #
In this section, we argue that the Simplex protocol has all the properties that we desire. Importantly, the protocol is both consistent (that is, two players never finalize two conflicting transactions), and is also fast during periods of network stability. In particular, whenever a new transaction is delivered to some honest player, the protocol finalizes that transaction quickly. Here, our treatment is meant to be informal and intuitive; for a formal analysis, see the full paper.
Consistency Argument #
Consistency requires that if one player outputs a sequence of transactions
For us, it suffices to show that if one player finalizes the contents of a blockchain
-
Claim. If some honest player sees that a block proposal
is notarized, no honest player will ever see a competing block proposal of the same height notarized, unless one of or is the dummy block of height . - Proof. In iteration
, each honest player will only vote for one of and . A faulty player can vote for both and . There are at least honest players and faulty players. Thus there are votes combined for either and . This just isn’t enough votes: if honest players saw that both and are notarized, then there must be votes. (This is also known as a quorum intersection argument.)
- Proof. In iteration
-
Claim. If some honest player sees that iteration
is finalized, then no honest player will ever see that the dummy block of height is notarized. - Proof. In iteration
, each honest player will send either a vote for the dummy block, or a finalize message for , but never both. A faulty player can do both. Thus there are messages combined of either type. By the exact same argument as for the previous claim, there just aren’t enough votes to make it so that someone sees both finalize messages and votes for the dummy block of height .
- Proof. In iteration
Recall that if a player finalizes a chain
So whenever a player finalizes
Liveness Argument #
Simplex has a delightfully simple Liveness proof, in contrast with every single other protocol that we know of in the literature (in this setting).
We show that the protocol makes progress during good network conditions. Assume that every message takes
First, observe:
- Observation. If some honest player has entered iteration
by time , then all honest players have entered iteration by time . (Make sure you see why.)
The liveness argument breaks down into two cases:
-
Claim (Honest Leaders). If the leader
of iteration is honest, and they entered that iteration by time , then all honest players will have finished iteration and moved on to (or higher) by time . Additionally, they all see a finalized block proposed by by . -
Claim (Faulty Leaders). If every honest player entered an iteration
by time , every honest player enters iteration by time . This holds even if is faulty!
Recall that
Now let’s show the claims. In the following figure, we break down exactly what happens during an iteration
And likewise, for an iteration
The full proofs of the claims are in the paper.
Theoretical Performance Guarantees #
The last key property that we want to highlight is Simplex’s fast transaction confirmation time. In both the optimistic case when there are no faulty players, and the pessimistic (and more realistic) case where there are faulty players, Simplex has better transaction confirmation time than essentially all competing protocols that rotate their leaders (when analyzed in our model).
Simplex requires only a single honest leader to finalize blocks. Moreover, the design of the protocol also allows us to set the timer to
Scalability #
Simplex can be made scalable using the same techniques as in Algorand, which is designed for a Proof-of-Stake setting with hundreds of thousands of validators.
Conclusion and Further Steps #
In this post, we presented the Simplex consensus protocol, which has potential to be the protocol of choice for both pedagogy and for deployment in practice (especially for blockchains). It offers faster transaction confirmation time (under a theoretical analysis), and also offers an easy-to-understand security proof. Work on Simplex is ongoing (e.g. see DispersedSimplex, which adds additional optimizations).
We hope you integrate Simplex into your own projects or your classes. For questions or to help out, email byc at cs.cornell.edu.
Footnotes #
-
In certain cases, classic protocols like PBFT may still offer slightly better latency than protocols designed for blockchains. This is because, when a transaction is sent to the system, it doesn’t need to wait for the next block proposal, unlike in blockchain systems. However, PBFT achieves this by having the same player propose every block (as long as it does not detectably misbehave). In other words, PBFT adopts a “stable-leader policy.” However, this is unsuitable for a blockchain application, where for fairness we require that a different player propose each block (see problems with miner extractable value in the Ethereum blockchain, for instance). Of course, we could also modify Simplex to use “stable leaders”, to improve on the latency of PBFT at the expense of fairness. ↩︎
-
An evaluation of the protocol in a real-world deployment remains to be done. ↩︎
-
When used for an application such as cryptocurrency, we typically require a blockchain protocol be “permissionless”, where players can join (and leave) the network at will. Proof-of-Stake protocols like Ethereum and Algorand accomplish this by treating each coin as a player and then running a permissioned blockchain protocol (such as Simplex) on those players. If at any point the total number of coins changes, the number of players
can be changed accordingly, with some care. ↩︎ -
For a more formal treatment of partial synchrony, see our paper, or this blog post, or the seminal paper of DLS88. Also note that, in practice, blocks may take longer to be delivered than, for instance, vote messages. ↩︎
-
This is a different assumption than for Bitcoin. Indeed, Bitcoin may lose consistency if honest nodes cannot contact each other for long enough (i.e. due to an Internet outage) and start mining on different forks. ↩︎
-
can be a different hash function than the one used to generate the parent hashes in blocks, which we denoted . In more theoretical terms, when we formally prove the protocol to be correct, we model as a random oracle, to ensure that leader election is truly random. As observed by PS17, this usage of a random oracle can itself be replaced by a pseudorandom function family and an additional common reference string. However, any hash function that is sufficiently pseudorandom will work in practice (whereas for , we required it to be collision resistant). ↩︎ -
This may require additional optimization to reduce storage costs in practice. ↩︎