Abstract
We study implementations of basic fault-tolerant primitives, such as
consensus and registers, in message-passing systems subject to process crashes
and a broad range of communication failures. Our results characterize the
necessary and sufficient conditions for implementing these primitives as a
function of the connectivity constraints and synchrony assumptions. Our main
contribution is a new algorithm for partially synchronous consensus that is
resilient to process crashes and channel failures and is optimal in its
connectivity requirements. In contrast to prior work, our algorithm assumes the
most general model of message loss, in which faulty channels are flaky, i.e., they
may lose messages without any guarantee of fairness. This failure model is
particularly challenging for consensus algorithms, as it rules out standard
solutions based on leader oracles and failure detectors. To circumvent this
limitation, we construct our solution using a new variant of the recently
proposed view synchronizer abstraction, which we adapt to the crash-prone
setting with flaky channels.