Reliable Delivery

Let P1 be a process that sends a message to process P2. If neither P1 or P2 crashes, and NOT all messages are lost, P2 eventually delivers message. Reliable Delivery is a liveness property! We assume the Omission Fault model here.

CSE138 (Distributed Systems) L8: forms of fault tolerance, reliable delivery, reliable broadcast

Most Distributed Systems work under the assumption that there will be Omission Faults [ We will talk about Byzantine faults later ]. Now under the Omission Faults model, the fundamental solution is to re-transmit messages. Therefore we need to consider the side-effects of these re-transmissions.

The 2-generals problem is a thought experiment that explains the fundamental problem with communication between 2 nodes. Theoretically it states that two communicating nodes can never really guarantee 100% consensus, they can ONLY be sufficiently certain that they have both reached it. Omission Faults is a 2-generals problem!

Idempotency

Why is Idempotency important for Reliable Delivery? This is because, fundamentally to achieve reliable delivery, we need to deliver the message more than once. However, we want to make sure that the state of the receiver is does not change if the same message is delivered more than once. It will potentially change for the first message it receives, but it must not change with subsequent same messages. Therefore, Idempotency is an important consideration for Distributed Systems.

“exactly-once delivery”

There are some systems that claim to provide “exactly-once delivery” [ Kafka ]. If such a systems functions well, we do not need Idempotency on the receiving end. However in practice, it is hard to achieve this. In most cases, algorithms on the receiving end will try to de-duplicate messages and try to deliver it only once. This is also Reliably Delivery but this approach is not used in practice.

TCP [ Does FIFO delivery imply Reliable delivery? ]

TCP provides FIFO delivery + Reliable delivery. However, we can have Reliable delivery without it being FIFO/Causal/Total-Order etc. But does the FIFO/Causal/TO orderings imply that the underlying network is Reliable? Not-really as explained below!

So basically FIFO/Causal/TO etc. are ordering mechanisms. Now considering that the network itself does not guarantee any order, it is up-to the Sender and Receiver to work on an ordering protocol. But back to the question, do Ordering Mechanisms imply Reliable Delivery? From the discussion above, they do NOT. The Ordering Mechanisms simply mean what they say, FIFO will make sure that packets are delivered in FIFO order. So if an intermediate message is missing, it will buffer the rest of messages etc. [ can be implementation specific ]. Without Reliable Delivery, these mechanisms might not work very well, but still they are stand-alone concepts.

However, IMO for practical implementations, Ordering Mechanisms will rely on some kind of Reliable Delivery.

How can Reliable Delivery be implemented?

There can be multiple solutions for example [ just a few ]:

  • Sequence Numbers [ Receiver sends RESEND for each Sequence number not received ]
  • ACK for every packet received