Saturday 12 October 2019

Recovery in the FIX protocol, or in any message-passing protocol.


While Sending Order to Exchange:

1. Persist OutSequence and the lastOrderId
2. Persist Order
3. Send the Order to the exchange
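
The ordering of these three steps can be sketched in a few lines of Python. This is only an illustration, not a real FIX engine: the `Store` class stands in for the database, and `send_order` and the `send` callback are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the database (a real one would write durably)."""
    out_seq: int = 0
    last_order_id: int = 0
    orders: dict = field(default_factory=dict)


def send_order(store, payload, send):
    """Run the three steps in order; `send` is a hypothetical transport callback."""
    seq = store.out_seq + 1
    order_id = store.last_order_id + 1

    # Step 1: persist OutSequence and lastOrderId.
    store.out_seq = seq
    store.last_order_id = order_id

    # Step 2: persist the order, keyed by its sequence number.
    order = {"order_id": order_id, "payload": payload}
    store.orders[seq] = order

    # Step 3: only now hand the message to the transport.
    send(seq, order)
    return seq


store = Store(out_seq=499, last_order_id=499)
wire = []  # records what would have gone to the exchange
sent_seq = send_order(store, "BUY 100 XYZ", lambda seq, order: wire.append((seq, order)))
```

Because the counters and the order are written before the send, a restart after any step never reuses a sequence number that might already be on the wire.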

While Receiving Message from Exchange:

1. Persist Trade/OrderResponse
2. Persist the InSequence number


Explanation for the above steps:

While Sending Order to Exchange:

Let's assume the OutSequence of the order is 500 and the lastOrderId is 500 as well.

Crash-after-3: If the system crashes after Step 3, this is the trivial case: the order has already been sent to the exchange and all of the state has been persisted.
Crash-after-2: If the system crashes after Step 2, the order is persisted in our database but has not been sent yet. When we restart, we recover the OutSequence and lastOrderId as 500.
In the case of the FIX protocol:
   - We connect to the downstream using 501 as the sequence number.
   - The downstream has not received 500 yet, so it asks us to resend it (a ResendRequest). We resend the persisted message with sequence number 500, and our subsequent outbound messages continue from 501 onward.
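
The resend exchange in the FIX case can be sketched roughly as follows. This is a simplified model rather than a real FIX engine: `detect_gap` and `build_resend` are hypothetical helper names, messages are plain dictionaries, and the only FIX details assumed are that the counterparty requests the missing range and that resent messages carry PossDupFlag=Y.

```python
def detect_gap(expected_seq, received_seq):
    """Counterparty side: return the (begin, end) range to request, or None."""
    if received_seq > expected_seq:
        return (expected_seq, received_seq - 1)
    return None


def build_resend(persisted, begin, end):
    """Our side: replay persisted messages in [begin, end] as possible duplicates."""
    resend = []
    for seq in range(begin, end + 1):
        # A sequence number with no persisted message cannot be replayed,
        # so this sketch simply skips it.
        if seq in persisted:
            resend.append({"MsgSeqNum": seq, "PossDupFlag": "Y", "body": persisted[seq]})
    return resend


persisted = {500: "NewOrderSingle BUY 100 XYZ"}  # saved in Step 2, never sent
gap = detect_gap(expected_seq=500, received_seq=501)
resend = build_resend(persisted, *gap)
```

Here the downstream expected 500 but saw 501, so the requested range is just 500, and the persisted order is what makes the replay possible.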

In the case where cancel on disconnect (COD) is enabled and the downstream does not care about the missing sequence number:
    - We connect to the downstream and send the next message with 501 as the sequence number.
    - We never sent the message with sequence number 500 to the downstream.
    - We did create an order with sequence number 500 and persisted it in the DB, but because our systems are configured for COD, we don't have to worry about it.

Crash-after-1: If the system crashes after Step 1, the new sequence number is persisted in our database, but the order has been neither persisted nor sent. When we restart, we recover the OutSequence and lastOrderId as 500.
- We never sent 500 to the exchange, but that's fine (a missing sequence number).
- The next message will be 501, and this will be fine.
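
On restart, the persisted state alone is enough to tell Crash-after-1 apart from the other two cases. A minimal sketch, assuming the orders are stored in a dictionary keyed by sequence number (`recover_outbound` and the status strings are hypothetical names for this example):

```python
def recover_outbound(out_seq, orders):
    """Return the next sequence number to use and what we know about the last one."""
    next_seq = out_seq + 1
    if out_seq in orders:
        # Crash after Step 2 or Step 3: the order exists and may or may not have been sent.
        status = "order-persisted"
    else:
        # Crash after Step 1: the number was reserved but no order was written.
        status = "sequence-only"
    return next_seq, status


# State after a crash between Step 1 and Step 2: counter is 500, no order 500.
next_seq, status = recover_outbound(out_seq=500, orders={499: "previous order"})
```

In both cases the next outbound message uses 501; the status only tells us whether there is anything to resend for 500.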


While Receiving Message from Exchange:

Crash-after-2: No issues at all as we have persisted everything.

Crash-after-1: We would have persisted the Trade/OrderResponse with InSeqId=500, but our last InSequence number will still be 499, because the system crashed before persisting InSeqId=500.
- When we restart, we will request a replay of messages from 500.
- The message with SeqId 500 will arrive with PossDupFlag=Y (FIX tag 43), and since we already have an order response with 500 in our system, we will ignore it.
- For messages from 501 onward, we will update our cache.
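
The replay handling after Crash-after-1 can be sketched as a small inbound handler. This is an illustration under stated assumptions: the store is a plain dictionary, `on_inbound` is a hypothetical name, and duplicates are detected purely by sequence number.

```python
def on_inbound(store, seq, body, poss_dup=False):
    """Apply an inbound message unless it is a replay of one we already persisted."""
    if poss_dup and seq in store["responses"]:
        # Already persisted before the crash: skip the payload, but make sure
        # the InSequence counter catches up.
        store["in_seq"] = max(store["in_seq"], seq)
        return "ignored"
    store["responses"][seq] = body  # Step 1: persist Trade/OrderResponse
    store["in_seq"] = seq           # Step 2: persist the InSequence number
    return "applied"


# State after the crash: response 500 was saved, but the counter still says 499.
store = {"in_seq": 499, "responses": {500: "fill for order 500"}}
first = on_inbound(store, 500, "fill for order 500", poss_dup=True)
second = on_inbound(store, 501, "fill for order 501")
```

The replayed 500 is dropped without being applied twice, while 501 goes through the normal two-step path.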