The documentation states that “For a transaction to commit, the majority of the group have to agree on the order of a given transaction in the global sequence of transactions.“
This means that as soon as the majority of nodes member of the group ack the writeset reception, certification can start. So, as a picture is worth a 1000 words, this is what it looks like if we take the illustrations from my previous post:
So theoretically, having 2 nodes in one DC and 1 node in another DC shouldn’t be affected by the latency between both sites if writes are happening on the DC with the majority of nodes. But this is not the case.
As you can see on the video above, every 3rd write to the system is affected by the latency. That’s because the system has to wait for the noop (single skip message) from the “distant” node. As Alfranio explained it in his blog post about our homegrown paxos based consensus, XCom is a multi-leader or more precisely a multi-proposer solution. In this protocol, every member has an associated unique number and a reserved slot in the stream of totally ordered messages. There is no leader election and each member is a leader of its own slots in the stream of messages. Members can propose messages for their slots without having to wait for other members. But if they don’t have anything to say, they need to tell it too and this is where we are affected by the latency.
So in conclusion, having a distant node (or with higher latency) as member of a Group slows down the full workload. Not constantly but at least in correlation with its ratio on the cluster. 1/3 for a 3 nodes cluster for example.
At least for now, if you need to have a distant node and using it only for reads, I would advice to use asynchronous replication between your two sites or being prepare to pay the cost of the latency.