There is no point in waiting until it's written to disk; it just adds extra latency and gives you no extra guarantees. If the backup crashes while the live is still up, it would start replicating from scratch again anyway. If it crashes in the window after the live has crashed but before the backup has finished syncing, then you have lost HA anyway; however, you still have data integrity at the original live broker.
So first, this doesn't mean that replication needs to occur after the sync to disk; the two can continue in parallel.
You simply want the slave to fsync before replying with replication success. Also note this should be togglable, with the current behaviour as the default so as not to affect existing users, who simply need protection from a single failure in HA mode, i.e. you would have to toggle this on if you want it.
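As a rough illustration, here is a minimal Java sketch of the proposed backup-side behaviour; the flag name `syncBeforeReplicationAck` and the helper methods are hypothetical stand-ins, not the actual Artemis API:

```java
// Sketch of the proposed backup-side behaviour. All names here are
// hypothetical illustrations, not the real Artemis internals.
public class ReplicationTargetSketch {

    // Hypothetical toggle; defaults to false so existing users see no change.
    private final boolean syncBeforeReplicationAck;

    public ReplicationTargetSketch(boolean syncBeforeReplicationAck) {
        this.syncBeforeReplicationAck = syncBeforeReplicationAck;
    }

    public void onReplicatedRecord(byte[] record) throws java.io.IOException {
        appendToJournal(record);       // buffered write, not yet durable
        if (syncBeforeReplicationAck) {
            fsyncJournal();            // proposed: make the record durable first...
        }
        sendReplicationResponse();     // ...then ack back to the live
        // With the flag off, the fsync happens asynchronously, as today.
    }

    private void appendToJournal(byte[] record) throws java.io.IOException { /* ... */ }
    private void fsyncJournal() throws java.io.IOException { /* ... */ }
    private void sendReplicationResponse() { /* ... */ }
}
```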
Lastly, I'm not sure what this brings you; as soon as the slave activates on master failure, it fsyncs its journal to disk as part of becoming active. Note that if your actual requirement is to protect from data loss in the case of a two-node failure, you can simply add more slave nodes. So if the master died and then the first slave failed, you would still have the second slave with the data (now active), and it would fsync to disk on activation.
Michael is correct, that's what I meant. Let me explain how this works.
Every record that is written to the journal is also sent to the backup.
Once the backup receives the packet, it responds with an ack and writes it to disk in parallel.
When the live crashes, the backup fsyncs and then starts.
There is a small window between when the live crashes and when the fsync occurs in which the backup may crash, but this is unlikely to happen, and you still have data consistency as the journal is on the live.
The caveat here is if the disk on the live was destroyed, but again the window is so small that there are more likely scenarios we could improve from a crash point of view.
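For contrast, a minimal sketch of the current flow just described, where the backup acks on receipt and only fsyncs when it activates; the method names are illustrative, not the real broker code:

```java
// Sketch of the current backup flow: ack immediately, write in parallel,
// fsync on activation. Names are illustrative stand-ins.
public class BackupFlowSketch {

    public void onPacketFromLive(byte[] record) {
        ackToLive();                 // ack as soon as the packet is received...
        scheduleAsyncWrite(record);  // ...while the disk write proceeds in parallel
    }

    // Called when the live is detected as failed.
    public void activate() throws java.io.IOException {
        fsyncJournal();              // flush anything still in memory to disk
        startServing();              // only then take over as the live broker
    }

    private void ackToLive() { /* ... */ }
    private void scheduleAsyncWrite(byte[] record) { /* ... */ }
    private void fsyncJournal() throws java.io.IOException { /* ... */ }
    private void startServing() { /* ... */ }
}
```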
@shoukunhuai Why don't you explain your use case, not in implementation terms but as something like 'when x happens I want to avoid y'?
I think it's a good idea to understand which failures we can recover from and which ones we cannot, in the context of consistency and availability. Note, I'm not including network issues here.
Availability: This PR doesn't really concern availability, but for context: a single live/backup pair setup can recover from a *single* machine failure. In the event of a live failure, the backup takes over (it may have some data only in memory at this point). In the event of a backup failure, availability is preserved.
Consistency: Consistency is preserved even if both machines fail. The ack isn't sent back to the client unless it's persisted on the live. Should the live fail, the backup is still consistent (though there is a small window where data is only in memory). Should the backup fail before it syncs to disk, it will not be consistent; however, the live still has its copy of the journal, which is up to date.
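A minimal sketch of that invariant, assuming hypothetical helper names; the essential ordering is persist-on-live before the client ack:

```java
// Sketch of the live-side consistency invariant: the client never sees an
// ack before the record is on the live's disk. All names are illustrative.
public class LiveAckSketch {

    public void onDurableSend(byte[] record) throws java.io.IOException {
        sendToBackup(record);     // replication runs in parallel with the local write
        appendToJournal(record);
        fsyncJournal();           // the record is now persisted on the live
        completeClientAck();      // only now is the ack released to the client
        // (In the real broker the replication response may also gate the ack;
        // the essential point here is persist-on-live before the client ack.)
    }

    private void sendToBackup(byte[] record) { /* ... */ }
    private void appendToJournal(byte[] record) throws java.io.IOException { /* ... */ }
    private void fsyncJournal() throws java.io.IOException { /* ... */ }
    private void completeClientAck() { /* ... */ }
}
```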
There is a known issue here in that the broker with the most recent journal must be started first after a live and backup failure (this goes beyond this case), and this requires manual intervention to deal with. We will be addressing this soon.
There is, however, still a use case for this patch, in that it does allow recovery from a disk failure should both live and backup fail. The probability is likely very small, but I can see a case for ensuring data is persisted in two places, though not at the expense of performance for other use cases.
This could be a configuration option. However, for this use case I would recommend using a shared store solution as dedicated shared storage solutions can already address these issues.