How does MemSQL detect the state of a host for HA?

andrea.woo · June 10, 2019, 10:13am

Hi team,

When I shut down the host machine, the slave partitions are sometimes not promoted.
I guess the master aggregator did not detect that the host was offline.
When I only force-killed the process from the CLI, HA worked fine.

This leaves me with some questions.

What does the master aggregator check to decide whether a host is working?
How do I configure the options for HA?
Is there any strategy to reduce the HA downtime when promoting a slave to master?
(In my case, transactions failed for 4 seconds.)

Regards.

JoYo · June 11, 2019, 8:48pm

Hey Andrea,

The Master Aggregator always heartbeats nodes to make sure they are up. However, after a failover there is a grace period during which we don’t allow further failovers, to prevent nodes from failing back and forth. It is configured by the globals failover_initial_grace_interval_seconds and failover_maximum_grace_interval_seconds. When you test failovers repeatedly, this grace period is the most likely reason they don’t happen quickly. Consider tuning these parameters during your testing.
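
As a rough sketch of what tuning could look like, the Python helper below only builds the SQL text for inspecting and changing the two globals named above. The SET GLOBAL form, the helper names and the example values are assumptions for illustration. Check the documentation for your MemSQL version for the exact syntax and sensible values before running anything on the master aggregator.

```python
# Sketch only: builds SQL strings, does not connect to a cluster.
# The two variable names come from the reply above; the SET GLOBAL
# syntax and the helper names are assumptions, not a confirmed API.

GRACE_VARIABLES = (
    "failover_initial_grace_interval_seconds",
    "failover_maximum_grace_interval_seconds",
)


def show_statement():
    """Return a statement that lists the current grace-interval settings."""
    return "SHOW VARIABLES LIKE 'failover%grace_interval_seconds'"


def set_statements(initial_seconds, maximum_seconds):
    """Return one SET GLOBAL statement per grace-interval variable."""
    values = (initial_seconds, maximum_seconds)
    for value in values:
        # Reject bools explicitly, since bool is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("grace intervals must be non-negative integers")
    return [
        f"SET GLOBAL {name} = {value}"
        for name, value in zip(GRACE_VARIABLES, values)
    ]


if __name__ == "__main__":
    print(show_statement())
    # Placeholder values for a test cluster, not recommendations.
    for statement in set_statements(1, 5):
        print(statement)
```

Run the printed statements through whatever SQL client you already use against the master aggregator, and restore the original values once your failover testing is done.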

Best,
-JoYo