Expanding cluster to rotate hardware

Hi, we have a cluster of 4 hosts: a master aggregator, a child aggregator, and 2 leaves (each leaf has 8 vCPUs and 32 GB RAM). The cluster is running in AWS.
We would like to change the hardware that the leaves run on. To do that, we need to expand the cluster with another leaf, rebalance partitions, then remove the old leaf and rebalance partitions again.
I saw that data skew is possible when the number of leaves is uneven, but as mentioned above, we only need to expand the cluster to change the hardware, and it will go back to 2 leaves afterwards.

Is it OK to do it this way? Can data skew result in data corruption or put the data in the cluster at risk?

Thx,
Elad

Data skew alone will not risk data corruption or otherwise risk data. It’s important to maintain redundancy 2 during the whole process if you want HA, so have at least 2 leaves at all times. If you don’t have enough disk to hold all the data, that could be a problem though, so make sure to check for ample disk space before you start. And of course, take a backup before starting.
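The disk space check mentioned above could be sketched roughly as follows. This is only an illustration: the function names, the 1.5x safety factor, and the sizes are assumptions, not MemSQL tooling, and the amount of data a leaf will receive during a rebalance has to be estimated separately.

```python
import shutil


def has_headroom(free_bytes: int, incoming_bytes: int, safety_factor: float = 1.5) -> bool:
    """Return True if free space covers the incoming data times a safety margin.

    The 1.5x default is an arbitrary assumption, not a MemSQL recommendation.
    """
    return free_bytes >= incoming_bytes * safety_factor


def check_path(path: str, incoming_bytes: int, safety_factor: float = 1.5) -> bool:
    """Check the filesystem that holds `path` (for example a leaf's data directory)."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return has_headroom(usage.free, incoming_bytes, safety_factor)


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical figures: 200 GiB free, 100 GiB expected to arrive.
    print(has_headroom(free_bytes=200 * gib, incoming_bytes=100 * gib))
    # Check the current directory's filesystem against the same estimate.
    print(check_path(".", incoming_bytes=100 * gib))
```

Running a check like this on each leaf that will receive partitions, before removing any leaf, gives an early warning if a host cannot absorb the extra data.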

I think, based on first principles, the final REBALANCE after you get your new hardware in and remove the old hardware should leave things balanced, but I don’t have first-hand experience with that.

Hi @elad,

Good to see your question again. If you are not worried about a short interruption to your database services, you may consider backing up your data and restoring from the backup after you reconfigure your cluster. Do you think this strategy works for you?

Thank you @ywang and @hanson. I cannot take the database down, so the add leaf / rebalance / remove leaf / rebalance method works for me.
I have another problem though: on the new server hardware I need to mount the leaf's data directory on a different path. Is that possible? How do I do it?

thx !
Elad

The first idea that came to my mind is to use symbolic links. Maybe you have already thought of that.

I assume you are using MemSQL 7.0. You may want to try this command: SingleStoreDB Cloud · SingleStore Documentation

One of the keys is called “datadir”. You may try it.

You are right! I actually managed to run memsql-admin create-node with the --datadir parameter when I expanded the cluster. It works perfectly.

I have another problem now: after adding the new leaf, I don't see it replicating. I tried REBALANCE PARTITIONS a few times, but no luck. I still don't see the data evenly distributed among the leaves.

Is it maybe related to the fact that those databases use async replication instead of sync? Or maybe it's the fact that I went from 2 leaves to 3 leaves, and that is why the cluster won’t rebalance? @ywang @hanson

In MemSQL 7.0 and before, HA is implemented in pairs. The primary partitions of one leaf have their replica partitions on another leaf, and the primary partitions of that second leaf have their replica partitions on the first leaf. With 3 leaves, the 3rd leaf has no partner, which explains why the rebalance logic does not use it. Try adding another leaf.
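To make the pairing argument concrete, here is a toy model in Python. It is not MemSQL's actual algorithm: it simply pairs leaves in list order, which is an assumption, and the real cluster decides the pairing itself. It only shows why an odd number of leaves leaves one without a partner.

```python
from typing import List, Optional, Tuple


def pair_leaves(leaves: List[str]) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Pair leaves two at a time in list order (an illustrative assumption).

    Returns the list of pairs and the leftover leaf, or None if every leaf
    has a partner.
    """
    pairs = [(leaves[i], leaves[i + 1]) for i in range(0, len(leaves) - 1, 2)]
    unpaired = leaves[-1] if len(leaves) % 2 else None
    return pairs, unpaired


if __name__ == "__main__":
    # Hypothetical leaf names matching the situation in this thread.
    print(pair_leaves(["old-leaf-1", "old-leaf-2", "new-leaf-1"]))
    print(pair_leaves(["old-leaf-1", "old-leaf-2", "new-leaf-1", "new-leaf-2"]))
```

With three leaves the model reports one pair plus an unpaired leaf; with four, two pairs and nothing left over.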

MemSQL 7.1 beta is available now. There is a new variable leaf_failover_fanout. You can set it to ‘load_balanced’. This way, replicated partitions for a leaf node will be spread out to multiple leaves. GA is planned later this month.

By the way, I highly suggest you set the replication to sync. If you want HA, there is no reason to use Async.

Hope this helps.

Yu-wang

OK, so right now old-leaf-1 is paired with old-leaf-2, new-leaf-1 is not paired with any leaf, and this is why REBALANCE PARTITIONS is not working. Let’s say I add new-leaf-2, which would probably pair with new-leaf-1. Then I assume REBALANCE PARTITIONS will work.

Now, I want to retire the old servers (that was the initial goal), so I remove old-leaf-1. That brings me back to 3 leaves in the cluster: old-leaf-2 has no pair, whereas new-leaf-1 is paired with new-leaf-2.
Will I be able to rebalance partitions? Will I be able to continue and remove old-leaf-2 without losing data?

thank you @ywang

Hi @elad,

You have an interesting scenario. I still think backup and restore is the best way for you.
To play out exactly what you want, I would remove old-leaf-1; rebalance on old-leaf-2 and new-leaf-1; then remove old-leaf-2 and add new-leaf-2; then rebalance again. This way should be safe. :smiley:
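The sequence above can be walked through with a small Python sketch that only counts leaves after each step. The step encoding and the function names are made up for illustration, and the sketch says nothing about pairing or about whether a given rebalance will succeed. It is included because the earlier advice in this thread was to keep at least 2 leaves at all times for HA.

```python
from typing import Iterable, List, Tuple

Step = Tuple[str, str]  # (action, leaf name); action is add, remove, or rebalance


def simulate(initial: Iterable[str], steps: Iterable[Step]) -> List[Tuple[str, int]]:
    """Apply each step to the set of leaves and record the leaf count after it."""
    leaves = set(initial)
    history = []
    for action, leaf in steps:
        if action == "add":
            leaves.add(leaf)
        elif action == "remove":
            leaves.remove(leaf)
        elif action != "rebalance":  # rebalance does not change membership
            raise ValueError(f"unknown action: {action}")
        history.append((f"{action} {leaf}".strip(), len(leaves)))
    return history


def min_leaf_count(history: List[Tuple[str, int]]) -> int:
    """Smallest number of leaves seen at any point in the plan."""
    return min(count for _, count in history)


start = ["old-leaf-1", "old-leaf-2", "new-leaf-1"]

# The steps read strictly in the order they are written above.
as_written = [("remove", "old-leaf-1"), ("rebalance", ""),
              ("remove", "old-leaf-2"), ("add", "new-leaf-2"), ("rebalance", "")]

# A hypothetical variant that adds new-leaf-2 before removing old-leaf-2.
add_first = [("remove", "old-leaf-1"), ("rebalance", ""),
             ("add", "new-leaf-2"), ("remove", "old-leaf-2"), ("rebalance", "")]

if __name__ == "__main__":
    for name, plan in (("as written", as_written), ("add first", add_first)):
        print(name, simulate(start, plan))
```

Read strictly in order, the plan briefly passes through a single leaf between removing old-leaf-2 and adding new-leaf-2, while the add-first variant never drops below two. Whether the add-first order rebalances cleanly under paired HA is not something this count can show, so it is worth checking what the cluster plans to do before each step.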

I suggest you use the EXPLAIN command, for example EXPLAIN REBALANCE PARTITIONS. It shows you what the system would do on a rebalance without actually doing it.

Hope this helps. Please let me know if you are moving forward.

Yu-wang