How long does the lock last while 'rebalance partitions' is running?

kyoungho.kum · May 20, 2020, 6:27am

Hi,

https://docs.memsql.com/v7.0/reference/sql-reference/cluster-management-commands/rebalance-partitions/#remarks

I’d like to know more about the temporary blocking (= lock) described in the following passage.

REBALANCE is an online operation, meaning that as it runs you can continue to read and write data in the database you are rebalancing. However, since REBALANCE often needs to move the location of a master partition to another leaf, write transactions may experience some temporary blocking during the rebalance as this happens.

Based on the output of ‘EXPLAIN PARTITIONS …’, I predict that the lock occurs at the following points:

Synchronizing transaction logs after the snapshot copy during the ‘COPY PARTITION’ phase
PROMOTE PARTITION ...

Thank you in advance.

hstoyanov · May 20, 2020, 8:01pm

Hi @kyoungho.kum,

You can see what REBALANCE will execute by running EXPLAIN REBALANCE PARTITIONS ON db. In addition, SHOW REBALANCE STATUS shows some details on the progress being made.

As an example, consider what happens when rebalance attempts to execute a COPY followed by PROMOTE, e.g.:

COPY PARTITION db:0 TO 'memsql-leaf01':3306
PROMOTE PARTITION db:0 ON 'memsql-leaf01':3306

The COPY command will create an async slave on the leaf. The master instance will send a snapshot + logs to the async slave and continue sending all new logs to the newly created slave. The master will commit writes and send logs to the slave without waiting for slaves to acknowledge anything.

The COPY command will then turn the async slave into a sync slave. During that change, queries to the master partition may see a minimal increase in latency. After that change, all writes to the master partition will succeed only after being acknowledged as received by the sync slave.

Next, PROMOTE locks access to the partition on all aggregators, which blocks queries to the partition. The master and the existing slave are then locked. Together, these two steps make sure there are no running transactions on the partition while the change happens.

Then the sync slave is promoted to master while the old master is demoted to a slave. After that completes, the PROMOTE operation will unlock the partition and point all aggregators to the new location.
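
To make the sequence above concrete, here is a minimal Python sketch that builds the two example statements and times each one. The `execute` callable stands in for whatever MySQL-protocol client you use against the master aggregator. It and the helper names `build_copy_promote` and `run_timed` are assumptions for illustration, not part of MemSQL.

```python
import time
from typing import Callable, List, Tuple


def build_copy_promote(db: str, ordinal: int, host: str, port: int) -> List[str]:
    """Return the COPY and PROMOTE statements for one partition, in order."""
    target = f"'{host}':{port}"
    return [
        f"COPY PARTITION {db}:{ordinal} TO {target}",
        f"PROMOTE PARTITION {db}:{ordinal} ON {target}",
    ]


def run_timed(
    statements: List[str], execute: Callable[[str], None]
) -> List[Tuple[str, float]]:
    """Run each statement through `execute`; record wall-clock seconds per statement."""
    timings = []
    for stmt in statements:
        start = time.perf_counter()
        execute(stmt)  # hypothetical: e.g. cursor.execute on the master aggregator
        timings.append((stmt, time.perf_counter() - start))
    return timings
```

For example, `run_timed(build_copy_promote("db", 0, "memsql-leaf01", 3306), execute)` returns one timing per statement. Note that a statement's duration is not the same as the blocking window seen by writers: COPY spends most of its time transferring the snapshot and logs, during which writes continue.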

The duration of locking depends on three things:

the size of your cluster, in particular the number of leaves and aggregators;
the kind of workload;
the hardware.

A larger cluster will see longer blocking (as there are more aggregators to be coordinated). More data being written or long-running transactions can also make it take longer. Hard drives vs SSDs make a difference, because the change recording which instance is the master needs to be written to disk. Network latency also affects how long each of the many steps takes.

I wrote a short test to measure this on my laptop, and I see it blocking a write query for 0.02s during the async-to-sync change. The PROMOTE part blocks the write query for 1-2 seconds.
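
One way to take a similar measurement is a probe that issues small writes in a loop and records how long each one takes. The sketch below is not the test mentioned above, only a minimal illustration. `write_once` is a hypothetical callable that you would implement as a single small write sent through an aggregator, and the function names are assumptions.

```python
import time
from typing import Callable, List


def probe_write_latency(
    write_once: Callable[[], None], samples: int, pause_s: float = 0.01
) -> List[float]:
    """Call `write_once` repeatedly and return each call's latency in seconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        write_once()  # hypothetical: one small INSERT or UPDATE via an aggregator
        latencies.append(time.perf_counter() - start)
        time.sleep(pause_s)  # short gap between probe writes
    return latencies


def longest_stall(latencies: List[float]) -> float:
    """The worst single-write latency approximates the longest blocking window."""
    return max(latencies, default=0.0)
```

Run the probe in one session while the COPY and PROMOTE statements execute in another, then compare `longest_stall(...)` with the typical latency in the same list. A shorter `pause_s` gives finer resolution at the cost of extra load on the cluster.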

I would recommend measuring it on your cluster. The docs describe how to issue COPY and PROMOTE operations manually.

Is there a target minimal delay you want to hit?

kyoungho.kum · May 21, 2020, 1:13am

Thank you very much for your detailed answer.

We understood it well.

We believe this will resolve the customer’s curiosity and anxiety about how long the temporary blocking lasts during the scale-out/scale-in process.

Thank you again.