What's the recommended way to migrate a testing cluster to production?

We’re currently running a cluster-in-a-box setup in our production environment.

It is probably about time to move to an actual production setup as our workloads become more critical.

Can we safely run memsql-deploy setup-cluster on this single-node instance after configuring the additional hosts following the official installation guide? The current node will be configured as the --master-host.

Or would it be better to deploy the production cluster on completely separate nodes and restore data from a database backup?

setup-cluster creates a new cluster; it can't be used to expand an existing one. You could either:

  • as you suggested, create a new cluster and restore a database backup (a rough sketch follows below)
  • add nodes to your existing cluster with memsql-admin create-node / add-aggregator / add-leaf, following the official installation documentation but adding to the existing cluster instead
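
Roughly, the backup/restore path would look like this (the database name and backup path here are placeholders):

    -- on the current single-node cluster
    BACKUP DATABASE app_db TO '/var/backups/memsql/';

    -- on the new production cluster, after making the backup files available to it
    RESTORE DATABASE app_db FROM '/var/backups/memsql/';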

@jack, thanks for pointing this out!

I went for the first solution and restored a backup. It worked perfectly.

With multiple aggregators I'd like to put them behind an internal load balancer with health checks. Does MemSQL expose a health service we can ping over TCP to check liveness?

Hey Martin

Glad your question was resolved! If that answer worked for you, feel free to mark it as the solution so other people who run into the same question can find it too 🙂

Consider just doing SELECT 1 against each of the aggregators to test for liveness. For a more robust check, create a tiny sharded table with a row on every partition and compute an aggregate over it whose result you know in advance, e.g. SELECT COUNT(*) FROM canary_table, and make sure you get the expected count. Run the query from each aggregator. Also, SHOW LEAVES and SHOW AGGREGATORS report health status for leaves and aggregators, respectively.
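
A sketch of what that might look like (the row count of 16 is an arbitrary placeholder; pick whatever matches your setup):

    -- simple liveness probe, run against each aggregator
    SELECT 1;

    -- tiny sharded canary table; insert enough rows that every partition is
    -- likely to hold at least one (16 is an arbitrary choice here)
    CREATE TABLE canary_table (id INT, SHARD KEY (id));
    INSERT INTO canary_table VALUES
      (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16);

    -- robust check, run from each aggregator: expect exactly 16 back
    SELECT COUNT(*) FROM canary_table;

    -- cluster-level status
    SHOW LEAVES;
    SHOW AGGREGATORS;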

Also, when you restore a backup on a new cluster, the database keeps the same number of partitions it originally had, even though your new cluster may have more cores. If you want to take advantage of all the parallelism the new cluster offers, you will need to create a new database on the new cluster with more partitions, then move the data over from the (restored) test database using INSERT…SELECT… statements.
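
Roughly, that would look like this (database and table names and the partition count are placeholders; pick a partition count based on the cores in the new cluster):

    -- create a fresh database sized for the new cluster
    CREATE DATABASE app_db_v2 PARTITIONS 32;

    -- recreate the schema in the new database, then copy the data table by table
    INSERT INTO app_db_v2.orders SELECT * FROM app_db.orders;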

Thanks for pointing this out @hanson.

I was under the impression that MemSQL handled all this partition/shard magic itself under the hood, so I'm glad you set me straight on this.

Unfortunately, I only saw your message after we had migrated and restored the database. Would it be possible to add additional partitions after creating the database? I found CREATE PARTITION in the documentation. It's marked as "This is a dangerous operation.", but I believe it would be a better approach than putting the application in maintenance mode, backing up the database, and restoring, which takes increasingly long as the data grows?

Unfortunately you can’t expand the number of partitions in an existing database.

On the other hand, MemSQL can move partitions between leaf nodes for you, including as you add more leaves, with the REBALANCE PARTITIONS operation. Depending on how many partitions your database has relative to your cluster size, the existing partition count may still be sufficient.
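
For example (the database name is a placeholder):

    -- see where partitions currently live
    SHOW PARTITIONS ON app_db;

    -- preview what a rebalance would move, then run it
    EXPLAIN REBALANCE PARTITIONS ON app_db;
    REBALANCE PARTITIONS ON app_db;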

Thanks for the clarification @jack.

We’ll start with REBALANCE PARTITIONS and schedule a planned maintenance window when we need to expand.

Do you have a rule of thumb regarding partitions per leaf node, CPU, etc.? Performance is currently great. We only have about 15 GB of disk and 5 GB of memory usage.

It depends on the workload, but a typical rule of thumb is 1 partition for every 1-4 cores.

Using a ratio of more than 1 core per partition (e.g. 16 cores, 4 partitions) is sometimes done to provide a crude form of resource governance, since on 6.7 you get 1 thread per partition during query processing. We now also have a resource governor feature, which can give you that control directly, so you don't have to rely only on partitioning to manage parallelism and concurrency.
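
For example, a minimal resource pool might look like this (the pool name and limits are placeholders, and the available options vary by version):

    -- cap memory and concurrency for a class of queries
    CREATE RESOURCE POOL adhoc_pool WITH MEMORY_PERCENTAGE = 50, MAX_CONCURRENCY = 10;

    -- run the current session's queries under that pool
    SET resource_pool = adhoc_pool;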