Quick question: is there any recommendation for the ratio of aggregator nodes to leaf nodes?
I’m not a MemSQL employee, but as far as I know it really depends on your workload.
It depends on how much “work” the aggregator has to do.
For instance, I tested a heavy workload of inserts with JSON parsing on 2 leaves, and my master aggregator was topping out at 400% CPU.
Once I added another aggregator and split the same workload across the two aggregators, performance improved.
On the other hand, if your queries are maxing out CPU on all the leaves, you need to add more leaves, not aggregators.
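A rough way to turn that rule of thumb into something you can run against your own CPU readings is sketched below. It is only a sketch: the 80% saturation threshold, the 4-core hosts, and the `NodeSample`/`suggest_scaling` names are hypothetical and are not MemSQL tooling. It averages utilization per tier and suggests growing whichever tier is saturated.

```python
from dataclasses import dataclass

# Hypothetical threshold: a tier counts as saturated when its average CPU use
# reaches this fraction of available cores (0.0 to 1.0). Tune for your setup.
SATURATION = 0.80


@dataclass
class NodeSample:
    role: str           # "aggregator" or "leaf"
    cpu_percent: float  # top-style reading: 400.0 means 4 cores fully busy
    cores: int          # number of cores on that host (assumed, not from the post)


def utilization(sample: NodeSample) -> float:
    """Convert a top-style CPU percentage (100% per core) to a 0..1 fraction."""
    return sample.cpu_percent / (100.0 * sample.cores)


def suggest_scaling(samples: list[NodeSample]) -> str:
    """Suggest which tier to grow, from average utilization per role."""
    by_role: dict[str, list[float]] = {"aggregator": [], "leaf": []}
    for s in samples:
        by_role[s.role].append(utilization(s))
    avg = {role: (sum(v) / len(v) if v else 0.0) for role, v in by_role.items()}
    agg_hot = avg["aggregator"] >= SATURATION
    leaf_hot = avg["leaf"] >= SATURATION
    if agg_hot and leaf_hot:
        return "add aggregators and leaves"
    if agg_hot:
        return "add an aggregator"
    if leaf_hot:
        return "add leaves"
    return "no change needed"


# Example: a busy master aggregator on an assumed 4-core host, two quieter leaves.
samples = [
    NodeSample("aggregator", 400.0, 4),
    NodeSample("leaf", 120.0, 4),
    NodeSample("leaf", 100.0, 4),
]
print(suggest_scaling(samples))  # add an aggregator
```

Averaging per tier is a deliberate simplification; in practice you would also look at per-node spikes and I/O, not just mean CPU.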
Whenever I evaluate a database or a NoSQL store such as Elasticsearch, I start with the minimum number of units needed to store the data and handle a basic workload.
I then tune and add more CPU/RAM as needed.
In Elasticsearch’s case, that means measuring what workload a single shard can handle, then adding more shards.
In this case you have 2 dimensions to play with, leaves and aggregators, and the right mix really depends on your workload.
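The "measure one unit, then scale out" approach boils down to simple arithmetic, sketched here for both dimensions. This assumes load splits evenly and throughput scales roughly linearly as you add nodes, which real clusters only approximate; the 70% headroom and the example throughput numbers are hypothetical, not measurements from this thread.

```python
import math


def units_needed(target_load: float, per_unit_capacity: float, headroom: float = 0.7) -> int:
    """Units required so each runs at no more than `headroom` of its measured capacity.

    per_unit_capacity is what you measured on a single unit (one shard, one leaf,
    one aggregator) under your own workload, in the same units as target_load.
    """
    if per_unit_capacity <= 0:
        raise ValueError("per_unit_capacity must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return max(1, math.ceil(target_load / (per_unit_capacity * headroom)))


def plan_cluster(target_load: float, per_aggregator: float, per_leaf: float) -> dict[str, int]:
    """Size both dimensions independently from their own single-node measurements."""
    return {
        "aggregators": units_needed(target_load, per_aggregator),
        "leaves": units_needed(target_load, per_leaf),
    }


# Hypothetical numbers: 50,000 inserts/s wanted; one aggregator was measured at
# 20,000 inserts/s and one leaf at 8,000 inserts/s on the same workload.
print(plan_cluster(50_000, per_aggregator=20_000, per_leaf=8_000))
# {'aggregators': 4, 'leaves': 9}
```

Treat the result as a starting point for the next round of load testing rather than a final topology.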
Hope this helps!
As @zmeidav covered, the best way is to install a cluster and then monitor system performance to gauge what cluster topology and system resources you will need to meet your use case and workload requirements.
This means either actively connecting to each host in your system and monitoring CPU and I/O utilization, or installing a system monitoring solution of some sort.
We do offer some guidelines in our documentation that might give you more insight into how to improve your cluster performance and how to structure your cluster topology.
One point to consider: if you need disaster recovery and plan to run in high-availability mode, then you will always need an even number of leaves, since leaves are paired.
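If you size leaves from load tests first, the HA pairing constraint just means rounding up to an even count. The sketch below assumes pairing is the only constraint (the product may impose others), and `leaves_for_ha`, `pair_leaves`, and the host names are hypothetical illustrations, not part of MemSQL.

```python
def leaves_for_ha(required_leaves: int) -> int:
    """Round a leaf count up to the next even number so every leaf has a partner."""
    if required_leaves < 1:
        raise ValueError("need at least one leaf")
    return required_leaves + (required_leaves % 2)


def pair_leaves(hosts: list[str]) -> list[tuple[str, str]]:
    """Pair leaves two by two; refuse an odd count, which cannot be fully paired."""
    if len(hosts) % 2:
        raise ValueError(f"HA mode needs an even number of leaves, got {len(hosts)}")
    # Naive pairing by list order; you would likely want partners on separate
    # physical machines or racks so one failure cannot take out both.
    return list(zip(hosts[0::2], hosts[1::2]))


print(leaves_for_ha(9))  # 10
print(pair_leaves(["leaf-1", "leaf-2", "leaf-3", "leaf-4"]))
# [('leaf-1', 'leaf-2'), ('leaf-3', 'leaf-4')]
```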