PoC Deployment recommendations?

carter · August 23, 2019, 1:36am

I am working on a PoC for a client where performance is important, but availability not so much (at this time). We are trying to stay within the free license limit (old 128G RAM limit) for the PoC. The app is read intensive, the majority of data is in a handful of columnstore tables (each 50G to 100G in size), and the queries are fairly complex (some taking 30 - 45 seconds). We have tested with a single node cluster (1 MA with 16G RAM and 1 leaf with 108G RAM/32 cores) and get fairly good performance, but I’m wondering if we could get better performance (with the same amount of RAM and CPU/cores) by spreading across multiple virtual machines? I assume that performance will improve some b/c each virtual machine has its own SSD, so better we should see much better disk IO rates collectively… I’m just wondering how much better, and will it be worth the extra effort to manage 3 or 4 virtual machines instead of just one.

rnarayanan · August 24, 2019, 5:13am

Hi Carter,
Personally I have not tested in an on-prem hosted VM environment. But MemSQL runs in AWS and I guess you could theoretically think of it as a VM environment.
Since the data is all in ColumnStore tables, spreading it across more leaves might be beneficial taking advantage of partitioning features of MemSQL etc. The RAM of 108 GB can then be given to the 4 virtual machines which are the leaves and it will be used for query processing. You have to ensure that the OS configuration for the VM ensures that there is no over-provisioning for memory and also there are enough CPU cores dedicated per VM.
Do let us know your experience with the performance change.
Good luck,
Ramesh

deloy · August 26, 2019, 12:04pm

To help maximize the performance of your columnstore database in the free tier you might want to align the number of cores available in free tier (32 cores) across 4 leaf nodes (4 leaf nodes with 8 cores in each node) and then create the database with 32 partitions. This will create 32 parallel workers executing your queries.