Running parallel queries in columnsore

jilani.m · August 9, 2019, 7:15am

Hi,
I’m running multiple queries in parallel in columnstore, which results in slow performance.

I’m running a query on a table with 7.5 crore rows, and the query is running in 800ms.
while I’m trying to run the query twice parallel, each query taking around 1600ms.

why this is happening? and what is the optimal number of such queries we can run in a span of 1 hour?

Thank you.

zmeidav · August 9, 2019, 11:56am

I believe it depends on how many leaves/partitions your query is running.
Your query is probably utilising all cores in your cluster, so I assume running both at the same time could result resource sharing.

Another option is that your storage is saturated. Are you using SSD/HDD for the column store?

From my experience, if you can design your table and queries to hit specific partitions by their shard keys, you should be able to run more in parallel, until one of the resources is saturated(disk, cpu, network).

jack · August 16, 2019, 10:33pm

Agree with zmeidav. When only one query is running, MemSQL will use all CPUs to execute it as quickly as possible - MemSQL can parallelize your individual query across all cores. (CPU is the most common resource that becomes fully utilized and therefore the bottleneck, but the same logic applies to e.g. disk i/o). When you run two queries at the same time, they have to share the resources. So the throughput of your queries will be constrained by how much CPU you have available. Each query takes 800 ms. so in an hour you would expect to be able to run 4500.

If this is too slow, try examining the query profiling information SingleStoreDB Cloud · SingleStore Documentation to see how it might be optimized.

mbhojani · August 19, 2019, 10:41pm

@jilani.m - Another thing you can try is to tweak the ratio of cores to partitions. Our default is 1:1 mapping of number of cores to number of partitions in a database. However, for high concurrency workloads, it may make sense to change the ratio of cores:partition to 2:1 or 4:1 so that you have multiple cores available per partition and execute multiple queries at the same time.

Note: this may have a negative performance impact on things like ingestion via pipelines. But if that’s not a big concern, then this method can work.

You can choose the number of partitions during database creation like this:

CREATE DATABASE test PARTITIONS 16;