10-15 seconds of pre-compile

Hey all,

I have a mixture of tables, 5 are in memory row store, and 1 columnar.
The total size of the in memory are 234MB and the columnar is also a few tens of MBs.

When we run ad-hoc queries that join these tables + some group concat and group by, the first run takes about 10-15 seconds every time we add a new predicate, the 2nd run takes about 0.5-1sec
We are running with interpret_first and we use 6.8.8 build.

Any idea? this is highly unacceptable for interactive tables

Can you share the query here as well as the output of running the query with the compile prefix (compile select ...)

Hi @rob,

I am experiencing the same behaviour with MemSQL all the time.

My example:
SELECT str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’) AS RegisteredOn, count(*) AS count
FROM v_servicedesk
where ins_upd_ts > ‘2019-01-01’
GROUP BY str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’)
ORDER BY str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’) ASC

Whenever I add a filter by a new field to WHERE expression, it takes about 7-10 seconds for it to execute the first time, and 100-150ms for all subsequent executions.

Compile query output suggests that the total compilation time is below 1 second, so I am wondering what is taking the rest 6 seconds.

Create Mbc::Context 0.389 ms
Expanding Views 3.032 ms
unknown 5.037 ms
Optimizer: Query Rewrites 5.243 ms
Optimizer: Collecting Table Statistics: Other 0.376 ms
Optimizer: Collecting Table Statistics: Analyze Statistics 0.245 ms
Optimizer: Collecting Table Statistics: Sampling 25.627 ms
Optimizer: Setting Up Subselect Info 0.160 ms
Optimizer: Distributed Optimizations 0.773 ms
Optimizer: Enumerate Temporary Tables 0.032 ms
Optimizer: Singlebox Optimizations (Aggregator) 0.321 ms
PPC Check 0.428 ms
Generating query MPL 2.483 ms
Generating user function MPL 0.001 ms
MBC Emission 15.223 ms
Module Creation 67.186 ms
LLVM Bitcode Emission 22.250 ms
Module Cleaning 48.302 ms
LLVM Optimization 448.850 ms
Machine Code Generation 240.672 ms
Symbol Resolution 0.850 ms
total 887.595 ms

Compile query output suggests that the total compilation time is below 1 second, so I am wondering what is taking the rest 6 seconds.

That is because the compile command only measures the local compilation time on that node. This indicates that most of the compilation time is on the leaves.

Can you get the output of explain extended <query>? That will include the leaf queries. Then you may be able to measure their compile time by running compile on the leaf for those queries.

Another factor that sometimes is part of the difference between first and second run is data getting cached - if on the first run the new filter reads in a lot of cold data then that could also be part of the delta.

Hey @jack,

Here is my explain extended output:
WARNING: Statistics have not been collected on the following tables and columns. Consider running the following commands to collect them:
ANALYZE TABLE test.servicedesk_2 COLUMNS Number ENABLE;

See https://docs.memsql.com/docs/analyze for more information on statistics collection.

Project [remote_0.RegisteredOn, CAST(COALESCE($0,0) AS SIGNED) AS count]
Sort [STR_TO_DATE(remote_0.op_2,’%Y-%m-%d %H’)]
HashGroupBy [SUM(remote_0.count) AS $0] groups:[remote_0.op]
Gather partitions:all query:[SELECT STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS RegisteredOn, COUNT() AS count, STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2 FROM test_0.servicedesk_2 as servicedesk WHERE ((servicedesk.ins_upd_ts > (‘2019-01-01’!:>datetime(6) NULL)) AND (NOT ISNULL(servicedesk.Number))) GROUP BY 3 OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)] alias:remote_0
Project [STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS RegisteredOn, count, STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2]
HashGroupBy [COUNT(
) AS count] groups:[STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’)]
Filter [servicedesk.ins_upd_ts > (‘2019-01-01’!:>datetime(6) NULL) AND servicedesk.Number IS NOT NULL]
TableScan test.servicedesk_2 AS servicedesk, PRIMARY KEY (Id)

As for the “data being cached” situation that you suggested in the post:

  1. I am querying a view, which is built out of a main in-memory row-oriented table joined with a number of distributed dictionaries (which should live in-memory as well?). Could you please clarify what does it mean to cache the data which already resides in memory?
  2. The main underlying table of that view v_servicedesk is constantly getting updated, and on all subsequent query runs I’m seeing the updated results instantly . Am I right that if it was caching that took all that time on the initial run, then I would not have been able to see the updated data instantly, as they would have to be cached first?

Can you connect to a leaf node and run the query shown in the explain extended with compile:

compile SELECT STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS RegisteredOn, COUNT() AS count, STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2 FROM test_0.servicedesk_2 as servicedesk WHERE ((servicedesk.ins_upd_ts > (‘2019-01-01’!:>datetime(6) NULL)) AND (NOT ISNULL(servicedesk.Number))) GROUP BY 3 OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)] alias:remote_0

That will give information on what is taking time for the leaf compilation.

Given that information your query is probably not affected by caching to any significant degree. It is probably just compilation time.

Thank you for the suggestion, Jack!
I could not run your command exactly as you have posted it, because MemSQL complained about two syntax errors:

  1. closing bracket here: INTERPRETER_MODE=INTERPRET_FIRST)]
  2. alias statement:
    alias:remote_0

So I have run it as follows:
compile SELECT STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS RegisteredOn,
COUNT(*) AS count, STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS op,
DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2 FROM test_0.servicedesk_2 as servicedesk
WHERE ((servicedesk.ins_upd_ts > (‘2019-01-01’!:>datetime(6) NULL)) AND (NOT ISNULL(servicedesk.Number)))
AND GroupModifiedOn > ‘2019-01-02’!:>datetime(6) NULL
GROUP BY 3 OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)

Here is the output:

Create Mbc::Context 0.269 ms
Expanding Views 0.036 ms
unknown 3.860 ms
Optimizer: Query Rewrites 0.526 ms
Optimizer: Setting Up Subselect Info 0.062 ms
Optimizer: Distributed Optimizations 0.001 ms
Optimizer: Enumerate Temporary Tables 0.021 ms
PPC Check 0.483 ms
Generating query MPL 1.363 ms
Generating user function MPL 0.002 ms
MBC Emission 8.668 ms
Module Creation 69.307 ms
LLVM Bitcode Emission 9.880 ms
Module Cleaning 36.639 ms
LLVM Optimization 269.340 ms
Machine Code Generation 132.630 ms
Symbol Resolution 0.922 ms
total 534.134 ms

I’ve also run it on the second MemSQL node (I have just 2 leaf nodes in this cluster), and here is the output:

Create Mbc::Context 0.159 ms
Expanding Views 0.013 ms
unknown 1.934 ms
Optimizer: Query Rewrites 0.277 ms
Optimizer: Setting Up Subselect Info 0.023 ms
Optimizer: Distributed Optimizations 0.000 ms
Optimizer: Enumerate Temporary Tables 0.007 ms
PPC Check 0.303 ms
Generating query MPL 0.623 ms
Generating user function MPL 0.001 ms
MBC Emission 4.674 ms
Module Creation 40.600 ms
LLVM Bitcode Emission 5.054 ms
Module Cleaning 19.492 ms
LLVM Optimization 126.578 ms
Machine Code Generation 71.135 ms
Symbol Resolution 0.439 ms
total 271.431 ms

Thanks, sorry for the copy/paste errors.

It seems like compilation itself is not taking very long (.9s + .5s) compared to the 7-10 s you observe. Are you running many other new queries at the same time, or just this one? Can you also measure the runtime of your original query with this extra clause at the end:

<query> OPTION(interpreter_mode=interpret)

what is the right way to remove compiled queries?
will it be drop all from plancache?

if so the difference between a query before/after the drop from plancache on my side is only 300ms.

Yes, drop all from plancache, on all nodes.

Jack,

The following query:
SELECT str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’) AS RegisteredOn, count(*) AS count
FROM v_servicedesk
where ins_upd_ts > ‘2019-01-01’
GROUP BY str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’)
ORDER BY str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’) ASC
OPTION(interpreter_mode=interpret)

Took 2.9secs to execute on the first run and 240 ms on all consecutive runs.

This table contains only 137k records, so I would not expect it to be even fully scanned in over 3 seconds.

Thanks for the information. That looks like the initial runtime/compilation time is again unusually high, much higher than the sum of the times from compile, which is strange.

What version of memsql are you running? (select @@memsql_version)

One more thing we can check: you could try running drop all from plancache on all nodes, and then running the original compile <query> command, and seeing if that is any different. There is one step that could potentially show up differently after clearing the plancache, the collecting table statistics steps, but it would be quite surprising if that accounted for the 5sec difference.

I am using MemSQL 6.8.10

Here is the output of the compile command run after drop all from plancache

Create Mbc::Context 0.242 ms
Expanding Views 1.451 ms
unknown 4.204 ms
Optimizer: Query Rewrites 2.303 ms
Optimizer: Collecting Table Statistics: Other 0.304 ms
Optimizer: Collecting Table Statistics: Analyze Statistics 2.391 ms
Optimizer: Collecting Table Statistics: Sampling 140.819 ms
Optimizer: Setting Up Subselect Info 0.158 ms
Optimizer: Distributed Optimizations 1.433 ms
Optimizer: Enumerate Temporary Tables 0.050 ms
Optimizer: Singlebox Optimizations (Aggregator) 0.554 ms
PPC Check 0.405 ms
Generating query MPL 1.788 ms
Generating user function MPL 0.001 ms
MBC Emission 12.758 ms
Module Creation 66.272 ms
LLVM Bitcode Emission 20.369 ms
Module Cleaning 40.674 ms
LLVM Optimization 368.033 ms
Machine Code Generation 208.446 ms
Symbol Resolution 0.953 ms
total 873.764 ms

Much like @zmeidav I observed a mere ~300ms increase in execution time after running drop all from plancache on all nodes.

Ok, that confirms that the issue isn’t in collecting table statistics.

The next thing I’d want to check is a profile of a first run compared to a profile of a later run.

Sure, here is my first profile run output:

PROFILE
WARNING: The PROFILE query requires plan compilation. The execution time may not be accurate.
Run PROFILE query again to get accurate execution time.
Project [remote_0.RegisteredOn, CAST(COALESCE($0,0) AS SIGNED) AS count] actual_rows: 300 exec_time: 0ms start_time: 00:00:02.205 network_traffic: 9.689000 KB network_time: 0ms
TopSort limit:[@@SESSION.sql_select_limit] [STR_TO_DATE(remote_0.op_2,?)] actual_rows: 300 exec_time: 30ms start_time: 00:00:02.176 memory_usage: 131.072006 KB
HashGroupBy [SUM(remote_0.count) AS $0] groups:[remote_0.op] actual_rows: 1,107 exec_time: 17ms start_time: 00:00:00.202 memory_usage: 524.288025 KB
Gather partitions:all alias:remote_0 actual_rows: 7,066 exec_time: 0ms start_time: 00:00:02.176 end_time: 00:00:02.176
Project [STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?) AS RegisteredOn, count, STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2] actual_rows: 7,066 exec_time: 18ms start_time: [00:00:00.197, 00:00:02.163] network_traffic: 706.599976 KB network_time: 1ms
HashGroupBy [COUNT(*) AS count] groups:[STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?)] actual_rows: 7,066 exec_time: 188ms start_time: [00:00:00.125, 00:00:01.917] memory_usage: 2,097.152100 KB
Filter [servicedesk.Number IS NOT NULL] actual_rows: 146,549 exec_time: 0ms
IndexRangeScan test.servicedesk_2 AS servicedesk, KEY ins_upd_ts (ins_upd_ts) scan:[ins_upd_ts > (?!:>datetime(6) NULL)] est_table_rows:192,867 est_filtered:79,082 actual_rows: 210,496 exec_time: 57ms start_time: [00:00:00.125, 00:00:01.916]

And second:

PROFILE
Project [remote_0.RegisteredOn, CAST(COALESCE($0,0) AS SIGNED) AS count] actual_rows: 300 exec_time: 0ms start_time: 00:00:00.125 network_traffic: 9.689000 KB network_time: 0ms
TopSort limit:[@@SESSION.sql_select_limit] [STR_TO_DATE(remote_0.op_2,?)] actual_rows: 300 exec_time: 29ms start_time: 00:00:00.096 memory_usage: 131.072006 KB
HashGroupBy [SUM(remote_0.count) AS $0] groups:[remote_0.op] actual_rows: 1,107 exec_time: 0ms start_time: 00:00:00.046 memory_usage: 524.288025 KB
Gather partitions:all alias:remote_0 actual_rows: 7,066 exec_time: 0ms start_time: 00:00:00.096 end_time: 00:00:00.096
Project [STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?) AS RegisteredOn, count, STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2] actual_rows: 7,066 exec_time: 17ms start_time: [00:00:00.042, 00:00:00.083] network_traffic: 706.599976 KB network_time: 1ms
HashGroupBy [COUNT(*) AS count] groups:[STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?)] actual_rows: 7,066 exec_time: 55ms start_time: [00:00:00.001, 00:00:00.009] memory_usage: 2,097.152100 KB
Filter [servicedesk.Number IS NOT NULL] actual_rows: 146,569 exec_time: 0ms
IndexRangeScan test.servicedesk_2 AS servicedesk, KEY ins_upd_ts (ins_upd_ts) scan:[ins_upd_ts > (?!:>datetime(6) NULL)] est_table_rows:192,867 est_filtered:79,082 actual_rows: 210,537 exec_time: 18ms start_time: [00:00:00.001, 00:00:00.009]

I have also performed profiling of a slightly more complex query, simply to observe the differences:
Query:
profile
SELECT str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’) AS RegisteredOn, count(*) AS count
FROM v_servicedesk
where ins_upd_ts > ‘2019-01-01’
and GroupName <> ‘’
and GroupModifiedOn > ‘2019-01-01’
and GroupName_Prev is null
and RegisteredOn > ‘2019-01-01’
GROUP BY str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’)
ORDER BY str_to_date(date_format(v_servicedesk.RegisteredOn, ‘%Y-%m-%d %H’), ‘%Y-%m-%d %H’) ASC

Show profile:
PROFILE
WARNING: The PROFILE query requires plan compilation. The execution time may not be accurate.
Run PROFILE query again to get accurate execution time.
Project [remote_0.RegisteredOn, CAST(COALESCE($0,0) AS SIGNED) AS count] actual_rows: 300 exec_time: 0ms start_time: 00:00:05.089 network_traffic: 9.776000 KB network_time: 0ms
TopSort limit:[@@SESSION.sql_select_limit] [STR_TO_DATE(remote_0.op_2,?)] actual_rows: 300 exec_time: 30ms start_time: 00:00:05.057 memory_usage: 131.072006 KB
HashGroupBy [SUM(remote_0.count) AS $0] groups:[remote_0.op] actual_rows: 1,044 exec_time: 17ms start_time: 00:00:00.268 memory_usage: 524.288025 KB
Gather partitions:all alias:remote_0 actual_rows: 6,968 exec_time: 0ms start_time: 00:00:05.057 end_time: 00:00:05.057
Project [STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?) AS RegisteredOn, count, STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2] actual_rows: 6,968 exec_time: 24ms start_time: [00:00:00.263, 00:00:05.043] network_traffic: 696.799988 KB network_time: 2ms
HashGroupBy [COUNT() AS count] groups:[STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?)] actual_rows: 6,968 exec_time: 220ms start_time: [00:00:00.095, 00:00:04.502] memory_usage: 2,097.152100 KB
Filter [sysadminunit_prev.name IS NULL] actual_rows: 136,615
HashJoin [sysadminunit_prev.id = servicedesk.GroupId_prev] type:left actual_rows: 136,615 exec_time: 21ms start_time: [00:00:00.095, 00:00:04.502]
|—HashTableBuild alias:sysadminunit_prev actual_rows: 20,696 exec_time: 0ms start_time: [00:00:00.091, 00:00:04.483] memory_usage: 4,194.304199 KB
| Project [sysadminunit_prev_0.id, sysadminunit_prev_0.name] est_rows:2,587 actual_rows: 20,696 exec_time: 4ms start_time: [00:00:00.086, 00:00:04.476]
| TableScan test.sysadminunit AS sysadminunit_prev_0, PRIMARY KEY (id) est_table_rows:2,587 est_filtered:2,587 actual_rows: 20,696 exec_time: 5ms start_time: [00:00:00.086, 00:00:04.476]
HashJoin [servicedesk.StatusId = casestatus__via__StatusId.id] type:left actual_rows: 137,583 exec_time: 61ms start_time: [00:00:00.095, 00:00:04.502]
|—HashTableBuild alias:casestatus__via__StatusId actual_rows: 224 exec_time: 0ms start_time: [00:00:00.091, 00:00:04.483] memory_usage: 1,048.576050 KB
| Project [casestatus__via__StatusId_0.id] est_rows:28 actual_rows: 224 exec_time: 1ms start_time: [00:00:00.091, 00:00:04.483]
| TableScan test.casestatus AS casestatus__via__StatusId_0, PRIMARY KEY (id) est_table_rows:28 est_filtered:28 actual_rows: 224 exec_time: 1ms start_time: [00:00:00.091, 00:00:04.483]
HashJoin [serviceitem__via__ServiceItemI.id = servicedesk.ServiceItemId] type:left actual_rows: 137,583 exec_time: 55ms start_time: [00:00:00.095, 00:00:04.502]
|—HashTableBuild alias:serviceitem__via__ServiceItemI actual_rows: 1,944 exec_time: 0ms start_time: [00:00:00.092, 00:00:04.483] memory_usage: 2,097.152100 KB
| Project [serviceitem__via__ServiceItemI_0.id] est_rows:243 actual_rows: 1,944 exec_time: 0ms start_time: [00:00:00.091, 00:00:04.483]
| TableScan test.serviceitem AS serviceitem__via__ServiceItemI_0, PRIMARY KEY (id) est_table_rows:243 est_filtered:243 actual_rows: 1,944 exec_time: 0ms start_time: [00:00:00.091, 00:00:04.483]
HashJoin [sysadminunit.id = servicedesk.GroupId] actual_rows: 137,583 exec_time: 90ms start_time: [00:00:00.095, 00:00:04.502]
|—HashTableBuild alias:sysadminunit actual_rows: 20,696 exec_time: 0ms start_time: [00:00:00.094, 00:00:04.501] memory_usage: 2,097.152100 KB
| Project [sysadminunit_0.id] est_rows:2,328 actual_rows: 20,696 exec_time: 2ms start_time: [00:00:00.092, 00:00:04.483]
| Filter [sysadminunit_0.name <> ?] actual_rows: 20,696 exec_time: 0ms start_time: [00:00:00.092, 00:00:04.483]
| TableScan test.sysadminunit AS sysadminunit_0, PRIMARY KEY (id) est_table_rows:2,587 est_filtered:2,329 actual_rows: 20,696 exec_time: 9ms start_time: [00:00:00.092, 00:00:04.483]
ChoosePlan
| :estimate
| SELECT COUNT(
) AS cost FROM test.servicedesk_2 WHERE servicedesk.GroupModifiedOn > (?!:>datetime(6) NULL)
| SELECT COUNT(*) AS cost FROM test.servicedesk_2 WHERE servicedesk.ins_upd_ts > (?!:>datetime(6) NULL)
|—Filter [servicedesk.ins_upd_ts > (?!:>datetime(6) NULL) AND servicedesk.RegisteredOn > (?!:>datetime(6) NULL) AND servicedesk.Number IS NOT NULL] actual_rows: 110,116 exec_time: 0ms
| IndexRangeScan test.servicedesk_2 AS servicedesk, KEY GroupModifiedOn (GroupModifiedOn) scan:[GroupModifiedOn > (?!:>datetime(6) NULL)] est_table_rows:192,867 est_filtered:133,989 actual_rows: 115,478 exec_time: 55ms start_time: [00:00:00.095, 00:00:04.484]
±–Filter [servicedesk.GroupModifiedOn > (?!:>datetime(6) NULL) AND servicedesk.RegisteredOn > (?!:>datetime(6) NULL) AND servicedesk.Number IS NOT NULL] actual_rows: 36,467 exec_time: 0ms
[actual_rows: 36,467 | max:18,363 at partition_4, average: 4,558.375000, std dev: 7,895.602604]
IndexRangeScan test.servicedesk_2 AS servicedesk, KEY ins_upd_ts (ins_upd_ts) scan:[ins_upd_ts > (?!:>datetime(6) NULL)] est_table_rows:192,867 est_filtered:133,989 actual_rows: 52,368 exec_time: 53ms start_time: [00:00:04.490, 00:00:04.501]
[actual_rows: 52,368 | max:26,305 at partition_4, average: 6,546.000000, std dev: 11,338.166000]

The very same query executes 5.5 seconds on the first run, and less than 200ms on all consequent runs.

One thing I noticed is that the “cycle” resets after ~5 minutes of not running the same query: the first run becomes very slow again.

Let me illustrate it with the screenshot below:

IndexRangeScan test.servicedesk_2 AS servicedesk, KEY ins_upd_ts (ins_upd_ts) scan:[ins_upd_ts > (?!:>datetime(6) NULL)] est_table_rows:192,867 est_filtered:79,082 actual_rows: 210,496 exec_time: 57ms start_time: [00:00:00.125, 00:00:01.916]

According to profile, the execution on one of the leaves starts much later than other leaves, which means it is either waiting for compilation or some lock.

Could you profile the first run of the query, and get “show profile json” result? So we can see which individual leaf is causing the problem.

@zhou, thank you the follow up!

Please find the output below:
(sorry for the long post, but the forum does not allow attaching any files but the images)

{
“profile”:[
{
“executor”:“Project”,
“out”:[
{
“alias”:"",
“projection”:“remote_0.RegisteredOn”
},
{
“alias”:“count”,
“projection”:“CAST(COALESCE($0,0) AS SIGNED)”
}
],
“subselects”:[],
“actual_row_count”:{ “value”:1081 },
“actual_total_time”:{ “value”:3 },
“start_time”:{ “value”:1761 },
“network_traffic”:{ “value”:35883 },
“network_time”:{ “value”:0 },
“inputs”:[
{
“executor”:“Sort”,
“order”:[
“STR_TO_DATE(remote_0.op_2,?)”
],
“actual_row_count”:{ “value”:1081 },
“actual_total_time”:{ “value”:58 },
“start_time”:{ “value”:1709 },
“memory_usage”:{ “value”:392880 },
“inputs”:[
{
“executor”:“HashGroupBy”,
“out”:[
{
“alias”:"$0",
“projection”:“SUM(remote_0.count)”
}
],
“groups”:[
“remote_0.op”
],
“encoded_group_by_unsupported”:“not a columnstore table: 0tmp”,
“encoded_join_unsupported”:“probe side is not an unordered scan of columnstore”,
“subselects”:[],
“actual_row_count”:{ “value”:1081 },
“actual_total_time”:{ “value”:20 },
“start_time”:{ “value”:335 },
“memory_usage”:{ “value”:524288 },
“inputs”:[
{
“executor”:“Gather”,
“partitions”:“all”,
“query”:“SELECT STRAIGHT_JOIN STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS RegisteredOn, COUNT() AS count, STR_TO_DATE( DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),’%Y-%m-%d %H’) AS op, DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’) AS op_2 FROM ((((test_0.servicedesk_2 as servicedesk LEFT JOIN ( SELECT WITH(NO_MERGE_THIS_SELECT=1) serviceitem__via__ServiceItemI_0.id AS id FROM test.serviceitem as serviceitem__via__ServiceItemI_0 ) AS serviceitem__via__ServiceItemI ON (servicedesk.ServiceItemId = serviceitem__via__ServiceItemI.id)) LEFT JOIN ( SELECT WITH(NO_MERGE_THIS_SELECT=1) casestatus__via__StatusId_0.id AS id FROM test.casestatus as casestatus__via__StatusId_0 ) AS casestatus__via__StatusId ON (servicedesk.StatusId = casestatus__via__StatusId.id)) LEFT JOIN ( SELECT WITH(NO_MERGE_THIS_SELECT=1) sysadminunit_0.id AS id, sysadminunit_0.name AS name FROM test.sysadminunit as sysadminunit_0 ) AS sysadminunit ON (sysadminunit.id = servicedesk.GroupId)) LEFT JOIN ( SELECT WITH(NO_MERGE_THIS_SELECT=1) sysadminunit_prev_0.id AS id, sysadminunit_prev_0.name AS name FROM test.sysadminunit as sysadminunit_prev_0 ) AS sysadminunit_prev ON (sysadminunit_prev.id = servicedesk.GroupId_prev)) WHERE ((servicedesk.ins_upd_ts > (‘2019-01-01’!:>datetime(6) NULL)) AND ( IFNULL(sysadminunit.name,’’) <> ‘’) AND (servicedesk.GroupModifiedOn > (‘2019-01-01’!:>datetime(6) NULL)) AND ISNULL(sysadminunit_prev.name) AND (NOT ISNULL(servicedesk.Number))) GROUP BY 3 OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)",
“alias”:“remote_0”,
“actual_row_count”:{ “value”:7173 },
“actual_total_time”:{ “value”:0 },
“start_time”:{ “value”:1709 },
“end_time”:{ “value”:1709 },
“inputs”:[
{
“executor”:“Project”,
“out”:[
{
“alias”:“RegisteredOn”,
“projection”:“STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?)”
},
{
“alias”:"",
“projection”:“count”
},
{
“alias”:“op”,
“projection”:“STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?)”
},
{
“alias”:“op_2”,
“projection”:“DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’)”
}
],
“subselects”:[],
“actual_row_count”:{ “value”:7173, “avg”:896.625000, “stddev”:6.632072, “max”:907, “maxPartition”:7 },
“actual_total_time”:{ “value”:22, “avg”:16.500000, “stddev”:5.500000, “max”:22, “maxPartition”:0 },
“start_time”:{ “value”:328, “avg”:964.500000, “stddev”:0.000000, “max”:1692, “maxPartition”:6 },
“network_traffic”:{ “value”:717300, “avg”:89662.500000, “stddev”:663.207170, “max”:90700, “maxPartition”:7 },
“network_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“inputs”:[
{
“executor”:“HashGroupBy”,
“out”:[
{
“alias”:“count”,
“projection”:"COUNT(
)”
}
],
“groups”:[
“STR_TO_DATE(DATE_FORMAT(servicedesk.RegisteredOn,’%Y-%m-%d %H’),?)”
],
“encoded_group_by_unsupported”:“at least one join does not meet requirements for push down to columnstore scan”,
“encoded_join_unsupported”:“probe side is not an unordered scan of columnstore”,
“subselects”:[],
“actual_row_count”:{ “value”:7173, “avg”:896.625000, “stddev”:6.632072, “max”:907, “maxPartition”:7 },
“actual_total_time”:{ “value”:220, “avg”:163.000000, “stddev”:57.000000, “max”:220, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:609.375000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“memory_usage”:{ “value”:2097152, “avg”:262144.000000, “stddev”:0.000000, “max”:262144, “maxPartition”:0 },
“inputs”:[
{
“executor”:“Filter”,
“condition”:[
“sysadminunit_prev.name IS NULL”
],
“subselects”:[],
“actual_row_count”:{ “value”:140549, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“inputs”:[
{
“executor”:“HashJoin”,
“condition”:[
“sysadminunit_prev.id = servicedesk.GroupId_prev”
],
“type”:“left”,
“subselects”:[],
“actual_row_count”:{ “value”:140549, “avg”:17568.625000, “stddev”:104.816432, “max”:17694, “maxPartition”:3 },
“actual_total_time”:{ “value”:16, “avg”:11.000000, “stddev”:5.000000, “max”:16, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:609.375000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[
{
“executor”:“HashTableBuild”,
“alias”:“sysadminunit_prev”,
“actual_row_count”:{ “value”:20696, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“start_time”:{ “value”:145, “avg”:599.000000, “stddev”:0.000000, “max”:1067, “maxPartition”:6 },
“memory_usage”:{ “value”:4194304, “avg”:524288.000000, “stddev”:0.000000, “max”:524288, “maxPartition”:0 },
“inputs”:[
{
“executor”:“Project”,
“out”:[
{
“alias”:"",
“projection”:“sysadminunit_prev_0.id”
},
{
“alias”:"",
“projection”:“sysadminunit_prev_0.name”
}
],
“est_rows”:“2587”,
“subselects”:[],
“actual_row_count”:{ “value”:20696, “avg”:2587.000000, “stddev”:0.000000, “max”:2587, “maxPartition”:0 },
“actual_total_time”:{ “value”:5, “avg”:3.000000, “stddev”:2.000000, “max”:5, “maxPartition”:0 },
“start_time”:{ “value”:138, “avg”:590.750000, “stddev”:0.000000, “max”:1049, “maxPartition”:6 },
“inputs”:[
{
“executor”:“TableScan”,
“db”:“test”,
“table”:“sysadminunit”,
“alias”:“sysadminunit_prev_0”,
“index”:“PRIMARY KEY (id)”,
“storage”:“lf_skiplist”,
“dir”:“forward”,
“est_table_rows”:“2587”,
“est_filtered”:“2587”,
“actual_row_count”:{ “value”:20696, “avg”:2587.000000, “stddev”:0.000000, “max”:2587, “maxPartition”:0 },
“actual_total_time”:{ “value”:5, “avg”:3.000000, “stddev”:2.000000, “max”:5, “maxPartition”:0 },
“start_time”:{ “value”:138, “avg”:590.750000, “stddev”:0.000000, “max”:1049, “maxPartition”:6 },
“inputs”:[]
}
]
}
]
},
{
“executor”:“Filter”,
“condition”:[
“IFNULL(sysadminunit.name,’’) <> ?”
],
“subselects”:[],
“actual_row_count”:{ “value”:141569, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“inputs”:[
{
“executor”:“HashJoin”,
“condition”:[
“sysadminunit.id = servicedesk.GroupId”
],
“type”:“left”,
“subselects”:[],
“actual_row_count”:{ “value”:141569, “avg”:17696.125000, “stddev”:98.369758, “max”:17822, “maxPartition”:3 },
“actual_total_time”:{ “value”:64, “avg”:44.500000, “stddev”:19.500000, “max”:64, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:609.375000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[
{
“executor”:“HashTableBuild”,
“alias”:“sysadminunit”,
“actual_row_count”:{ “value”:20696, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“start_time”:{ “value”:151, “avg”:606.125000, “stddev”:0.000000, “max”:1073, “maxPartition”:6 },
“memory_usage”:{ “value”:4194304, “avg”:524288.000000, “stddev”:0.000000, “max”:524288, “maxPartition”:0 },
“inputs”:[
{
“executor”:“Project”,
“out”:[
{
“alias”:"",
“projection”:“sysadminunit_0.id”
},
{
“alias”:"",
“projection”:“sysadminunit_0.name”
}
],
“est_rows”:“2587”,
“subselects”:[],
“actual_row_count”:{ “value”:20696, “avg”:2587.000000, “stddev”:0.000000, “max”:2587, “maxPartition”:0 },
“actual_total_time”:{ “value”:4, “avg”:2.000000, “stddev”:2.000000, “max”:4, “maxPartition”:1 },
“start_time”:{ “value”:145, “avg”:599.250000, “stddev”:0.000000, “max”:1067, “maxPartition”:6 },
“inputs”:[
{
“executor”:“TableScan”,
“db”:“test”,
“table”:“sysadminunit”,
“alias”:“sysadminunit_0”,
“index”:“PRIMARY KEY (id)”,
“storage”:“lf_skiplist”,
“dir”:“forward”,
“est_table_rows”:“2587”,
“est_filtered”:“2587”,
“actual_row_count”:{ “value”:20696, “avg”:2587.000000, “stddev”:0.000000, “max”:2587, “maxPartition”:0 },
“actual_total_time”:{ “value”:6, “avg”:4.000000, “stddev”:2.000000, “max”:6, “maxPartition”:1 },
“start_time”:{ “value”:145, “avg”:599.250000, “stddev”:0.000000, “max”:1067, “maxPartition”:6 },
“inputs”:[]
}
]
}
]
},
{
“executor”:“HashJoin”,
“condition”:[
“servicedesk.StatusId = casestatus__via__StatusId.id”
],
“type”:“left”,
“subselects”:[],
“actual_row_count”:{ “value”:151133, “avg”:18891.625000, “stddev”:103.758057, “max”:19024, “maxPartition”:3 },
“actual_total_time”:{ “value”:48, “avg”:43.500000, “stddev”:4.500000, “max”:48, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:609.375000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[
{
“executor”:“HashTableBuild”,
“alias”:“casestatus__via__StatusId”,
“actual_row_count”:{ “value”:224, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“start_time”:{ “value”:151, “avg”:606.250000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“memory_usage”:{ “value”:1048576, “avg”:131072.000000, “stddev”:0.000000, “max”:131072, “maxPartition”:0 },
“inputs”:[
{
“executor”:“Project”,
“out”:[
{
“alias”:"",
“projection”:“casestatus__via__StatusId_0.id”
}
],
“est_rows”:“28”,
“subselects”:[],
“actual_row_count”:{ “value”:224, “avg”:28.000000, “stddev”:0.000000, “max”:28, “maxPartition”:0 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“start_time”:{ “value”:151, “avg”:606.125000, “stddev”:0.000000, “max”:1073, “maxPartition”:6 },
“inputs”:[
{
“executor”:“TableScan”,
“db”:“test”,
“table”:“casestatus”,
“alias”:“casestatus__via__StatusId_0”,
“index”:“PRIMARY KEY (id)”,
“storage”:“lf_skiplist”,
“dir”:“forward”,
“est_table_rows”:“28”,
“est_filtered”:“28”,
“actual_row_count”:{ “value”:224, “avg”:28.000000, “stddev”:0.000000, “max”:28, “maxPartition”:0 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“start_time”:{ “value”:151, “avg”:606.125000, “stddev”:0.000000, “max”:1073, “maxPartition”:6 },
“inputs”:[]
}
]
}
]
},
{
“executor”:“HashJoin”,
“condition”:[
“serviceitem__via__ServiceItemI.id = servicedesk.ServiceItemId”
],
“type”:“left”,
“subselects”:[],
“actual_row_count”:{ “value”:151133, “avg”:18891.625000, “stddev”:103.758057, “max”:19024, “maxPartition”:3 },
“actual_total_time”:{ “value”:101, “avg”:66.000000, “stddev”:35.000000, “max”:101, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:609.375000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[
{
“executor”:“HashTableBuild”,
“alias”:“serviceitem__via__ServiceItemI”,
“actual_row_count”:{ “value”:1944, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:609.375000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“memory_usage”:{ “value”:2097152, “avg”:262144.000000, “stddev”:0.000000, “max”:262144, “maxPartition”:0 },
“inputs”:[
{
“executor”:“Project”,
“out”:[
{
“alias”:"",
“projection”:“serviceitem__via__ServiceItemI_0.id”
}
],
“est_rows”:“243”,
“subselects”:[],
“actual_row_count”:{ “value”:1944, “avg”:243.000000, “stddev”:0.000000, “max”:243, “maxPartition”:0 },
“actual_total_time”:{ “value”:2, “avg”:1.000000, “stddev”:1.000000, “max”:2, “maxPartition”:0 },
“start_time”:{ “value”:151, “avg”:608.750000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[
{
“executor”:“TableScan”,
“db”:“test”,
“table”:“serviceitem”,
“alias”:“serviceitem__via__ServiceItemI_0”,
“index”:“PRIMARY KEY (id)”,
“storage”:“lf_skiplist”,
“dir”:“forward”,
“est_table_rows”:“243”,
“est_filtered”:“243”,
“actual_row_count”:{ “value”:1944, “avg”:243.000000, “stddev”:0.000000, “max”:243, “maxPartition”:0 },
“actual_total_time”:{ “value”:4, “avg”:2.000000, “stddev”:2.000000, “max”:4, “maxPartition”:0 },
“start_time”:{ “value”:151, “avg”:608.750000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[]
}
]
}
]
},
{
“executor”:“ChoosePlan”,
“estimate”:[
“SELECT COUNT() AS cost FROM test.servicedesk_2 WHERE servicedesk.GroupModifiedOn > (?!:>datetime(6) NULL)",
"SELECT COUNT(
) AS cost FROM test.servicedesk_2 WHERE servicedesk.ins_upd_ts > (?!:>datetime(6) NULL)”
],
“inputs”:[
{
“executor”:“Filter”,
“condition”:[
“servicedesk.ins_upd_ts > (?!:>datetime(6) NULL) AND servicedesk.Number IS NOT NULL”
],
“subselects”:[],
“actual_row_count”:{ “value”:132174, “avg”:16521.750000, “stddev”:6245.444555, “max”:19024, “maxPartition”:3 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“inputs”:[
{
“executor”:“IndexRangeScan”,
“db”:“test”,
“table”:“servicedesk_2”,
“alias”:“servicedesk”,
“index”:“KEY GroupModifiedOn (GroupModifiedOn)”,
“storage”:“lf_skiplist”,
“dir”:“forward”,
“scan”:[
“GroupModifiedOn > (?!:>datetime(6) NULL)”
],
“subselects”:[],
“est_table_rows”:“192867”,
“est_filtered”:“134989”,
“actual_row_count”:{ “value”:138526, “avg”:17315.750000, “stddev”:6545.463787, “max”:19897, “maxPartition”:5 },
“actual_total_time”:{ “value”:80, “avg”:68.000000, “stddev”:12.000000, “max”:80, “maxPartition”:0 },
“start_time”:{ “value”:152, “avg”:545.000000, “stddev”:0.000000, “max”:1074, “maxPartition”:6 },
“inputs”:[]
}
]
},
{
“executor”:“Filter”,
“condition”:[
“servicedesk.GroupModifiedOn > (?!:>datetime(6) NULL) AND servicedesk.Number IS NOT NULL”
],
“subselects”:[],
“actual_row_count”:{ “value”:18959, “avg”:2369.875000, “stddev”:6270.099888, “max”:18959, “maxPartition”:4 },
“actual_total_time”:{ “value”:0, “avg”:0.000000, “stddev”:0.000000, “max”:0, “maxPartition”:0 },
“inputs”:[
{
“executor”:“IndexRangeScan”,
“db”:“test”,
“table”:“servicedesk_2”,
“alias”:“servicedesk”,
“index”:“KEY ins_upd_ts (ins_upd_ts)”,
“storage”:“lf_skiplist”,
“dir”:“forward”,
“scan”:[
“ins_upd_ts > (?!:>datetime(6) NULL)”
],
“subselects”:[],
“est_table_rows”:“192867”,
“est_filtered”:“134989”,
“actual_row_count”:{ “value”:26922, “avg”:3365.250000, “stddev”:8903.614600, “max”:26922, “maxPartition”:4 },
“actual_total_time”:{ “value”:47, “avg”:23.500000, “stddev”:23.500000, “max”:47, “maxPartition”:0 },
“start_time”:{ “value”:1060, “avg”:1060.000000, “stddev”:0.000000, “max”:1060, “maxPartition”:4 },
“inputs”:[]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
]
}
],
“version”:“2”,
“warning”:“requires compile”,
“info”:{
“memsql_version”:“6.8.10”,
“memsql_version_hash”:“a53e479edccf4260eb01cdcf14a5c3d46e443da4”,
“num_online_leaves”:“2”,
“num_online_aggs”:“1”,
“context_database”:“test”
}
}

Dear @zhou, have you been able to identify the issue?

Dear @zhou, @jack , @rob,

Just wanted to let you know that the issue is still relevant.

Denis.

@dtroyan Sorry for the late response.
I examined the query. Unfortunately, interpFirst is not helping too much in this case. Because this query is generating one single plan the leaves so we won’t get the usual benefit of parallel compilation. In addition since all tables are rowstore, the interpreter will be a lot less efficient so we won’t do meaningful work until compilation finishes.
as a result, interpFirst won’t perform better than compile and run.

(for particular case, it might perform better if the tables are not sharded the same way).

Thanks again for the meaningful feedback. We will improve this case in later version of memSQL.