Problem with 'maximum_memory' reached in query

I am running some experiments with MemSQL databases and in one of them an error occurs regarding the amount of memory of one of the leaves. The cluster memory capacity is:

1 master - 8GB
1 aggegator - 16GB
2 leaves - 16GB (each)

Cluster settings

+----+--------------+------+------+--------+--------------------+----------+---------------+---------------------+----------------+----------------------+--------------------+------------------------+--------+
| ID | IP_ADDR      | PORT | TYPE | STATE  | AVAILABILITY_GROUP | NUM_CPUS | MAX_MEMORY_MB | MAX_TABLE_MEMORY_MB | MEMORY_USED_MB | TABLE_MEMORY_USED_MB | TOTAL_DATA_DISK_MB | AVAILABLE_DATA_DISK_MB | UPTIME |
+----+--------------+------+------+--------+--------------------+----------+---------------+---------------------+----------------+----------------------+--------------------+------------------------+--------+
|  4 | 172.21.6.154 | 3306 | LEAF | online |                  1 |        4 |         15360 |               13824 |           2573 |                   97 |             461354 |                 458530 |   1383 |
|  3 | 172.21.6.153 | 3306 | LEAF | online |                  1 |        8 |         15360 |               13824 |           4969 |                   96 |             213515 |                 210501 |   1383 |
|  2 | 172.21.6.152 | 3306 | CA   | online |               NULL |        8 |         13928 |               12535 |            505 |                   34 |             213519 |                 210787 |   1378 |
|  1 | 172.21.6.151 | 3306 | MA   | online |               NULL |        8 |          6670 |                5646 |            643 |                   33 |             944686 |                 939192 |   1374 |
+----+--------------+------+------+--------+--------------------+----------+---------------+---------------------+----------------+----------------------+--------------------+------------------------+--------+

Query

select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit 
from   lineorder join dim_date on lo_orderdatekey = d_datekey join 
       customer on lo_custkey = c_customerkey join 
       supplier on lo_suppkey = s_suppkey join 
       part on lo_partkey = p_partkey 
where  c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') 
group by d_year, c_nation 
order by d_year, c_nation;

The explain

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Statistics have not been collected on the following tables and columns. Consider running the following commands to collect them:                                   |
|     ANALYZE TABLE ssb03.`customer` COLUMNS `C_Region` ENABLE;                                                                                                               |
|     ANALYZE TABLE ssb03.`part` COLUMNS `P_MFGR` ENABLE;                                                                                                                     |
|     ANALYZE TABLE ssb03.`supplier` COLUMNS `S_Region` ENABLE;                                                                                                               |
|                                                                                                                                                                             |
| See https://docs.memsql.com/docs/analyze for more information on statistics collection.                                                                                     |
|                                                                                                                                                                             |
| Project [dim_date.D_Year AS d_year, lineorder.C_Nation AS c_nation, profit]                                                                                                 |
| Sort [dim_date.D_Year, lineorder.C_Nation]                                                                                                                                  |
| HashGroupBy [SUM(lineorder.LO_Revenue - lineorder.LO_SupplyCost) AS profit] groups:[dim_date.D_Year, lineorder.C_Nation]                                                    |
| HashJoin [lineorder.LO_OrderDateKey = dim_date.D_DateKey]                                                                                                                   |
| |---HashTableBuild alias:dim_date                                                                                                                                           |
| |   Project [dim_date_0.D_Year, dim_date_0.D_DateKey] est_rows:2,556                                                                                                        |
| |   ColumnStoreScan ssb03.dim_date AS dim_date_0, KEY D_Year (D_Year) USING CLUSTERED COLUMNSTORE est_table_rows:2,556 est_filtered:2,556                                   |
| HashJoin [part.P_PartKey = lineorder.LO_PartKey]                                                                                                                            |
| |---HashTableBuild alias:part                                                                                                                                               |
| |   Project [part_0.P_PartKey] est_rows:38,000                                                                                                                              |
| |   Filter [part_0.P_MFGR = 'MFGR#1' OR part_0.P_MFGR = 'MFGR#2']                                                                                                           |
| |   ColumnStoreScan ssb03.part AS part_0, KEY P_Type (P_Type) USING CLUSTERED COLUMNSTORE est_table_rows:200,000 est_filtered:38,000                                        |
| TableScan 1tmp AS lineorder storage:list stream:yes est_table_rows:108,240                                                                                                  |
| Project [remote_0.LO_Revenue, remote_0.LO_SupplyCost, remote_0.LO_PartKey, remote_0.LO_OrderDateKey, remote_0.C_Nation] est_rows:108,240                                    |
| Gather partitions:all est_rows:108,240 alias:remote_0                                                                                                                       |
| Project [lineorder_0.LO_Revenue, lineorder_0.LO_SupplyCost, lineorder_0.LO_PartKey, lineorder_0.LO_OrderDateKey, customer.C_Nation] est_rows:108,240 est_select_cost:64,000 |
| HashJoin [lineorder_0.LO_CustKey = customer.C_CustomerKey]                                                                                                                  |
| |---HashTableBuild alias:customer                                                                                                                                           |
| |   Project [customer_0.C_Nation, customer_0.C_CustomerKey] est_rows:3,000                                                                                                  |
| |   Filter [customer_0.C_Region = 'AMERICA']                                                                                                                                |
| |   ColumnStoreScan ssb03.customer AS customer_0, KEY C_Nation (C_Nation) USING CLUSTERED COLUMNSTORE est_table_rows:30,000 est_filtered:3,000                              |
| HashJoin [supplier.S_SuppKey = lineorder_0.LO_SuppKey]                                                                                                                      |
| |---HashTableBuild alias:supplier                                                                                                                                           |
| |   Project [supplier_0.S_SuppKey] est_rows:1,000                                                                                                                           |
| |   Filter [supplier_0.S_Region = 'AMERICA']                                                                                                                                |
| |   ColumnStoreScan ssb03.supplier AS supplier_0, KEY S_Nation (S_Nation) USING CLUSTERED COLUMNSTORE est_table_rows:10,000 est_filtered:1,000                              |
| BloomFilter table:customer fields:lineorder_0.LO_CustKey                                                                                                                    |
| BloomFilter table:supplier fields:lineorder_0.LO_SuppKey                                                                                                                    |
| ColumnStoreScan ssb03.lineorder AS lineorder_0, KEY LO_OrderDateKey (LO_OrderDateKey) USING CLUSTERED COLUMNSTORE est_table_rows:6,001,171 est_filtered:6,001,171           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The error that appears.

(1712, Leaf Error (172.21.6.153:3306): Leaf Error (172.21.6.153:3306): Memory used by MemSQL (13727.50 Mb) has reached the 'maximum_memory' setting (13928 Mb) on this node. 
Possible causes include (1) available query execution memory has been used up for table memory (in use table memory: 164.50 Mb) and 
(2) the query is large and complex and requires more query execution memory than is available (in use query execution memory 12517.11 Mb). 
See https://docs.memsql.com/troubleshooting/latest/memory-errors for addit)

Any help outside increasing the amount of memory physically?

Would you mind collect profile for the failing query so we can find which component is using memory?

You can do it by first run “Profile query_text”; after it fails, run “show profile” and collect the results.

Could you help me understand this profile?

Profile

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PROFILE                                                                                                                                                                                                                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: The PROFILE query requires plan compilation. The execution time may not be accurate.                                                                                                                                                                                                                               |
| Run PROFILE query again to get accurate execution time.                                                                                                                                                                                                                                                                     |
| GatherMerge [remote_0.d_year, remote_0.c_nation] partitions:all est_rows:180,035,130,000 alias:remote_0 actual_rows: 0 exec_time: 0ms start_time: 00:00:00.151                                                                                                                                                              |
| Project [r0.d_year, r0.c_nation, profit] est_rows:180,035,130,000 actual_rows: 0 exec_time: 0ms network_time: 0ms                                                                                                                                                                                                           |
| Sort [r0.d_year, r0.c_nation] actual_rows: 0 exec_time: 0ms                                                                                                                                                                                                                                                                 |
| HashGroupBy [SUM(r0.profit) AS profit] groups:[r0.d_year, r0.c_nation] actual_rows: 0 exec_time: 0ms memory_usage: 2,097.152100 KB                                                                                                                                                                                          |
| TableScan r0 storage:list stream:yes  exec_time: 0ms                                                                                                                                                                                                                                                                        |
| Repartition [SUBQ_VWW_0.D_Year AS d_year, lineorder.C_Nation AS c_nation, profit] AS r0 shard_key:[d_year, c_nation] est_rows:180,035,130,000 est_select_cost:1,440,281,040,000 actual_rows: 0 exec_time: 0ms network_time: 0ms                                                                                             |
| ShuffleGroupBy [SUM(lineorder.LO_Revenue - lineorder.LO_SupplyCost) AS profit] groups:[SUBQ_VWW_0.D_Year, lineorder.C_Nation] actual_rows: 0 exec_time: 0ms memory_usage: 35,624.960938 KB                                                                                                                                  |
| HashJoin [lineorder.LO_PartKey = SUBQ_VWW_0.P_PartKey AND lineorder.LO_OrderDateKey = SUBQ_VWW_0.D_DateKey] actual_rows: 0 exec_time: 0ms                                                                                                                                                                                   |
| |---HashTableBuild alias:SUBQ_VWW_0 actual_rows: 394,283,498                                                                                                                                                                                                                                                                |
| |   Project [part.P_PartKey, dim_date.D_Year, dim_date.D_DateKey] actual_rows: 394,283,498 exec_time: 15,133ms start_time: [00:00:00.054, 00:00:00.173]                                                                                                                                                                     |
| |   NestedLoopJoin actual_rows: 394,283,496 exec_time: 0ms                                                                                                                                                                                                                                                                  |
| |   |---ColumnStoreScan ssb03.dim_date, KEY D_Year (D_Year) USING CLUSTERED COLUMNSTORE est_table_rows:2,556 est_filtered:2,556 actual_rows: 394,296,561 exec_time: 8,975ms start_time: [00:00:00.053, 00:00:00.172] memory_usage: 20,220,084.000000 KB segments_scanned: 0 segments_skipped: 0 segments_fully_contained: 0 |
| |   Filter [part.P_MFGR = ? OR part.P_MFGR = ?] actual_rows: 171,216 exec_time: 0ms start_time: [00:00:00.053, 00:00:00.172]                                                                                                                                                                                                |
| |   ColumnStoreScan ssb03.part, KEY P_Type (P_Type) USING CLUSTERED COLUMNSTORE est_table_rows:200,000 est_filtered:38,000 actual_rows: 425,984 exec_time: 11,205ms start_time: [00:00:00.053, 00:00:00.171] memory_usage: 4,194.304199 KB segments_scanned: 16 segments_skipped: 0 segments_fully_contained: 0             |
| TableScan 0tmp AS lineorder storage:list stream:yes est_table_rows:180,035,130,000  exec_time: 0ms                                                                                                                                                                                                                          |
| Project [lineorder_0.LO_Revenue, lineorder_0.LO_SupplyCost, lineorder_0.LO_PartKey, lineorder_0.LO_OrderDateKey, customer.C_Nation] est_rows:180,035,130,000                                                                                                                                                                |
| HashJoin [lineorder_0.LO_CustKey = customer.C_CustomerKey]                                                                                                                                                                                                                                                                  |
| |---HashTableBuild alias:customer                                                                                                                                                                                                                                                                                           |
| |   Project [customer_0.C_Nation, customer_0.C_CustomerKey] est_rows:3,000                                                                                                                                                                                                                                                  |
| |   Filter [customer_0.C_Region = ?]                                                                                                                                                                                                                                                                                        |
| |   ColumnStoreScan ssb03.customer AS customer_0, KEY C_Nation (C_Nation) USING CLUSTERED COLUMNSTORE est_table_rows:30,000 est_filtered:3,000                                                                                                                                                                              |
| HashJoin [supplier.S_SuppKey = lineorder_0.LO_SuppKey]                                                                                                                                                                                                                                                                      |
| |---HashTableBuild alias:supplier                                                                                                                                                                                                                                                                                           |
| |   Project [supplier_0.S_SuppKey] est_rows:1,000                                                                                                                                                                                                                                                                           |
| |   Filter [supplier_0.S_Region = ?]                                                                                                                                                                                                                                                                                        |
| |   ColumnStoreScan ssb03.supplier AS supplier_0, KEY S_Nation (S_Nation) USING CLUSTERED COLUMNSTORE est_table_rows:10,000 est_filtered:1,000                                                                                                                                                                              |
| BloomFilter table:customer fields:lineorder_0.LO_CustKey                                                                                                                                                                                                                                                                    |
| BloomFilter table:supplier fields:lineorder_0.LO_SuppKey                                                                                                                                                                                                                                                                    |
| ColumnStoreScan ssb03.lineorder AS lineorder_0, KEY LO_OrderDateKey (LO_OrderDateKey) USING CLUSTERED COLUMNSTORE est_table_rows:6,001,171 est_filtered:6,001,171                                                                                                                                                           |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I’m looking at the profile. It looks like at first you did not ‘analyze’ the table based on the warnings on your first post. MemSQL currently requires manually creating statistics, this is an important aspect of picking a good query plan. The new version 7.0 coming later this year will have automatic statistics. You run analyze by doing
ANALYZE TABLE .

Details at https://docs.memsql.com/sql-reference/v6.8/analyze/. It’s usually a good idea to do this after loading major amounts of data. Redo the profile after creating the statistics.

Thanks for the information about your cluster. When you say “master 8 gb, aggregator 16gb” I think that’s probably a master aggregator and child aggregator? For best performance, and especially when you have lots of data the child aggregator and master aggregator should both be 16 gb. Without the analyze stats, we have to guess at some details of the query, it’s harder to pick a good plan. If you are doing simple testing, you could also just run with only a master (because a failure is unlikely and you can save the cost of the extra machine for the child aggregator).

Is dim_date a reference table? Can you paste the DDL for it?

If it is a reference table, I believe I have found the cause of the problem - it’s an issue with a certain part of the logic for choosing join plans - and a workaround is to set max_subselect_aggregator_rowcount = 0 (this will disable the optimization that I think is causing the issue).