CPU usage of Master node reached 100% in cluster configuration

takahiro.shigemoto · June 13, 2019, 6:50am

The reproduction procedure is as follows.

I built a MemSQL cluster with two leaf nodes on AWS according to the following document.
Use a valid Enterprise License Key license for one month.
As a result, one aggregator node, one master node, and two leaf nodes are activated.
SingleStoreDB Cloud · SingleStore Documentation
Next, I constructed and populated the table with the following DDL and script.
see DDL & script below.
Finally, from the Java program, the following query was executed simultaneously from 50 threads at one second intervals.
Then, the CPU load on the master node is nearly 100%.
There was almost no CPU load on the aggregator node, and CPU load on the leaf node was around 10%.
see query below.

Increasing the number of leaf nodes from 2 to 3 or 4 does not change this trend and does not improve performance. (Can not scale out)
Perhaps the Master node is the bottleneck.
Is there any solution?

==

DDL

CREATE DATABASE verification;
USE verification;

CREATE TABLE Account (
AccountID varchar(50) NOT NULL
, AccountStructureCode varchar(50)
, PRIMARY KEY (AccountID)
);

CREATE TABLE AccountStructure (
AccountStructureCode varchar(50) NOT NULL
, AccountStructureName varchar(230)
, ParentAccountStructureCode varchar(50)
, AccountDisplayPriority integer
, AccountStructureLevel integer
, PRIMARY KEY (AccountStructureCode)
);

CREATE TABLE TimeRange (
TimeRangeType varchar(50) NOT NULL
, LowRange numeric(7,4) NOT NULL
, HighRange numeric(7,4) NOT NULL
, Label varchar(50)
, TimeRangeDisplayPriority integer
, PRIMARY KEY (TimeRangeType, LowRange, HighRange)
);

CREATE TABLE Grid (
AccountID varchar(50) NOT NULL
, Currency varchar(50) NOT NULL
, LargeCategoryType varchar(50) NOT NULL
, MiddleCategoryType varchar(50) NOT NULL
, TimeRange decimal(19,4) NOT NULL
, Grid decimal(19,4) NOT NULL
, SHARD KEY (AccountID) USING CLUSTERED COLUMNSTORE
);

CREATE VIEW FlattenedAccount AS
SELECT
b.AccountID AS L8
, L7.AccountStructureCode AS L7
, L6.AccountStructureCode AS L6
, L5.AccountStructureCode AS L5
, L4.AccountStructureCode AS L4
, L3.AccountStructureCode AS L3
, L2.AccountStructureCode AS L2
, L2.ParentAccountStructureCode AS L1
FROM
Account AS b
INNER JOIN AccountStructure AS L7
ON b.AccountStructureCode = L7.AccountStructureCode
INNER JOIN AccountStructure AS L6
ON L7.ParentAccountStructureCode = L6.AccountStructureCode
INNER JOIN AccountStructure AS L5
ON L6.ParentAccountStructureCode = L5.AccountStructureCode
INNER JOIN AccountStructure AS L4
ON L5.ParentAccountStructureCode = L4.AccountStructureCode
INNER JOIN AccountStructure AS L3
ON L4.ParentAccountStructureCode = L3.AccountStructureCode
INNER JOIN AccountStructure AS L2
ON L3.ParentAccountStructureCode = L2.AccountStructureCode
;

INSERT INTO TimeRange VALUES(‘Normal’, 0, 1,‘1Y’, 1);
INSERT INTO TimeRange VALUES(‘Normal’, 1, 2,‘2Y’, 2);
…

INSERT INTO AccountStructure VALUES(‘Root’, ‘NULL’, ‘NULL’, NULL, 1);
INSERT INTO AccountStructure VALUES(‘A1000’, ‘NULL’, ‘AK000’, NULL, 2);
…

INSERT INTO Account VALUES(‘100’, ‘A1101111’);
INSERT INTO Account VALUES(‘101’, ‘A1101112’);
…
script (grid.py)
This script generates 768,000 rows.

import random

currencies = [‘JPY’, ‘USD’, ‘AUD’, ‘NZD’, ‘CAD’, ‘HKD’, ‘CNY’, ‘SGD’, ‘BRL’, ‘ZAR’]
accounts = [‘100’,‘101’,‘110’,‘111’,‘120’,‘121’,‘130’,‘131’,‘200’,‘201’,‘210’,‘211’,‘220’,‘221’,‘230’,‘231’,‘300’,‘301’,‘310’,‘311’,‘320’,‘321’,‘330’,‘331’,‘400’,‘401’,‘410’,‘411’,‘420’,‘421’,‘430’,‘431’,‘500’,‘501’,‘510’,‘511’,‘520’,‘521’,‘530’,‘531’,‘600’,‘601’,‘610’,‘611’,‘620’,‘621’,‘630’,‘631’,‘700’,‘701’,‘710’,‘711’,‘720’,‘721’,‘730’,‘731’,‘800’,‘801’,‘810’,‘811’,‘820’,‘821’,‘830’,‘831’]
timeRanges = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]

for i in range(10):
largeCategoryType = ‘LargeCategoryType’ + str(i+1)
for j in range(4):
middleCategoryType = ‘MiddleCategoryType’ + str(i+1) + ‘-’ + str(j+1)
for k in range(len(currencies)):
currency = currencies[k]
for l in range(int(len(accounts)/2)):
accountId = accounts[l*2+1]
for m in range(len(timeRanges)):
timeRange = timeRanges[m]
value = random.random()
print(‘"’ + accountId + ‘“,”’ + currency + ‘“,”’ + largeCategoryType + ‘“,”’ + middleCategoryType + ‘",’ + str(timeRange) + ‘,’ + str(value))
query

SELECT
grid.LargeCategoryType
, grid.MiddleCategoryType
, tr.Label
, SUM(grid.Grid) as Grid
FROM
Grid grid
INNER JOIN FlattenedAccount a
ON grid.AccountID = a.L8
INNER JOIN TimeRange tr
ON grid.TimeRange >= tr.LowRange
AND grid.TimeRange < tr.HighRange
WHERE
a.L1 = ‘AK000’
AND grid.Currency = ‘JPY’
AND tr.TimeRangeType = ‘Normal’
GROUP BY
grid.LargeCategoryType
, grid.MiddleCategoryType
, tr.Label
, tr.TimeRangeDisplayPriority
ORDER BY
grid.LargeCategoryType
, grid.MiddleCategoryType
, tr.TimeRangeDisplayPriority
;

rick · June 13, 2019, 4:28pm

Which end point did you connect to? My suspicion is that you connected to the Master Aggregator endpoint. You only need to connect to that endpoint to do DDL (and a few other things). For general DML execution you want to use the load balancer (which will balance the load across the aggs in your cluster). The end points are listed in the CF window under Outputs. See the docs here: SingleStoreDB Cloud · SingleStore Documentation for a screenshot. It is titled AggregatorLoadBalancerEndpoint.

This would address why only the Master Agg is used. It won’t address why you are bottlenecking on the agg though.

Rick Negrin
VP Product Management at MemSQL

Rick

takahiro.shigemoto · June 14, 2019, 5:10am

Thank you for your reply.
As you pointed out, Java Client was connecting to Master Aggregator. I changed to connect to AggregatorLoadBalancerEndpoint, and now I can get the expected speed.