CPU usage of Master node reached 100% in cluster configuration


#1

The reproduction procedure is as follows.

  1. I built a MemSQL cluster with two leaf nodes on AWS according to the following document.
    Use a valid Enterprise License Key license for one month.
    As a result, one aggregator node, one master node, and two leaf nodes are activated.
    https://docs.memsql.com/guides/latest/install-memsql/cloud/aws/cf/step-1/

  2. Next, I constructed and populated the table with the following DDL and script.
    see DDL & script below.

  3. Finally, from the Java program, the following query was executed simultaneously from 50 threads at one second intervals.
    Then, the CPU load on the master node is nearly 100%.
    There was almost no CPU load on the aggregator node, and CPU load on the leaf node was around 10%.
    see query below.

Increasing the number of leaf nodes from 2 to 3 or 4 does not change this trend and does not improve performance. (Can not scale out)
Perhaps the Master node is the bottleneck.
Is there any solution?

==

  • DDL

    CREATE DATABASE verification;
    USE verification;

    CREATE TABLE Account (
    AccountID varchar(50) NOT NULL
    , AccountStructureCode varchar(50)
    , PRIMARY KEY (AccountID)
    );

    CREATE TABLE AccountStructure (
    AccountStructureCode varchar(50) NOT NULL
    , AccountStructureName varchar(230)
    , ParentAccountStructureCode varchar(50)
    , AccountDisplayPriority integer
    , AccountStructureLevel integer
    , PRIMARY KEY (AccountStructureCode)
    );

    CREATE TABLE TimeRange (
    TimeRangeType varchar(50) NOT NULL
    , LowRange numeric(7,4) NOT NULL
    , HighRange numeric(7,4) NOT NULL
    , Label varchar(50)
    , TimeRangeDisplayPriority integer
    , PRIMARY KEY (TimeRangeType, LowRange, HighRange)
    );

    CREATE TABLE Grid (
    AccountID varchar(50) NOT NULL
    , Currency varchar(50) NOT NULL
    , LargeCategoryType varchar(50) NOT NULL
    , MiddleCategoryType varchar(50) NOT NULL
    , TimeRange decimal(19,4) NOT NULL
    , Grid decimal(19,4) NOT NULL
    , SHARD KEY (AccountID) USING CLUSTERED COLUMNSTORE
    );

    CREATE VIEW FlattenedAccount AS
    SELECT
    b.AccountID AS L8
    , L7.AccountStructureCode AS L7
    , L6.AccountStructureCode AS L6
    , L5.AccountStructureCode AS L5
    , L4.AccountStructureCode AS L4
    , L3.AccountStructureCode AS L3
    , L2.AccountStructureCode AS L2
    , L2.ParentAccountStructureCode AS L1
    FROM
    Account AS b
    INNER JOIN AccountStructure AS L7
    ON b.AccountStructureCode = L7.AccountStructureCode
    INNER JOIN AccountStructure AS L6
    ON L7.ParentAccountStructureCode = L6.AccountStructureCode
    INNER JOIN AccountStructure AS L5
    ON L6.ParentAccountStructureCode = L5.AccountStructureCode
    INNER JOIN AccountStructure AS L4
    ON L5.ParentAccountStructureCode = L4.AccountStructureCode
    INNER JOIN AccountStructure AS L3
    ON L4.ParentAccountStructureCode = L3.AccountStructureCode
    INNER JOIN AccountStructure AS L2
    ON L3.ParentAccountStructureCode = L2.AccountStructureCode
    ;

    INSERT INTO TimeRange VALUES(β€˜Normal’, 0, 1,β€˜1Y’, 1);
    INSERT INTO TimeRange VALUES(β€˜Normal’, 1, 2,β€˜2Y’, 2);
    …

    INSERT INTO AccountStructure VALUES(β€˜Root’, β€˜NULL’, β€˜NULL’, NULL, 1);
    INSERT INTO AccountStructure VALUES(β€˜A1000’, β€˜NULL’, β€˜AK000’, NULL, 2);
    …

    INSERT INTO Account VALUES(β€˜100’, β€˜A1101111’);
    INSERT INTO Account VALUES(β€˜101’, β€˜A1101112’);
    …

  • script (grid.py)
    This script generates 768,000 rows.

    import random

    currencies = [β€˜JPY’, β€˜USD’, β€˜AUD’, β€˜NZD’, β€˜CAD’, β€˜HKD’, β€˜CNY’, β€˜SGD’, β€˜BRL’, β€˜ZAR’]
    accounts = [β€˜100’,β€˜101’,β€˜110’,β€˜111’,β€˜120’,β€˜121’,β€˜130’,β€˜131’,β€˜200’,β€˜201’,β€˜210’,β€˜211’,β€˜220’,β€˜221’,β€˜230’,β€˜231’,β€˜300’,β€˜301’,β€˜310’,β€˜311’,β€˜320’,β€˜321’,β€˜330’,β€˜331’,β€˜400’,β€˜401’,β€˜410’,β€˜411’,β€˜420’,β€˜421’,β€˜430’,β€˜431’,β€˜500’,β€˜501’,β€˜510’,β€˜511’,β€˜520’,β€˜521’,β€˜530’,β€˜531’,β€˜600’,β€˜601’,β€˜610’,β€˜611’,β€˜620’,β€˜621’,β€˜630’,β€˜631’,β€˜700’,β€˜701’,β€˜710’,β€˜711’,β€˜720’,β€˜721’,β€˜730’,β€˜731’,β€˜800’,β€˜801’,β€˜810’,β€˜811’,β€˜820’,β€˜821’,β€˜830’,β€˜831’]
    timeRanges = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]

    for i in range(10):
    largeCategoryType = β€˜LargeCategoryType’ + str(i+1)
    for j in range(4):
    middleCategoryType = β€˜MiddleCategoryType’ + str(i+1) + β€˜-’ + str(j+1)
    for k in range(len(currencies)):
    currency = currencies[k]
    for l in range(int(len(accounts)/2)):
    accountId = accounts[l*2+1]
    for m in range(len(timeRanges)):
    timeRange = timeRanges[m]
    value = random.random()
    print(’"’ + accountId + β€˜","’ + currency + β€˜","’ + largeCategoryType + β€˜","’ + middleCategoryType + β€˜",’ + str(timeRange) + β€˜,’ + str(value))

  • query

    SELECT
    grid.LargeCategoryType
    , grid.MiddleCategoryType
    , tr.Label
    , SUM(grid.Grid) as Grid
    FROM
    Grid grid
    INNER JOIN FlattenedAccount a
    ON grid.AccountID = a.L8
    INNER JOIN TimeRange tr
    ON grid.TimeRange >= tr.LowRange
    AND grid.TimeRange < tr.HighRange
    WHERE
    a.L1 = β€˜AK000’
    AND grid.Currency = β€˜JPY’
    AND tr.TimeRangeType = β€˜Normal’
    GROUP BY
    grid.LargeCategoryType
    , grid.MiddleCategoryType
    , tr.Label
    , tr.TimeRangeDisplayPriority
    ORDER BY
    grid.LargeCategoryType
    , grid.MiddleCategoryType
    , tr.TimeRangeDisplayPriority
    ;


#2

Which end point did you connect to? My suspicion is that you connected to the Master Aggregator endpoint. You only need to connect to that endpoint to do DDL (and a few other things). For general DML execution you want to use the load balancer (which will balance the load across the aggs in your cluster). The end points are listed in the CF window under Outputs. See the docs here: https://docs.memsql.com/guides/latest/install-memsql/cloud/aws/cf/step-2/ for a screenshot. It is titled AggregatorLoadBalancerEndpoint.

This would address why only the Master Agg is used. It won’t address why you are bottlenecking on the agg though.

Rick Negrin
VP Product Management at MemSQL

Rick


#3

Thank you for your reply.
As you pointed out, Java Client was connecting to Master Aggregator. I changed to connect to AggregatorLoadBalancerEndpoint, and now I can get the expected speed.