memSQL replication and memory issue

dkoutsou · July 31, 2020, 11:21am

Hey,

I have deployed a cluster using 1 master node and 7 leaf nodes. Here is the configuration.

I have created a database using the command CREATE DATABASE IF NOT EXISTS tpch; and then a table using the command

CREATE TABLE lineitem(
        l_orderkey INT,
        l_partkey INT,
        l_suppkey INT,
        l_linenumber INT,
        l_quantity INT,
        l_extendedprice INT,
        l_discount INT,
        l_tax INT,
        l_returnflag INT,
        l_linestatus INT,
        l_shipdate INT,
        l_commitdate INT,
        l_receiptdate INT,
        l_shipinstruct INT,
        l_shipmode INT,
        l_comment VARCHAR(44))

Finally I have imported the data by creating and executing the following pipeline (the total size of the data is 108GB).

CREATE PIPELINE lineitem
  AS LOAD DATA FS "/mnt/scratch/dkoutsou/dbgen/modified-dbgen/lineitem.*.parquet"
  INTO TABLE lineitem
  (
        l_orderkey <- l_orderkey,
        l_partkey <- l_partkey,
        l_suppkey <- l_suppkey,
        l_linenumber <- l_linenumber,
        l_quantity <- l_quantity,
        l_extendedprice <- l_extendedprice,
        l_discount <- l_discount,
        l_tax <- l_tax,
        l_returnflag <- l_returnflag,
        l_linestatus <- l_linestatus,
        l_shipdate <- l_shipdate,
        l_commitdate <- l_commitdate,
        l_receiptdate <- l_receiptdate,
        l_shipinstruct <- l_shipinstruct,
        l_shipmode <- l_shipmode,
        l_comment <- l_comment
  )
  FORMAT PARQUET
  START PIPELINE lineitem FOREGROUND;

As you can see in the first screenshot, my leaf nodes are using too much memory (~100GB) and therefore I can’t import any of the other database tables. I suspect that this is because of replication. If I see what happens in a leaf node, I have the following picture: !

Is there a way to disable replication or find the cause of this problem? I would expect every node to have around 50 GB of data and definitely not 100GB.

The problem is also visible here:

Based on memSQL statistics I should have had around 330GB of memory but I have almost double of that. Therefore there should be some way to disable the replication.

I would appreciate any answer!

Thank you very much for your time!

nikita · July 31, 2020, 12:44pm

You need to be creating columnstore table: SingleStoreDB Cloud · SingleStore Documentation. By default MemSQL creates rowstore in-memory tables. We will change the default table type in the short future.

dkoutsou · August 3, 2020, 5:01pm

Hey @nikita, thank you very much for your answer. This solves the problem. However with specifying at the table definition a KEY() USING CLUSTERED COLUMNSTORE, memSQL reads from disk now I guess. How about if I needed to read from memory? Would this be possible with my current data format?

nikita · August 17, 2020, 11:17pm

MemSQL caches everything in memory. Do you have a performance problem?