I have deployed a cluster using 1 master node and 7 leaf nodes. Here is the configuration.
I created a database with the command
CREATE DATABASE IF NOT EXISTS tpch;
and then a table with the command
CREATE TABLE lineitem ( l_orderkey INT, l_partkey INT, l_suppkey INT, l_linenumber INT, l_quantity INT, l_extendedprice INT, l_discount INT, l_tax INT, l_returnflag INT, l_linestatus INT, l_shipdate INT, l_commitdate INT, l_receiptdate INT, l_shipinstruct INT, l_shipmode INT, l_comment VARCHAR(44) );
Finally, I imported the data by creating and starting the following pipeline (the total size of the data is 108 GB):
CREATE PIPELINE lineitem AS LOAD DATA FS "/mnt/scratch/dkoutsou/dbgen/modified-dbgen/lineitem.*.parquet" INTO TABLE lineitem ( l_orderkey <- l_orderkey, l_partkey <- l_partkey, l_suppkey <- l_suppkey, l_linenumber <- l_linenumber, l_quantity <- l_quantity, l_extendedprice <- l_extendedprice, l_discount <- l_discount, l_tax <- l_tax, l_returnflag <- l_returnflag, l_linestatus <- l_linestatus, l_shipdate <- l_shipdate, l_commitdate <- l_commitdate, l_receiptdate <- l_receiptdate, l_shipinstruct <- l_shipinstruct, l_shipmode <- l_shipmode, l_comment <- l_comment ) FORMAT PARQUET;
START PIPELINE lineitem FOREGROUND;
As you can see in the first screenshot, my leaf nodes are using too much memory (~100 GB each), so I can't import any of the other database tables. I suspect that this is because of replication. When I look at a single leaf node, I see the following picture:
Is there a way to disable replication or to find the cause of this problem? I would expect each leaf node to hold around 50 GB of data, and definitely not 100 GB.
The problem is also visible here:
Based on the MemSQL statistics, I should be using around 330 GB of memory in total, but I am using almost double that. So I assume there must be some way to disable replication.
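To make the arithmetic behind my suspicion explicit, here is a rough sketch in Python. It assumes the data is spread evenly over the 7 leaf nodes and that replication would mean exactly two copies of everything; neither assumption is confirmed, and the helper name `per_leaf_gb` is only for illustration.

```python
# Rough per-leaf memory estimate using the numbers from this post.
LEAF_NODES = 7                # leaf nodes in the cluster
EXPECTED_TOTAL_GB = 330       # total memory I expected from the statistics
OBSERVED_PER_LEAF_GB = 100    # approximate usage I see on each leaf


def per_leaf_gb(total_gb, leaves, copies=1):
    """Memory per leaf if `copies` copies of the data are spread evenly (assumption)."""
    return total_gb * copies / leaves


single = per_leaf_gb(EXPECTED_TOTAL_GB, LEAF_NODES)             # one copy of the data
doubled = per_leaf_gb(EXPECTED_TOTAL_GB, LEAF_NODES, copies=2)  # two copies (my guess)

print(f"one copy:   {single:.1f} GB per leaf")    # 47.1
print(f"two copies: {doubled:.1f} GB per leaf")   # 94.3
```

One copy gives roughly the 50 GB per leaf I expected, while two copies land close to the ~100 GB I actually observe. That is why I suspect replication, although it does not rule out other causes.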
I would appreciate any answer!
Thank you very much for your time!