How to decrease master node's memory cache, or make a balance between master node and leaf node about memory cache?

Hi,

After I create a memory partition table, I excute the below command to load data into a memsql table:

memsql> load data infile ‘/data1/boray_poc_data/data/pur_tcn_datareport_daily/correct_r_PUR_TCN_DATAREPORT_DAILY.csv’ into table PUR_TCN_DATAREPORT_DAILY columns terminated by ‘,’;
Query OK, 212009169 rows affected (33 min 48.00 sec)

I find that during load data procedure, the cache of memory-usage on master node increases very fast. Almost all memory is used as cache.


In this way, when excuting a complicated SQL query which contains multi-table “left join”, memsql will return a “Not enough memory available” error.

So are there any methods to decrease master node’s memory cache, or in another way, make a balance between master node and leaf node?

The table structure is as below:

create table PUR_TCN_DATAREPORT_DAILY
(
ticket_dt DATE,
flt_dpt_dt DATE,
flt_nbr VARCHAR(10),
carrier_cd VARCHAR(2),
orig_airport_cd VARCHAR(3),
dest_airport_cd VARCHAR(3),
nation_flag VARCHAR(10),
sub_class_cd VARCHAR(1),
class_cd VARCHAR(1),
ticket_agent_nbr VARCHAR(10),
agent_name VARCHAR(100),
agent_area_name VARCHAR(30),
agent_branch_name VARCHAR(60),
agent_branch_name_airport VARCHAR(60),
manager_name VARCHAR(60),
channel_name VARCHAR(30),
pax_qty INTEGER,
pax_income DECIMAL,
agent_commi_amt DECIMAL,
yq_amt DECIMAL,
z_commi_amt DECIMAL,
ka_qty INTEGER,
ka_amt DECIMAL,
ka_agent_commi_amt DECIMAL,
ka_z_commi_amt DECIMAL,
ka_yq_amt DECIMAL,
refund_pax_qty INTEGER,
refund_pax_income DECIMAL,
refund_commi_amt DECIMAL,
refund_yq_amt DECIMAL,
refund_no_pax_qty INTEGER,
refund_no_pax_income DECIMAL,
refund_no_commi_amt DECIMAL,
refund_no_yq_amt DECIMAL,
data_source VARCHAR(100),
src_sysname VARCHAR(10),
src_table VARCHAR(30),
etl_job VARCHAR(50),
etl_tx_dt DATE,
etl_first_dt DATE,
etl_proc_dt DATE,
key(TICKET_AGENT_NBR,SUB_CLASS_CD,PAX_INCOME,DATA_SOURCE,FLT_DPT_DT,FLT_NBR,CARRIER_CD,ORIG_AIRPORT_CD,DEST_AIRPORT_CD,TICKET_DT,CLASS_CD)
);

Do you need to seek on multiple access paths extremely fast into this data? If not, trying making the table a columnstore.

Yes, my purpose is to get extremely fast access to the data in memsql, in order to improve the performance of complex SQL queries, which contains 8 “left join” for all the 9 tables. In this way, for all these 9 tables, I use memory table (which is rowstore tables).
In this procedure, after finishing loading data into all 9 tables, I find that the cache of memory-usage on memsql master node increases very fast. Almost all memory is used as cache. In this way, when I run complex SQL queries, memsql will return errors like “our of memory usage in XXXX leaf node.”
Are there any methords to solve this problem? Thanks!

A few things come to mind. It is not clear to me that your largest table needs to support fast seeking on multiple different columns. So it still might be a good idea to make it a columnstore. If you could post your top few queries, that could give an idea about how to best design your tables.

Here’s some potentially useful documentation:

Also, consider reducing maxiumum_table_memory to leave enough room at runtime for query execution.

Finally, consider adding more memory to your leaf nodes and letting MemSQL use it.

Thanks so much for the very prompt reply!!! These documentations are all very useful, it helps me a lot to understand the storage engine of MEMSQL. By the way, in my experiment, I designed all tables(9 tables as below) as rowstore tables. And got the best query performance.
T1 : 212,009,169 lines data , 61GB
T2 : 11,064,615 lines data ,2.8GB
T3 : 71,101 lines data ,22MB
T4 : 72,973 lines data ,2.8MB
T5 : 2,413 lines data ,384KB
T6 : 1,415 lines data ,280KB
T7 : 1,002 lines data ,164KB
T8 : 106 lines data ,20KB
T9 : 20 lines data ,4.0KB
My memsql cluster has 4 nodes, each node has 256GB memory, about 200GB is available.