Why are large amounts of memory (around 100 GB) released when loading bulk data using the Kafka pipeline?

Hi !!

A bulk data load performance test was conducted using a Kafka pipeline.

Details are as follows:

**Test environment**

CPU/memory: 24 cores (48 vCores) / 320 GB

Nodes: 1 master aggregator (MA) / 8 leaf nodes

Database partitions: 24 × 8 = 192

Only 1 table (columnstore) with 16 columns.

Only 1 Kafka pipeline.

Only this operation. (No other processing on the system)

**About the bulk data**

Received from a Kafka cluster (Kafka partition count equals the database partition count: 192)

Producer rate: 6M rows (messages) per second

1 row (message) size: approximately 175 bytes
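
For scale, those numbers work out to roughly:

6,000,000 rows/s × 175 B/row ≈ 1.05 GB/s across the cluster ≈ 131 MB/s per leaf (8 leaves)

(These are wire-format bytes; the in-memory representation will be larger.)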

**BI tools: Prometheus + Grafana**

Query (memory utilization):

node_memory_MemTotal_bytes{instance=~"$node"} - node_memory_MemAvailable_bytes{instance=~"$node"}

**Question**

The attached image is from Grafana and shows the memory usage of one leaf node. As you can see, memory is periodically released back down to 30 GB whenever usage reaches 130 GB.
I want to know the reason for this.

Thank you.

Could you please share the DDL for the table and the pipeline definition?

CREATE TABLE sensor_data_table (
timestamp timestamp(6) NULL DEFAULT NULL,
name char(13) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
chid char(2) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
lotid char(15) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
ppid char(15) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
recipeid char(15) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
chstep char(15) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
stepseq char(13) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
partid char(15) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
status char(3) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
slotno tinyint(4) DEFAULT NULL,
sensor char(28) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
value double DEFAULT NULL,
lcl double DEFAULT NULL,
ucl double DEFAULT NULL,
inserttime timestamp(6) NULL DEFAULT CURRENT_TIMESTAMP(6),
/*!90618 SHARD */ KEY name (name, sensor),
KEY inserttime (inserttime) /*!90619 USING CLUSTERED COLUMNSTORE */
) /*!90623 AUTOSTATS_CARDINALITY_MODE=INCREMENTAL, AUTOSTATS_HISTOGRAM_MODE=CREATE, AUTOSTATS_SAMPLING=ON */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */;

CREATE PIPELINE quickstart_kafka10 AS LOAD DATA KAFKA '$p_kafka_ip/$p_kafka_topic'
BATCH_INTERVAL 1
MAX_PARTITIONS_PER_BATCH 192
INTO TABLE sensor_data_table
(timestamp, name, chid, lotid, ppid, recipeid, chstep, stepseq, partid, status, slotno, sensor, value, lcl, ucl)
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

Sample data:

2020-01-09 12:19:49.826,eq79941_0001,c1,z7qq,lzlry375kt_2h5v,luvv4n62vt36j,7s3m,w7s7xn5e,fsiainu6u7a3,run,1,kafkaExample.Sensor@73eabe71,3.9998652935028076,2.430484934435185,6.580886940578097
2020-01-09 12:19:49.826,eq79941_0001,c1,z7qq,lzlry375kt_2h5v,luvv4n62vt36j,7s3m,w7s7xn5e,fsiainu6u7a3,run,1,kafkaExample.Sensor@23f02a7d,226.59170532226562,150.84780190109996,306.62161731255765

Thank you.

There is an in-memory segment for columnstore tables. When you stream into MemSQL, data is first inserted into that in-memory segment. Once it reaches a particular threshold, MemSQL flushes it to a disk-backed columnstore segment, and that's where you see the memory usage drop.

MemSQL accounts for its RAM usage and won't run out of memory. If you are streaming data faster than MemSQL can flush it to disk, it will start throttling writes. However, you would have to stream at a very high volume.
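
If it helps to watch this from the database side, here is a minimal sketch, assuming the information_schema views and columnstore variables documented for MemSQL (names may differ slightly by version):

-- Thresholds that decide when the in-memory segment is flushed to disk
SHOW GLOBAL VARIABLES LIKE 'columnstore_segment_rows';
SHOW GLOBAL VARIABLES LIKE 'columnstore_flush_bytes';

-- Per-partition row counts and memory use, including the in-memory segment
SELECT * FROM information_schema.TABLE_STATISTICS
WHERE table_name = 'sensor_data_table';

-- Recent pipeline batches; unusually long batch times can indicate throttling
SELECT * FROM information_schema.PIPELINES_BATCHES_SUMMARY
WHERE pipeline_name = 'quickstart_kafka10';

-- Manually force the in-memory segment to flush into columnstore segments
OPTIMIZE TABLE sensor_data_table FLUSH;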

Thank you for the answer.

There are still some questions.

It's strange that the cycles are so long (20+ minutes).
And the amount of memory flushed doesn't add up:
I would expect roughly 180 bytes × columnstore_segment_rows (1,024,000) per partition, or columnstore_flush_bytes (32 MB).
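
Spelling that out per leaf (taking ~180 B/row at face value, which ignores rowstore per-row overhead):

1,024,000 rows × 180 B ≈ 184 MB per partition
24 partitions per leaf × 184 MB ≈ 4.4 GB per leaf

That is nowhere near the ~100 GB swing in the graph.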

Why? Would a huge ~100 GB segment really be held in memory and flushed all at once?

Thank you again in advance.

Maybe this is causing my out-of-memory issues… did you solve the problem somehow?