Our challenge: What is the best way to load a large amount of data (about 20 million records, 18 GB in MemSQL) through FS pipelines without making the table unavailable for reads?
We load about 30 tables into MemSQL every night from a NAS drive by running the pipelines in the foreground. During this load we make the entire cluster unavailable and switch user traffic to our other MemSQL cluster. For each table, we truncate it and reload all of its data. Once all the tables are loaded, we make the cluster active again and repeat the same process for the second cluster.
But we have another set of tables that receive data at various times after the initial load. We need to load these tables in their entirety as well.
We developed a pipeline that writes into a stored procedure (SP). The SP checks each row and updates it if the row already exists, or inserts it if it does not. Since the SP processes one row at a time, it takes a long time to run.
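To illustrate the pattern, the per-row logic is roughly like the sketch below. This is not our actual SP: it uses Python with an in-memory SQLite database as a stand-in for MemSQL, and the table name `target` and the columns `id` and `val` are hypothetical.

```python
import sqlite3


def upsert_row_by_row(conn, rows):
    """Check each incoming row: update it if the key exists, else insert it."""
    cur = conn.cursor()
    for key, value in rows:
        # One lookup per incoming row, followed by one write per row.
        cur.execute("SELECT 1 FROM target WHERE id = ?", (key,))
        if cur.fetchone() is not None:
            cur.execute("UPDATE target SET val = ? WHERE id = ?", (value, key))
        else:
            cur.execute("INSERT INTO target (id, val) VALUES (?, ?)", (key, value))
    conn.commit()


# Hypothetical table and data, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO target (id, val) VALUES (1, 'old')")

upsert_row_by_row(conn, [(1, "new"), (2, "added")])
result = conn.execute("SELECT id, val FROM target ORDER BY id").fetchall()
```

Every incoming row costs a lookup plus a write, which is why this approach slows down as the file grows toward millions of records.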
Is there a better way of handling this? Our requirement is that the table remain available for reads while we are loading the data.