Pipeline with File System

create table testtb
(
field1 int not null,
field2 varchar(55),
field3 varchar(256),
primary key (field1)
);

have data file :
1,“test11”,“testing pipeline data”
2,“test12”,“testing pipeline data”

Create or replace pipleline
as
load data FS '/path/to/files/<filename.csv>
INTO TABLE testtb;
FIELDS TERMINATED BY ‘,’
OPTIONALLY ENCLOSED BY ‘"’
(field1,
field2,
field3);

like in sql loader we have col1,col2, col3 , is there any way that I can ref the field for the data in the file, if I would like to apply filter on the file data instead of literals. Appreciate your help in providing the details around playing of file system data.

with regards,

I’m not sure I fully understand. However, two things come to mind:

  1. Use pipelines to stored procedures and filter out the data you don’t want in the stored procedure before it gets inserted into the target table
  2. Pre-process the input files with a perl script or something similar to filter them and create new files before you load them

Stored procedure is one of the options which I considered, but seeing that we have to pass on all columns along with their data type of the table to Query type, select from that query with required filters, and process it further.
using Python script or perl script for processing of very big files will have impact on our SAN and resource usage. Python will consume most of the resources If I would want to transform and filter it out.

I see that we can pass only QUERY Type parameter to stored proc thru pipeline.
if I have to perform migration from one env to other, if my table count is too many, I have to write a separate stored proc and create pipeline for each of the tables. Is there any way that I can have a wrapper and get the source_file from there I can get table name and pass columns of that table as query type and process it dynamically? This is going to provide a quick solution. By considering sql injection and other anomalies, I would not want to use Dynamic SQL.

Can you explain in more detail what you are trying to do? E.g. are you trying to load several tables from a source (non-MemSQL) database and filter out rows? Please give a more detailed example of what you are trying to do for one of the load sets (including example of a filter you want to apply).