Load data in cluster with multiple hosts


#1

How do I specify the file path for LOAD DATA from the file system in a cluster with multiple hosts?
The master aggregator is on one host.
The leaves are on another host. When I provide the path of the file to load, I get an error saying the specified file is not found. How should I specify the absolute path?


#2

The file used in LOAD DATA needs to be accessible on the aggregator the LOAD DATA command is run against. When using the LOCAL option, the file must be accessible to the client where the command is run.
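
To make the difference concrete, here is a minimal sketch of the two forms. The table name `events` and the path `/data/events.csv` are hypothetical placeholders, not taken from your setup, and the CSV options are assumptions about the file format.

```sql
-- Without LOCAL: the path is resolved on the aggregator host that
-- receives this statement, so the file must exist there.
-- (`events` and '/data/events.csv' are hypothetical.)
LOAD DATA INFILE '/data/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- With LOCAL: the path is resolved on the machine running the client,
-- and the client sends the file contents to the aggregator.
LOAD DATA LOCAL INFILE '/data/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```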


#3

The file path is accessible to the master, but the leaf reports an error when I execute the START PIPELINE command.
The error is:
ERROR 1934 ER_EXTRACTOR_EXTRACTOR_EXTRACT: Leaf Error (XX.XX.XX.XX:3307): Leaf Error (XX.XX.XX.XX:3306): Cannot extract for pipeline Example because ‘FAILURE’. Truncated stderr:file not found.
How do I specify a path in the CREATE PIPELINE command that works for both the master and the leaves when they are on different hosts?


#4

It looks like you are loading via filesystem pipelines. I had initially assumed you were loading with the LOAD DATA command.

For filesystem pipelines, the filepath must be accessible from all nodes in the cluster. Typically, this is done with a distributed filesystem such as NFS. See https://docs.memsql.com/memsql-pipelines/v6.7/filesystem-pipelines-overview/#filesystem-paths-and-permissions for more info.
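
As a rough sketch, assuming a shared directory is mounted at the same path on every host, a regular filesystem pipeline could look like the following. The mount point `/mnt/shared/incoming`, the table `events`, and the CSV options are hypothetical; the pipeline name `Example` is taken from your error message.

```sql
-- Every node in the cluster must be able to read this path, which is
-- why it points at a shared mount (hypothetical location).
CREATE PIPELINE Example AS
LOAD DATA FS '/mnt/shared/incoming/*.csv'
INTO TABLE events
FIELDS TERMINATED BY ',';

START PIPELINE Example;
```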

If you aren’t using a distributed filesystem but still want to use filesystem pipelines, you can have the pipeline run on the aggregator via CREATE AGGREGATOR PIPELINE. You could also use LOAD DATA, which I would expect to be more performant than an aggregator pipeline.
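
A minimal sketch of the aggregator variant, assuming the files live only on the aggregator host. The directory `/data/incoming`, the table `events`, and the CSV options are hypothetical; drop any existing pipeline named `Example` first if you reuse the name.

```sql
-- With AGGREGATOR, extraction runs on the aggregator rather than on
-- the leaves, so the path should only need to be readable there.
-- ('/data/incoming' and `events` are hypothetical.)
CREATE AGGREGATOR PIPELINE Example AS
LOAD DATA FS '/data/incoming/*.csv'
INTO TABLE events
FIELDS TERMINATED BY ',';

START PIPELINE Example;
```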