Unable to reach transform file

ssalgadom · March 1, 2019, 12:10pm

Hi,

I created a pipeline following the documentation:

CREATE PIPELINE my_pipeline
AS LOAD DATA HDFS 'hdfs://server-address:8020/folder'
WITH TRANSFORM('http://website.com/TransformFile.tar.gz','transformScript.py','')
INTO TABLE `created_table`
FIELDS TERMINATED BY ',';

When I create the pipeline, no errors are reported and the transformScript.py script is found, but when I run

test pipeline my_pipeline;

I get this error:

Truncated stderr: :No such file or directory

I have tried different approaches (using a tarball or calling the file directly), but it still doesn’t reach the file. Is there any additional permission I should grant?

PS: I can open http://website.com/TransformFile.tar.gz in any browser, from anywhere, and I even set its permissions to 777. MemSQL can also reach the file, since CREATE PIPELINE completes with no errors.

Thanks,

sschwarz · March 1, 2019, 6:02pm

Hi ssalgadom,

Thanks for your question.

Does your transformScript.py script try to access any files? If so, please check that your script can access these files.

Scott

ssalgadom · March 5, 2019, 9:59am

Thanks for your reply. No, it doesn't access any files; it just reads its input from stdin, does some calculations with that data, and writes the result to stdout. Is there any other reason this might occur?
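For reference, a transform of the kind described here (read rows from stdin, compute, write rows to stdout, no other file access) might look like the minimal Python sketch below. The per-row calculation (appending the sum of the numeric fields) is a hypothetical placeholder, and the shebang line assumes the pipeline runs the script file directly as an executable.

```python
#!/usr/bin/env python3
# Minimal transform sketch: read comma-separated rows from stdin, compute
# something per row, write comma-separated rows to stdout.
import sys


def transform_line(line: str) -> str:
    """Hypothetical per-row calculation: append the sum of the numeric fields."""
    fields = line.rstrip("\n").split(",")
    total = 0.0
    for field in fields:
        try:
            total += float(field)
        except ValueError:
            pass  # ignore fields that are not numbers
    return ",".join(fields + [str(total)])


def main() -> None:
    # One output row per non-empty input row; nothing is read from disk.
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Keeping the per-row logic in its own function makes it easy to test locally, e.g. by piping a sample file into the script, before packaging it for the pipeline.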

Regards,

JoYo · March 5, 2019, 6:23pm

There are some subtle semantics of tar with respect to the top-level directory. How did you create your tarball? If it was with

tar czf TransformFile.tar.gz some_dir

then you need to make your entry point 'some_dir/transformScript.py', i.e.:

create pipeline
...
with transform ('<url>', 'some_dir/transformScript.py', '')
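To check which path to use as the entry point, you can list what is actually stored in the tarball. The Python sketch below uses the standard tarfile module to build an archive both ways (with and without the top-level directory) and to find the in-archive path of transformScript.py. The helper names build_tarball, member_names, entry_point_for and demo are hypothetical, chosen only for illustration.

```python
import os
import posixpath
import tarfile
import tempfile
from typing import List, Optional, Tuple

SCRIPT = "transformScript.py"


def build_tarball(src_dir: str, out_path: str, flatten: bool) -> None:
    """Pack src_dir/transformScript.py into a gzipped tarball.

    flatten=False mimics `tar czf out.tar.gz some_dir`: the member keeps the
    'some_dir/' prefix. flatten=True stores the script at the archive root.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        if flatten:
            tar.add(os.path.join(src_dir, SCRIPT), arcname=SCRIPT)
        else:
            tar.add(src_dir, arcname=os.path.basename(src_dir))


def member_names(tar_path: str) -> List[str]:
    """Return every path stored in the archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return tar.getnames()


def entry_point_for(tar_path: str, script: str = SCRIPT) -> Optional[str]:
    """Return the in-archive path whose last component is `script`, if any.

    Tar member names always use '/' separators, hence posixpath.
    """
    for name in member_names(tar_path):
        if posixpath.basename(name) == script:
            return name
    return None


def demo() -> Tuple[Optional[str], Optional[str]]:
    """Build both layouts in a temp dir and report the entry point for each."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "some_dir")
        os.mkdir(src)
        with open(os.path.join(src, SCRIPT), "w") as f:
            f.write("#!/usr/bin/env python3\n")
        nested = os.path.join(tmp, "nested.tar.gz")
        flat = os.path.join(tmp, "flat.tar.gz")
        build_tarball(src, nested, flatten=False)
        build_tarball(src, flat, flatten=True)
        return entry_point_for(nested), entry_point_for(flat)


if __name__ == "__main__":
    nested_entry, flat_entry = demo()
    print("tar czf ... some_dir:        ", nested_entry)
    print("tar czf ... -C some_dir ...: ", flat_entry)
```

Running entry_point_for('TransformFile.tar.gz') on a local copy of the hosted archive should give the exact string to pass as the second argument of WITH TRANSFORM. The flatten=True layout corresponds to creating the archive with tar czf TransformFile.tar.gz -C some_dir transformScript.py, in which case the entry point is just 'transformScript.py'.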