Pipeline with json_format_options

Franck · December 19, 2018, 1:29am

Hi,

As anyone created pipeline using json_format_options:
FORMAT JSON
( {col_name | @variable_name} <- subvalue_path [DEFAULT literal_expr], …)

I’m trying to import some json files (with different schema format) and parse it into a stored proc
s3 --> Pipeline --> SP --> table.

mxk · December 19, 2018, 10:05pm

It sounds like you looking for the CREATE PIPELINE ... INTO PROCEDURE procname FORMAT JSON ... syntax.

Do you have an example which is encountering an error?

sasha · December 19, 2018, 10:32pm

Yep, I have done this.

For starters, here’s a basic example using a JSON file system pipeline into a stored procedure:

MySQL [db]> create table t(a int, b text);
Query OK, 0 rows affected (0.05 sec)

MySQL [db]> delimiter //
MySQL [db]> create procedure json_ident(data_from_pipeline query(p_a int, p_b text)) as
→ begin
→ insert into t select p_a, p_b from data_from_pipeline;
→ end //
Query OK, 1 row affected (0.22 sec)

MySQL [db]> delimiter ;
MySQL [db]> create pipeline json_sp_pipe as
→ load data fs “/tmp/json_example”
→ into procedure json_ident
→ format json (p_a ← a, p_b ← b);
Query OK, 0 rows affected (0.40 sec)

MySQL [db]> start pipeline json_sp_pipe;
Query OK, 0 rows affected (0.36 sec)

MySQL [db]> select * from t;
±-----±-----+
| a | b |
±-----±-----+
| 1 | one |
| 2 | two |
±-----±-----+

FS pipelines watch directories for files to load. In this case, the directory had just one file:

~: cat /tmp/json_example/1.json                                                       
{"a":1, "b":"one"}
{"a":2, "b":"two"}

The full syntax for as load data ... format json is described on the LOAD DATA page:
https://docs.memsql.com/sql-reference/v6.7/load-data/#json-load-data

The DEFAULT option in the subvalue mapping clause is the recommended way to deal with minor schema differences among input records. If that doesn’t suffice, I’d be interested in hearing more details about your case.

Franck · December 20, 2018, 6:32pm

sasha:

MySQL [db]> create table t(a int, b text);
Query OK, 0 rows affected (0.05 sec)

MySQL [db]> delimiter //
MySQL [db]> create procedure json_ident(data_from_pipeline query(p_a int, p_b text)) as
-> begin
-> insert into t select p_a, p_b from data_from_pipeline;
-> end //
Query OK, 1 row affected (0.22 sec)

MySQL [db]> delimiter ;
MySQL [db]> create pipeline json_sp_pipe as
-> load data fs “/tmp/json_example”
-> into procedure json_ident
-> format json (p_a <- a, p_b <- b);
Query OK, 0 rows affected (0.40 sec)

MySQL [db]> start pipeline json_sp_pipe;
Query OK, 0 rows affected (0.36 sec)

MySQL [db]> select * from t;
±-----±-----+
| a | b |
±-----±-----+
| 1 | one |
| 2 | two |
±-----±-----+

FS pipelines watch directories for files to load. In this case, the directory had just one file:
~: cat /tmp/json_example/1.json                                                       
{"a":1, "b":"one"}
{"a":2, "b":"two"}

Sasha,
Thank you very much. I’ll try that.