Invalid JSON value for column

I got this error from the Pipeline injection, but it can be reproduced here:

create table test_json_insert (raw_json JSON );

insert into test_json_insert (raw_json) values (
‘{“ts”:“2020-07-10T09:55:40Z”,“s_session_id”:"",“version”:“1.0.10.1”,“session”:“fdsdfs-c06f-4f62-833b-0e3b730bbeba”,“body”:{“type”:“game”,“name”:“àØãàðà àðÿ “,“id”:“95444”,“session”:“37f5ea28-720b-454c-a091-9bb772dc8814”,“room_id”:””,“number”:1}}’);

This is the Json:

{
“ts”: “2020-07-10T09:55:40Z”,
“s_session_id”: “”,
“version”: “1.0.10.1”,
“session”: “fdsdfs-c06f-4f62-833b-0e3b730bbeba”,
“body”: {
“type”: “game”,
“name”: "àØãàðà àðÿ ",
“id”: “95444”,
“session”: “37f5ea28-720b-454c-a091-9bb772dc8814”,
“room_id”: “”,
“number”: 1
}
}

I tried by using collate utf8_binary, utf8_general_ci, utf8_unicode_ci but nothing works.

Do you have some idea? I’m parsing the json in python, java, javascript and it’s valid in all of them.

Note that the query above has directional quotes / , but we don’t accept those as JSON string delimiters. However, I assume that’s a copy-paste issue. It’s probably not the only one, unfortunately, since that query succeeds on my end if I replace the directional quotes with " and '. Try uploading as preformatted text?

My guess is that the issue is with the name field. MemSQL doesn’t currently support the full unicode range inside JSON strings (https://docs.memsql.com/v7.1/concepts/json-guide/#unicode-support). Does the failing query succeed if you remove the “name” field from the input JSON?

If so, there’s a workaround available via encoding unsupported characters as “\u escape sequences”. We’re also currently working on adding built-in support for an upcoming release.

insert into test_json_insert (raw_json) values (
‘{“ts”:“2020-07-10T09:55:40Z”,“s_session_id”:"",“version”:“1.0.10.1”,“session”:“fdsdfs-c06f-4f62-833b-0e3b730bbeba”,“body”:{“type”:“game”,“name”:“àØãàðà àðÿ “,“id”:“95444”,“session”:“37f5ea28-720b-454c-a091-9bb772dc8814”,“room_id”:””,“number”:1}}’
)

I’m sorry, here is again. Thanks for your response. Your guess is right, the characters in the name field are causing the issue.
How Can I escape those fields? I’m ingesting this data via pipelines.

@sasha I’m trying to apply a jq transformation to solve this issue but looks like the transformer do not works
I’m adding the following to the transformation, but the output is an empty row and it fails.
It works fine and it does escape the characters using jq from command line. Am I missing something?

    WITH TRANSFORM ('memsql://json','', ' -a "."')

Here’s my test from command line:

 echo '{"test":"test","body":{"name":"raff obbyàíï"}}' | jq -a "."

Output:

{"test":"test","body":{"name":"raff obby\u00e0\u00ed\u00ef"}}

The pipeline works when I use raw_output (-r) but it doesn’t escape the characters

WITH TRANSFORM ('memsql://json','', ' -r ".|@json"')

When I change t -a, it stop working:

WITH TRANSFORM ('memsql://json','', ' -a ".|@json"')

Does this feature works?

1 Like

I had the same problem using jq here. Is there a workaround to do this?

@sasha I was able to resolve it using a python transformer:

#!/usr/bin/python3
import sys
for line in sys.stdin:
sys.stdout.write(line.encode( 'ascii' , 'ignore' ).decode( 'utf-8' )) 

Anyway, it would be useful to use the jq