How to parse encoded text into multiple rows

Hello everyone,

Is there a way to turn a string of key-value pairs into a table/result-set of key,value?
I have a (smallish - 3 columns, few hundred rows) table with one column that looks like “key1=value1;key1=value2;key2=value3” (short-ish - 3-4 keys)
I’d like to turn that into a result set that looks like (for each row of the source table):
| Pos | Key | Value |
| 1 | key1 | value1 |
| 2 | key1 | value2 |
| 3 | key2 | value3 |

(The possibility of repeated keys means we need the position in addition to the key)
(I’ve omitted e.g. the primary key of the source table, which I would also need to include, but that I’m guessing is straightforward.)

The source table is updated by a pipeline, so I would need to do this either as part of a procedure used by the pipeline, or something that periodically reads the source table and updates a second one.

Any suggestions on how to do this?

Thanks, and Regards,

  • Hari.

This can be done using stored procedure using While Loops (https://docs.memsql.com/sql-reference/v6.8/control-flow-statements/#while-loop-end-loop), LOCATE() built-in function (https://docs.memsql.com/sql-reference/v6.7/locate/) and SUBSTRING() built-in function (https://docs.memsql.com/sql-reference/v6.7/substring/).

Something like the following (I haven’t tested this code because I don’t have my laptop right now with me Hence the syntax might not be exactly correct)

DELIMITER //
CREATE PROCEDURE extract_from_row(str TEXT) 
RETURNS RECORD(res ARRAY(RECORD( key  TEXT, value TEXT)), count INT):
DECLARE
    len INT = CHARACTER_LENGTH(str);
    pos INT = 1;
    nextpos INT;
    found INT = 0;
    key TEXT;
    value TEXT;
    cur RECORD( key TEXT, value TEXT);
    res  ARRAY(RECORD( key TEXT, value TEXT)) = CREATE_ARRAY(10);
    ret RECORD(res ARRAY(RECORD( key TEXT, value TEXT)), count INT);
BEGIN
    WHILE pos != 0 and pos < len LOOP
        nextpos = LOCATE(str, '=', pos);
        key = SUBSTRING(str, pos, nextpos - 1);
        pos = nextpos + 1;
        nextpos = LOCATE(str, ';', pos);
        value = SUBSTRING(str, pos, nextpos - 1);
        pos = nextpos + 1;
        cur.key = key
        cur.value = value;
        res[found] = cur;
        found = found + 1;
    END LOOP;
    ret.res = res;
    ret.count = found;
    RETURN ret;
END //
DELIMITER ;

This is a very simple helper function to split the string into key - value pair. It doesn’t do exception handling in case the string is malformed (https://docs.memsql.com/sql-reference/v6.5/exceptions-and-exception-handling/). It also statically caps the max number of key-val pair to an arbitrary number (10). You can fix the latter by first figure out how many times does the character ‘;’ appears in the string in a separate helper function or set the limit to higher number than expected.

For list of all string built-in functions, refer to https://docs.memsql.com/sql-reference/v6.7/string-functions/

i have a pipe delimited string with filed enclosed in double quotes. 1 msg dropped from source into Kafka will contain 50k rows. How can this be parsed and upserted into table using procedure or direct table insert. Source is kafka pipeline.

Input Eg:
ACTIVITY|COMPANY|TRAN|ITEM|QTY|COST|DESC
“activity1”|“company1”|“tran1”|“item1”|“1”|“5”|" 2.5" pipe"
“activity2”|“company2”|“tran2”|“item2”|“3”|“1”|" 2.5" pipe"
“activity2”|“company3”|“tran3”|“item3”|“4”|“8”|“Wire,socket|pin”

TABLE: (Insert/Update)
ID DATE ACTIVITY COMPANY TRANID ITEM QTY COST TOT_COST DESC
1 5- May activity1 company1 tranid1 item1 1 5 5 2.5" pipe
2 5- May activity2 company2 tranid2 item2 3 1 3 2.5" pipe
3 5- May activity2 company3 tranid3 item3 5 8 40 Wire,socket|pin