Python 3.x scripts are unable to execute in MemSQL pipeline transform


#1

Hi,
I was trying to run a Python 3.x script in a pipeline using the transform method, but it does not work and the pipeline goes into an error state. I used the same sample MemSQL transform script, but with the Python version pointed to 3.x. Could anybody help me with this? Or is Python 3.x not supported in MemSQL transforms? Any help would be highly appreciated.
Thanks.


#2

In general, any executable file is supported as a transform. In particular, Python 3 scripts are supported if the machine can run them.

The sample script is simply not compatible with Python 3. If you query SELECT * FROM information_schema.pipelines_errors after the failure, you should see Python call stacks describing string-related issues.
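
As a rough illustration of the kind of string-related failure meant here (an assumption about the likely cause, since the failing script is not shown in this thread): in Python 3, text streams such as sys.stdin return str rather than bytes, and struct.unpack only accepts bytes-like input. The values below are made up for the demonstration.

```python
import struct

# An 8-byte length prefix, packed as a native unsigned 64-bit integer.
prefix = struct.pack("Q", 5)

# Bytes unpack without trouble.
length = struct.unpack("Q", prefix)[0]

# A text string, which is what a text-mode stream returns in Python 3,
# is rejected by struct with a TypeError.
try:
    struct.unpack("Q", u"\x05" + u"\x00" * 7)
    text_rejected = False
except TypeError:
    text_rejected = True
```

Reading from the underlying binary stream (sys.stdin.buffer in Python 3) avoids the problem.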

What follows is a version of the sample skeleton transform that works with both Python 2 and Python 3. We’ll also update the docs to include it.

#!/usr/bin/python

import struct
import sys

binary_stdin = sys.stdin if sys.version_info < (3, 0) else sys.stdin.buffer
binary_stderr = sys.stderr if sys.version_info < (3, 0) else sys.stderr.buffer
binary_stdout = sys.stdout if sys.version_info < (3, 0) else sys.stdout.buffer

def input_stream():
    """
        Consume STDIN and yield each record that is received from MemSQL
    """
    while True:
        byte_len = binary_stdin.read(8)
        if len(byte_len) == 8:
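            # "Q" is always 8 bytes wide, matching the 8 bytes read above
            # ("L" is only 8 bytes wide on some platforms).
            byte_len = struct.unpack("Q", byte_len)[0]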
            result = binary_stdin.read(byte_len)
            yield result
        else:
            assert len(byte_len) == 0, byte_len
            return


def log(message):
    """
        Log an informational message to stderr which will show up in MemSQL in
        the event of transform failure.
    """
    binary_stderr.write(message + b"\n")


def emit(message):
    """
        Emit a record back to MemSQL by writing it to STDOUT.  The record
        should be formatted as JSON, Avro, or CSV as it will be parsed by
        LOAD DATA.
    """
    binary_stdout.write(message + b"\n")

log(b"Begin transform")

# We start the transform here by reading from the input_stream() iterator.
for data in input_stream():
    # Since this is an identity transform we just emit what we receive.
    emit(data)

log(b"End transform")
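
One way to try a transform outside of a pipeline is to feed it input framed the way input_stream() expects: an 8-byte native-endian length, followed by that many bytes of record data. The sketch below does that from Python. frame_records and run_transform are hypothetical helper names, not part of MemSQL, and the framing is inferred only from the script above.

```python
import struct
import subprocess
import sys


def frame_records(records):
    """Prefix each record (bytes) with its length as an 8-byte native unsigned integer."""
    return b"".join(struct.pack("Q", len(r)) + r for r in records)


def run_transform(path, records):
    """Run the script at `path` with framed records on stdin.

    Returns a (stdout, stderr, exit code) tuple.
    """
    proc = subprocess.Popen(
        [sys.executable, path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(frame_records(records))
    return out, err, proc.returncode
```

For example, with the identity transform saved under a placeholder name such as transform.py, run_transform("transform.py", [b"1,foo", b"2,bar"]) should return each record followed by a newline on stdout, and the log messages on stderr. Running it under both interpreters is a quick way to check Python 2 and Python 3 compatibility before attaching the script to a pipeline.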