UTF-8mb4 support - or where are the emojis 😱


#1

I recently saw that UTF-8mb4 support is listed as one of the exceptions to MySQL compatibility. This means that loading data containing emojis (which our data set does) returns an error.

Is full UTF-8mb4 support, and with it emoji support, on the roadmap?


#2

I really miss this feature too.

Our workaround is to escape Unicode characters outside the supported range before inserting, and to reverse the process when fetching.


#3

This is not on the near-term roadmap, but we have heard about this and are considering it for a future release. 🤔 Thanks for the feedback!


#4

@mpskovvang, how exactly do you do the escaping now?


#5

Thanks Hanson!

The one workaround we’ve found that works is to encode the data in Base64 then decode on read in MemSQL.
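
A minimal sketch of the Base64 round trip, assuming the encoding happens in application code before the insert. The post describes decoding on read inside MemSQL; to keep the example self-contained, the decode here is done in PHP instead, and the variable names are purely illustrative.

```php
<?php
// Illustrative value containing a 4-byte UTF-8 character (U+1F631).
$text = "Thanks \u{1F631}";

// Base64 output is plain ASCII, so it fits in a 3-byte utf8 column.
$encoded = base64_encode($text);

// Strict mode returns false if the stored value is not valid Base64.
$decoded = base64_decode($encoded, true);

printf("%d bytes raw, %d bytes encoded\n", strlen($text), strlen($encoded));
var_dump($decoded === $text);
```

Note that the whole value is encoded, so it grows by roughly a third and can no longer be searched or compared as text in the database.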


#6

@mpskovvang I see. Are you encoding the whole string or just the emoji part?


#7

Sorry, a bit late…

I only encode/decode the unsupported Unicode characters.

This actually works really well. The only real drawback is the byte size. I can even perform a FULLTEXT search for emojis, as long as I encode the query string first.

My Unicode class:

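<?php
namespace App;

class Unicode
{
    /**
     * Escape characters above U+FFFF (4-byte UTF-8, e.g. emojis)
     * as JSON-style \uXXXX surrogate pairs.
     */
    public static function encode($string)
    {
        // Start at U+10000: everything up to U+FFFF fits in 3 bytes.
        return preg_replace_callback('/[\x{10000}-\x{10FFFF}]+/u', function ($match) {
            // json_encode() yields e.g. "\ud83d\ude31"; strip the quotes.
            return trim(json_encode($match[0]), '"');
        }, $string);
    }

    /**
     * Turn escaped surrogate pairs back into the original characters.
     */
    public static function decode($string)
    {
        // Match only high+low surrogate pairs, so other literal \uXXXX
        // text in the data is left untouched.
        $pairs = '/(?:\\\\ud[89ab][0-9a-f]{2}\\\\ud[c-f][0-9a-f]{2})+/';

        return preg_replace_callback($pairs, function ($match) {
            return json_decode('"' . $match[0] . '"');
        }, $string);
    }
}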
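
To illustrate the round trip, the byte-size drawback, and encoding a search term, here is a rough standalone sketch. The functions `escape_astral()` and `unescape_astral()` are hypothetical stand-ins for the class methods so that the snippet runs on its own; the sample strings are made up.

```php
<?php
// Hypothetical helper: escape code points above U+FFFF as \uXXXX pairs.
function escape_astral(string $s): string
{
    return preg_replace_callback('/[\x{10000}-\x{10FFFF}]+/u', function ($m) {
        // json_encode() escapes non-ASCII by default; drop the quotes.
        return trim(json_encode($m[0]), '"');
    }, $s);
}

// Hypothetical helper: reverse the escaping for surrogate pairs only.
function unescape_astral(string $s): string
{
    $pairs = '/(?:\\\\ud[89ab][0-9a-f]{2}\\\\ud[c-f][0-9a-f]{2})+/';
    return preg_replace_callback($pairs, function ($m) {
        return json_decode('"' . $m[0] . '"');
    }, $s);
}

$emoji = "\u{1F631}";                      // the emoji from the thread title
$text  = "where are the emojis " . $emoji;

$stored = escape_astral($text);            // value to insert into the table
$back   = unescape_astral($stored);        // value after fetching

// Byte-size drawback: 4 bytes of UTF-8 become 12 bytes of ASCII.
printf("%d -> %d bytes\n", strlen($emoji), strlen(escape_astral($emoji)));

// A search term has to be escaped the same way before it goes in a query.
$needle = escape_astral($emoji);
var_dump(strpos($stored, $needle) !== false, $back === $text);
```

Only the escaped characters grow in size, so mostly-ASCII text stays close to its original length and remains searchable.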