utf8mb4 support - or where are the emojis 😱

I recently saw that utf8mb4 support is listed as one of the exceptions to MySQL compatibility. This means data containing emojis (which our data set does) returns an error on load.

Are there plans on the roadmap to support full utf8mb4 and, as a result, emojis?

I too really miss this feature.

Our workaround is to escape Unicode characters outside the supported range before inserting and to reverse the process when fetching.

This is not on the near-term roadmap but we have heard about this and are considering it for a future release. :thinking: Thanks for the feedback!

@mpskovvang, how exactly do you do the escaping now?

Thanks Hanson!

The one workaround we’ve found that works is to encode the data in Base64 then decode on read in MemSQL.
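
The Base64 workaround above could be sketched roughly like this in PHP. The function names `packForStorage` and `unpackFromStorage` are hypothetical, and the sketch assumes decoding happens in the application rather than inside MemSQL. Base64 output is plain ASCII, so a column limited to 3-byte UTF-8 can hold it.

```php
<?php
// Hypothetical helpers illustrating the Base64 workaround; the names are not from this thread.

// Encode the whole string so that only ASCII characters reach the database.
function packForStorage(string $text): string
{
    return base64_encode($text);
}

// Reverse the encoding after reading the column back.
function unpackFromStorage(string $stored): string
{
    $decoded = base64_decode($stored, true); // strict mode: returns false on invalid input
    if ($decoded === false) {
        throw new InvalidArgumentException('Value is not valid Base64');
    }
    return $decoded;
}

$original = "Great survey 😱";
$stored = packForStorage($original);
echo $stored, PHP_EOL;                    // ASCII-only representation
echo unpackFromStorage($stored), PHP_EOL; // original text restored
```

The trade-off is that the stored value grows by about a third and is no longer readable or searchable as text on the database side.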

@mpskovvang I see. Are you encoding the whole string or just the emoji part?

Sorry, a bit late…

I only encode/decode the unsupported Unicode characters.

This actually works really well. The only real drawback is the byte size. I can even perform a FULLTEXT search for emojis as long as I encode the query string first.

My Unicode class:

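<?php
namespace App;

class Unicode
{
    // Replace characters outside the Basic Multilingual Plane (U+10000 and above,
    // i.e. 4-byte UTF-8 such as emoji) with JSON-style \uXXXX surrogate-pair escapes.
    public static function encode($string)
    {
        return preg_replace_callback('/[\x{10000}-\x{10FFFF}]+/u', function ($match) {
            // json_encode() yields e.g. "\ud83d\ude31"; strip the surrounding quotes.
            return trim(json_encode($match[0]), '"');
        }, $string);
    }

    // Turn \uXXXX surrogate-pair escapes back into the original characters.
    // Only high+low surrogate pairs are matched, which is what encode() produces.
    public static function decode($string)
    {
        return preg_replace_callback('/(?:\\\\ud[89ab][0-9a-f]{2}\\\\ud[c-f][0-9a-f]{2})+/', function ($match) {
            // Fall back to the untouched text if the escape cannot be decoded.
            return json_decode('"' . $match[0] . '"') ?? $match[0];
        }, $string);
    }
}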
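
To put a number on the byte-size drawback mentioned above, here is a small standalone sketch that measures one emoji in raw form, in the `\uXXXX` escaped form that `json_encode()` produces, and in Base64 (the other workaround in this thread). It is only an illustration and does not depend on the class.

```php
<?php
// Compare storage sizes for a single emoji under the two workarounds from this thread.
$emoji = "😱"; // U+1F631, 4 bytes in UTF-8

// Same escaping step as Unicode::encode(): JSON-style surrogate-pair escapes.
$escaped = trim(json_encode($emoji), '"');
$base64 = base64_encode($emoji);

printf("raw:     %d bytes\n", strlen($emoji));
printf("escaped: %d bytes\n", strlen($escaped));
printf("base64:  %d bytes\n", strlen($base64));
```

A single emoji grows from 4 bytes to 12 when escaped, so columns with tight length limits may need some headroom.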

Hi @nick-at and @mpskovvang

I am the PM currently looking at adding emoji (basically utf8mb4) support to MemSQL. Could you please email me so we can chat? Martin, I emailed you at your katoni.dk address, by the way :slight_smile:

My email is jliang (at) memsql (dot) com

Is there an expected ETA for utf8mb4 support?

The project is funded and we will have an estimate soon.

Hi Nikita, any update? We were about to start with MemSQL, but once we found that it lacks such a basic feature we decided to wait until you finish it. I hope you have a specific plan and a deadline for when it will be done. As far as I can see, talk about this feature started a long time ago and no one has been able to give a final answer. Thanks.

utf8mb4 support is planned for a MemSQL release targeted at later this year. Work should start on it shortly.

-Adam

We are storing survey results in MemSQL, leveraging the pipeline feature to allow real-time reporting (which works quite nicely), but we have now come across this problem too: someone used an emoji in their response, which led to the comment being cut off.
Is there a release date for this feature? We need to discuss internally whether to write a workaround, but we wouldn’t do that if the fix is released in the next few weeks.
Thanks!
Christoph

Hi Christoph,

We were a bit delayed in starting on this feature. The work is still in progress (it’s mostly written now), but it didn’t make the cutoff for a release we plan to ship later this year. I can assure you it is coming, though not within the next few weeks.

-Adam

Any updates on this @adam? Our team is also excited for utf8mb4 support! :raised_hands:

It will ship in the next major release of SingleStore (it didn’t make it into 7.3). We are considering whether we can backport it into 7.3 in a patch release early next year.

-Adam

Thanks @adam that would be great!
At the moment I remove emojis from the strings to avoid this problem…
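
For anyone who wants to take the same route, stripping could look something like the following PHP sketch. The helper name `stripFourByteChars` is hypothetical, and the sketch assumes that only characters needing 4 bytes in UTF-8 (U+10000 and above) cause the problem.

```php
<?php
// Hypothetical helper: drop every character that needs 4 bytes in UTF-8.
function stripFourByteChars(string $text): string
{
    // U+10000..U+10FFFF are the code points that do not fit in 3-byte UTF-8.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
    // preg_replace() returns null on invalid UTF-8 input; this sketch then
    // simply hands back the input unchanged, which may not suit every caller.
    return $result ?? $text;
}

echo stripFourByteChars("Loved it 😱!"), PHP_EOL; // prints "Loved it !"
```

Unlike the escaping and Base64 workarounds, this loses the emoji permanently, so it only fits data where they carry no meaning.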

Hi @adam, there is one more team here excitedly waiting for full UTF-8 support!
Do you have any news? :slight_smile:

Hi folks,

The feature will be in the next major release of SingleStore. It’s looking too complex to backport to 7.3 at this point, so 7.5 will likely be the first release in which it is available.

-Adam

Thanks for the update, Adam. Do you have an idea when 7.5 might be released?

Also, is there a public roadmap available somewhere?