Best UUID data type?


#1

Since MemSQL doesn’t support the UUID data type, what data type is best suited for UUID values?

I guess either BINARY(16) or VARBINARY(32)

Just like CockroachDB and Cloud Spanner, MemSQL will benefit by a random UUID rather than a optimised COMB version, right?

The image below shows benchmark for a single-node database. Apperantly, binary is only best for smaller data sets, so this makes me think a pure UUID version 4 stored in a VARBINARY(32) column is the best choice for many MemSQL use cases.


#2

What is the metric being measured on the chart you shared? Can you share more details of your benchmark? In line with your findings, I would expect VARBINARY or BINARY type to be best for storing UUID data.

CockroachDB and Cloud Spanner distribute data via range sharding. This means when using an optimized COMB UUID writes/reads of recent data would only hit a couple nodes. This is why random UUID’s work better since it prevents a specific node from getting overloaded. MemSQL distributes data via hash sharding and so doesn’t have this issue. As a result, from a data distribution perspective random UUID and COMB should be relatively equivalent.

From a storage perspective, I would expect COMB to be better in MemSQL since you will likely be indexing the UUID column in row store or sorting by it or a time dimension in columnstore. Since data would be arriving in a naturally sorted order, this would make it more efficient.


#3

Thanks @rob, the difference between CockroachDB/Cloud Spanner vs MemSQL is really useful. I wasn’t aware of that.

Sorry for not providing the source of the benchmark visualization. I also should have mentioned that the benchmark was performed by Spatie on a MySQL server. Source: spatie/laravel-binary-uuid

I guess I have to perform a benchmark on MemSQL to see how BINARY and VARBINARY compares to each other both for a tradional random UUID and a COMB version.

I really like how PostgreSQL simplies working with UUIDs with the special UUID data type. It removes the need for converting between bytes and HEX strings in application. The CONVERT function will probably make it a little eaiser. Any chance MemSQL will support UUID natively in the near future?


#4

UUID type is something we would like to do but haven’t committed to building yet.

Since UUID’s are fixed length, I think BINARY(16) would actually be better. This stackoverflow answer I think gives a pretty good strategy for inserting/querying data that would also be applicable to MemSQL.

If you do ultimately do some tests of your own, let us know how they go.


#5

I understand your priority. I heard a lot of great features are coming next year.

I did a simple benchmark on a single node MemSQL instance running cluster-in-a-box (master aggregator + leaf).

chart

Binary UUID: BINARY(16) with regular UUID version 4
Ordered UUID: BINARY(16) with ordered UUID version 1
Comb UUID: BINARY(16) with comb UUID version 4
Textual UUID: CHAR(36) with regular UUID version 4

Each run selects a random UUID from the dataset 10,000 times with index and calculates the average time.


#6

That’s very interesting that there is such a big difference in performance between the different types only in the 50k row dataset.

Could you share more details for the test setup? What was the hardware? Were the 10,000 queries being run concurrently or sequentially? What was the schema used for the test?