Data type guidelines

What data type guidelines are there for MemSQL?

For example, the documentation here says, “If you don’t need automatic initialization or update support, it is recommended that you use the DATETIME or DATETIME(6) types instead.” Am I reading correctly that the first timestamp field will automatically initialize whether or not such a clause is specified?

When is it more efficient to use integers over equivalent decimals?

Are there different data types that shard the same? For example, if one table is sharded by an integer and another by a decimal, or one by a datetime and another by a timestamp, will the equivalent values exist in the same partitions?

Are there others?

Timestamp fields: If you explicitly specify a default (e.g. DEFAULT NULL) then the first timestamp field will not have the DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP behavior.

Integers are usually more efficient than decimals. For decimals that fit in an integer range, MemSQL is often able to optimize it to be almost as efficient as integer, but it depends on the type of operation and sizes of the data.

Most datatypes that are the same except as different size classes (e.g. int vs bigint, etc) shard the same. Decimal vs integer, and datetime vs timestamp, do not shard the same. You can check whether a specific pair of datatypes shards the same by creating two tables with them as shard keys, running explain on a query that joins them, and seeing if that join is local or distributed.

1 Like