Key vs Shard Key

What is the difference between :
create table test1 (
cid char(40),
num bigint,
key (cid) using clustered columnstore
);
create table test2 (
cid char(40),
num bigint,
shard key (cid) using clustered columnstore
);
is the “key” in test1 related to physical distribution of the data?
The sharding is done in partition level or row segment level?

That’s interesting. Although I knew they could consist of the same field(s), I had no idea that shard and columnstore keys could be defined as a single key. I imagine that it has the same effect as separate keys. The shard key defines how the data is distributed among the database partitions. The columnstore key defines how the data is sorted within a row segment group.

1 Like

If you run show create table test1; or show create table test1 extended;, you’ll likely find the difference.

key ('xyz') using clustered columnstore only refers to the sorting of data once on disk in columnar format.

shard key ('xyz') only refers to the physical distribution of data amongst database partitions (all rows where xyz=1 are guaranteed to be in the same partition). If you don’t explicitly name a shard key, memsql will use the primary key (if exists-only for rowstore), or otherwise memsql will “keyless shard” your data, aka shard key (), choosing a partition for each row at random.

Yes, it looks like you can combine the syntax like shard key (cid) using clustered columnstore -> which would define both your shard key and your sort order.

The shard key and sort key can definitely consist of the same fields, as @bvincent suggests - however the use cases are usually different for each , i.e. sorting on fields that may be time-based, vs sharding on userId, customerId, some unique identifier for joins to other tables and to prevent data skew. For some good heuristics, see this doc for full understanding: https://docs.memsql.com/tutorials/v6.8/optimizing-table-data-structures/

2 Likes

The shard key is either an indexed field or indexed compound fields that exists in every document in the collection. MongoDB partitions data in the collection using ranges of shard key values. Each range defines a non-overlapping range of shard key values and is associated with a chunk