What is best practice for CLUSTERED COLUMNSTORE keys


I’m using columnar fact tables for our star schemas, with dimensions as in-memory tables.

Typically, if a fact table contains a time dimension, I use it as the key. But this is more a heuristic I developed and I’m not quite sure what is best practice here.

What’s the right way to decide what the key(s) for a columnar table should be, and in what order should they be declared?

Here’s an example table definition:

CREATE TABLE fact_survey_submissions (
  -- Time
  `timestamp` DATETIME NOT NULL,
  -- Foreign Keys
  account_id VARCHAR(32) NOT NULL,
  user_id VARCHAR(32) NOT NULL,
  response_id VARCHAR(32) NOT NULL,
  survey_id VARCHAR(32) NOT NULL,

  -- Measures
  survey_rating TINYINT,



https://docs.memsql.com/tutorials/v6.7/optimizing-table-data-structures/#choosing-a-columnstore-key has a lot of good advice on this question. (The first part is background on what the key is, the advice starts with “The single most important consideration for columnstore keys”…)


This is unrelated to your sort key decision, but I would recommend using INT or BIGINT for your foreign key columns assuming you will be joining them to dimension tables. Joins on integer columns are faster than on strings.