Columnstore Table Optimization: Is One 50GB+ Table Too Big to Query vs. Multiple Smaller Tables?

Perhaps I'm not fully understanding how columnstore tables are stored; I'm still wrapping my head around this while reading through the docs and watching Ankur Goyal's presentations on YouTube.

Let's say I have "subscriber" data going back 5 years: well over 100M rows and several tens of GB in size. The candidate shard key fields include subscriberID, date, etc., data I'd definitely want to group together to take full advantage of the shard key.

In a standard MS SQL Server or Postgres setup, instead of storing all of this data in one large table, I'd break it apart into, say, 100K transactions per table (TABLE_TRANSACTIONS_100000, TABLE_TRANSACTIONS_200000, and so on), then use a UNION to search and return data that may exist across multiple tables, along the lines of the sketch below. Keeping everything in one large table in MS SQL Server or Postgres would bring my machine to its knees.
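Something like this (the column names subscriber_id, txn_date, and amount are made up purely for illustration):

```sql
-- Manual "partitioning" by table name, stitched back together with UNION ALL.
-- Table names follow the TABLE_TRANSACTIONS_<offset> convention above;
-- subscriber_id, txn_date, and amount are hypothetical columns.
SELECT subscriber_id, txn_date, amount
  FROM TABLE_TRANSACTIONS_100000
 WHERE subscriber_id = 12345

UNION ALL

SELECT subscriber_id, txn_date, amount
  FROM TABLE_TRANSACTIONS_200000
 WHERE subscriber_id = 12345;
```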

In MemSQL, would there be any performance issues with storing all of the data in one large table and running future JOINs / queries against it? I'm talking about one large columnstore table >= 50 GB in size, growing over time.

Is this where MemSQL really shines, especially when well-chosen shard keys are used on columnstore data?

Yes. We have customers storing on the order of a trillion rows in a single distributed columnstore table.
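For reference, here is a minimal sketch of a single columnstore table along the lines you describe. The table and column names are hypothetical, and the shard key / columnstore sort key choices are just one reasonable starting point, not the only valid layout:

```sql
-- Hypothetical single columnstore table holding all subscriber transactions.
-- SHARD KEY controls how rows are distributed across partitions;
-- the CLUSTERED COLUMNSTORE key controls on-disk sort order, which enables
-- segment elimination for range filters.
CREATE TABLE transactions (
    subscriber_id BIGINT NOT NULL,
    txn_date      DATE   NOT NULL,
    amount        DECIMAL(12, 2),
    KEY (txn_date) USING CLUSTERED COLUMNSTORE,  -- sort by date for time-range pruning
    SHARD KEY (subscriber_id)                    -- keep a subscriber's rows on one partition
);
```

Queries and joins that filter or group on the shard key can then be pushed down to individual partitions, which is how a single large table like this stays fast as it grows.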