I was reviewing our use of the Decimal data type in our columnstore table and noticed this bit in the MemSQL docs (https://docs.memsql.com/sql-reference/v6.7/datatypes/):
When creating decimal type columns that are aggregated in queries, for best performance, it is recommended to use precision of 18 digits or less if your application can accommodate this (i.e. the extra precision is not needed to represent meaningful information). Decimal values of 18 digits or less can be processed more efficiently than others because they can be handled internally as 64-bit integers during aggregation. Processing these 64-bit values takes much less time than interpreting decimal values of more than 18 digits, which must be handled in a more general way.
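The 18-digit cutoff in that paragraph follows directly from the range of a signed 64-bit integer. A quick sanity check (plain arithmetic, not MemSQL code) shows why an 18-digit decimal fits in an int64 but a 20-digit one does not:

```python
# A signed 64-bit integer holds values up to 2**63 - 1, roughly 9.22 * 10**18.
INT64_MAX = 2**63 - 1

# An 18-digit decimal, scaled to an integer (e.g. a DECIMAL(18, 5) value
# stored as value * 10**5), still fits: the largest 18-digit number is 10**18 - 1.
largest_18_digit = 10**18 - 1
print(largest_18_digit <= INT64_MAX)  # True: 18 digits fit in an int64

# The largest 20-digit number overflows an int64, so a DECIMAL(20, 5)
# cannot be aggregated as a native 64-bit integer and needs the slower,
# general-purpose decimal code path.
largest_20_digit = 10**20 - 1
print(largest_20_digit <= INT64_MAX)  # False: 20 digits overflow an int64
```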
Currently we use DECIMAL(20, 5). When I switch to DECIMAL(18, 5) I notice two things:
- When we switch to DECIMAL(18, 5), the disk usage for this column jumps by about 80% (80GB => 147GB)
- Aggregations on the DECIMAL(18, 5) column only show an advantage once I'm selecting enough rows (e.g. when scanning the entire table there's a clear improvement). With a small set of rows selected, performance seems to be better with DECIMAL(20, 5).
My suspicion for the second point is that performance is better with DECIMAL(20, 5) because the data is compressed better, allowing MemSQL to scan it more quickly.
I was a little surprised to see the disk usage go up for the smaller decimal column type, but perhaps that has to do with the way in which MemSQL changes the underlying data representation.
Does this align with expectations?