Incorrect Decimal values from Parquet source

Hi,

I am trying to load data from ORC files in AWS S3 into MemSQL through a pipeline.

The steps I follow:

  • save the ORC data as Parquet in another bucket
  • load the Parquet data into MemSQL through a pipeline

The issue I am facing is that decimal values don’t seem to be getting inserted properly.

The source ORC columns are defined as Decimal types in Athena, e.g. Decimal(18,4), Decimal(19,6), …
I have defined those columns as Decimal with the same precision and scale in the MemSQL table as well.

The values come out wrong in MemSQL. For example, the value 10400.0000 (decimal(18,4)) in Athena is inserted as 104000000 in MemSQL, although the target column is also defined as decimal(18,4).

Also, if I change the MemSQL column to varchar to see exactly what is coming in, then some decimal columns (e.g. 35.000000, defined as decimal(19,6) in the ORC/Athena source) are inserted into MemSQL as unreadable characters, as shown below:

image

What’s the output of select @@memsql_version?

We only added automatic conversion of Parquet values with the decimal logical type annotation to a SQL-decimal-compatible format in 7.0.14. With earlier versions, you get the raw underlying value, which can be either binary or integral depending on how the underlying writer handles different precisions. If you’re on an earlier version, what you see is about what I’d expect.
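
To illustrate what "raw underlying value" means, here is a minimal Python sketch of how the two encodings map back to a decimal. It assumes the Parquet convention that an integral decimal holds the unscaled value (the number multiplied by 10^scale) and that a binary decimal holds the same unscaled value as big-endian two's-complement bytes. The function names are made up for this example, and this runs outside the database; it is not a MemSQL function.

```python
from decimal import Decimal


def decimal_from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Rebuild a decimal from its unscaled integer and its scale.

    Built from a (sign, digits, exponent) tuple so the result is exact
    and does not depend on the current decimal context precision.
    """
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))


def decimal_from_bytes(raw: bytes, scale: int) -> Decimal:
    """Rebuild a decimal from big-endian two's-complement bytes.

    Assumption: `raw` is the unscaled value exactly as stored in the file.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return decimal_from_unscaled(unscaled, scale)


# Integral case from the question: decimal(18,4) arriving as 104000000.
print(decimal_from_unscaled(104000000, 4))  # 10400.0000

# Binary case: 35.000000 as decimal(19,6). The 9-byte width is an
# assumption about the writer; the bytes are built here for illustration.
raw = (35000000).to_bytes(9, byteorder="big", signed=True)
print(decimal_from_bytes(raw, 6))  # 35.000000
```

For the integral case this is the same as dividing the stored integer by 10^scale, so 104000000 with scale 4 gives back 10400.0000. The binary case needs the bytes interpreted as a signed big-endian integer first, which is why the varchar view shows unreadable characters.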

Thanks for the reply,

The version is 7.0.11.

Could you suggest which functions I can use to convert the values back to Decimal, from both the integer and the binary form?

I believe Sasha is recommending that you upgrade to 7.0.14 and reload the data. Would this work for you?

Unfortunately, upgrading is not an option.