Columnstore JSON type fetch performance

Hey,

I’m well aware that fetching a JSON column is troublesome and can take longer than a regular column, since it uses a different, optimized storage engine; however, the numbers I saw do not add up.
I have a JSON column with well over 40 fields in the JSON data blob.

Fetching 1 field out of the JSON column:
select payload::columnX from … limit 1;
takes 15-20 ms.

Fetching 4 fields out of the JSON column:
select payload::column1, payload::column2, payload::column3, payload::column4 from … limit 1;
takes about 70-80 ms.

Fetching 40 fields out of the JSON column takes about 650-700 ms.

So I profiled the query using MemSQL Studio. During the 650-700 ms period, the CPU was only at 6% usage, which would be equivalent to roughly 60 ms of CPU time.

It makes me wonder whether the underlying implementation uses multiple threads to fetch from the JSON storage, or goes linearly column by column, with the delay basically being disk access that doesn’t get parallelized?


We use a third-party library, Parquet, to store JSON columns and extract JSON keys.

The query execution is single-threaded (per partition), same as for the other data types. The thread is responsible for extracting all the requested data in that partition. It does this block by block (i.e., 4096 rows at a time within a segment), and column by column within each block. There is nothing fancy there, so I would expect the runtime to grow linearly with the number of projections in your query.

I am not sure where the 650-700 ms is spent. You need to collect a flamegraph to know exactly what’s going on.

6% CPU utilization sounds suspicious, but I wouldn’t 100% trust Studio, especially for such a short-running query (it runs for only 0.7 s according to what you said). If you can confirm the CPU usage by logging into the leaf directly, we can investigate further by collecting a flamegraph.

If your query’s working set fits into memory, an easy way to get rid of disk-related complications is to run the identical query twice. The second run should not require any hard disk reads, as the data should already be in the Linux page cache.
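
For example (a sketch, reusing the query shape from above; the table name is a placeholder):

-- first run: may pay for hard disk reads
select payload::column1 from t limit 1;
-- second run: the segment data should now be in the Linux page cache,
-- so any remaining latency is decoding/CPU, not disk
select payload::column1 from t limit 1;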


Hey,

All the numbers here were taken after running the same query 5-10 consecutive times.


I don’t see your runtime as suspicious by itself: in your specific case, the runtime is expected to grow linearly with the number of projections in your query. Your one-projection query takes 15-20 ms, so your 40-projection query should take 600-800 ms, which is indeed what you get.

The only suspicious point is your claim that the CPU usage is 6%. That’s why I asked you to confirm it by logging into the leaf directly. I don’t have much knowledge about Studio, but I don’t think it samples CPU usage at a high frequency, so it’s likely not going to show short peaks of CPU usage correctly.

I suggest you use a stored procedure (or simply a SQL script made by copy-pasting your query 100 times) to run your query repeatedly, then log into the leaf and run htop to get the real CPU usage.
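
Something along these lines (a sketch; the table and key names are placeholders):

-- repeat.sql: the same query pasted many times (100x in practice)
select payload::column1, payload::column2 from t;
select payload::column1, payload::column2 from t;
-- ... repeated 100 times ...
-- while the script runs, on the leaf host run: htop (or top)
-- and watch the per-core utilization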


Usually the Parquet format is meant to be used with parallel column processing.

I created a file with a bunch of these queries.
Each query takes about 800 ms, and the total run for the script executing them one after another is about 20 seconds, during which the CPU does indeed spike to 100-110%.

Also, it is the same query run consecutively, and it still takes almost a whole second; where is the caching in this case?


I agree with you that this could be better, but unfortunately we just haven’t implemented that at this time.

Just to confirm: you logged into the leaf and found that the sum of CPU usage across all CPUs is 100-110%, which is how you concluded that on average there is only ~6% usage per CPU. Is my understanding correct?

I agree with you that if the CPU usage is indeed 6%, then it’s very suspicious. We would need to collect an off-CPU flamegraph to know where it is blocking. But I just want to make sure we aren’t hitting a configuration issue before doing that (for example, that you checked the aggregator’s CPU usage instead of the leaf’s, or somehow interpreted the CPU usage incorrectly).


Yes, when running the query the CPU goes from 15-20% to 100-110%. The machine has 16 cores.

So the 6% usage is just a MemSQL Studio mistake.

The question is: why is there no caching, and why does the same query take the same ~800 ms run after run? On regular columns, the same query would take 10-20 ms, even with many columns selected.


I don’t understand how CPU usage can be higher than 100%, but as long as the CPU is fully utilized while the query is running, things should be fine.

This is kind of a technical detail, but in case you are interested: we first compute the list of table columns needed to evaluate all your projections. For example, if your table has integer columns a, b, c and you run select a+1, a+b, b+1, b+2, then columns a and b are needed, but c is not. We extract the data from columns a and b, then compute all your projections. Even if a and b are used multiple times across your projections, they are only extracted from disk once.

However, things are a little different for JSON key extraction. Each JSON key you ask for is treated as a special “fake JSON column” of your table. This means that if you select js::a and js::b, even though both are on the JSON column js, they are internally considered two different “fake JSON columns” for the purposes of MemSQL query execution. So column js is extracted twice. This is not ideal, of course, but unfortunately it is how the code currently works.
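
To make that concrete, and as one possible workaround (a sketch, assuming a table t with a JSON column js; the persisted computed column statement is my own suggestion, untested here):

-- js::a and js::b are two distinct "fake JSON columns",
-- so the js blob is extracted once per key:
select js::a, js::b from t;
-- hot keys can be materialized as persisted computed columns,
-- so they are stored and read like regular columns:
alter table t add column a_val as js::$a persisted text;
select a_val from t;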


On Linux systems, each 100% of CPU utilization in top corresponds to one core in use; if you see 800%, eight cores are being used. So in this case it was using 1 core out of 16.

If you ever plan, product-wise, for MemSQL to be a drop-in replacement for document storage etc., this is something that must be handled; otherwise the performance will be very poor.

Can you please explain what happens if I simply select js without extracting anything? That one also has poor performance.


That’s actually problematic. We have one thread per partition, and usually multiple partitions reside on a single node (and also multiple nodes on a physical server if your server has multiple NUMA nodes). So it’s weird that you are only getting a total of 100% CPU utilization (you are supposed to get 100% x #partitions on that leaf). How many partitions are on the leaf you are checking? (Try attaching the output of show partitions.)
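
For reference (a sketch; the database name is a placeholder):

show partitions on your_db;
-- each output row is one partition; count how many live on the leaf you were watching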

Probably you should talk to a PM about your use case. @hanson

Parquet only needs to extract part of the JSON blob if you just want to read a key. So if you are reading js::a and js::b, only part of the JSON blob needs to be parsed and extracted. But if you select js, the whole blob has to be parsed, extracted, and handed back to you. That’s probably why it’s slower.
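
Schematically (a sketch):

-- key extraction: only the parts of the blob holding keys a and b are decoded
select js::a, js::b from t;
-- full projection: the entire blob is parsed, reassembled, and returned per row
select js from t;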


As part of the query predicates I use the sharding key.


Then it sounds reasonable: a filter on the shard key routes the query to a single partition, so only one thread (one core, i.e. ~100% in top) does the work.


Hey guys, we have a similar problem with a JSON type column. When we pull data from the other columns in the same table, the query runs fast, but when we select the JSON column, it just takes a long time to get the result, even if we select that field only. Is there any tip or something to speed this up?


Is your JSON data very complex?
From my experience, you should use the JSON type only if you don’t access it frequently or if it is a very simple JSON object. Otherwise the column will be slow to fetch, but it offers really great compression due to the internal handling of the JSON type.

If you fetch it often, try keeping it as a text/blob field type, which will not offer the same great compression but will give you better fetch performance; you can still use JSON functions on it.
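
For example (a sketch; the table and key names are placeholders, and JSON_EXTRACT_STRING is one of the built-in JSON functions):

create table events_text (payload longtext);   -- text instead of the JSON type
-- a full fetch is now a plain column read, with no Parquet decoding:
select payload from events_text limit 10;
-- JSON functions still work on the text value:
select json_extract_string(payload, 'column1') from events_text limit 10;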


Yes, it’s a big JSON. We parse it and persist the values into columns, but we also want to select this field, and the problem is that it is very slow to access. That column is 300 GB in size, so moving it to text might help, but then we will consume a lot of storage.

So the problem might be the compression. Is there some way to change the compression setting for a field?


Are you selecting the entire JSON field or extracting values for some key(s)?


Just selecting the JSON field. We need to do that to show the entire JSON in this particular case.


Are there selective filters in the query (i.e., how many rows are you selecting the JSON field for vs. the total rowcount of the table)? Parquet, the JSON storage format we use, is fundamentally more targeted towards key extraction than full projection.

If the query contains selective filters, you may be running into a limitation of our storage engine that requires decoding the full segment (by default 1 million rows) of the JSON field in order to read it for any row. We are aiming to optimize this to decode only the required rows. You can mitigate the issue by changing the type of the column to a text-based type, as zmeidav suggested; it will likely hurt the compression ratio. Again, optimizing this case is on our roadmap.
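
A sketch of that mitigation (table, column, and key names are placeholders; this copies the data, so budget for the extra storage):

create table events_v2 (
  id bigint,
  payload longtext,                       -- text type instead of JSON
  key (id) using clustered columnstore,
  shard key (id)
);
insert into events_v2 select id, payload from events;
-- selective reads of payload no longer force decoding a full JSON segment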
