MemSQL 7.0 SingleStore issues

vkruoso · December 11, 2019, 5:36pm

Hi!

Im trying to use MemSQL 7.0 new features: compression (sparse columns) on Rowstore and hash index in ColumnStore.

I have a table with ~30Gb of data and in general it have lots of NULL values. Importing that data with the table configured to be sparse, I see virtually no compression.

Also, in columnstore tables (using the new hash index) running multiple DELETE/UPDATEs (all using the column from the hash index) I’m seeing a much better performance then 6.8, but still with a very high CPU usage for up to 100 concurrent transactions.

Im not sure if Im doing something wrong. Do you have any thoughts on what I can investigate to debug those tables?

Thanks,
Vinicius

adam · December 11, 2019, 6:25pm

Can you share your table schema’s with us and for the columnstore index case some example update/delete queries?

vkruoso · December 11, 2019, 9:07pm

Hi @adam,
Sure.

The schema is the same for both scenarios (im trying to use less memory or have a good performance with columnstore):

CREATE TABLE `company` (
    `id` varchar(14) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
    `tipo` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
    `nome` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `fantasia` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `email` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `telefone` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `logradouro` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `numero` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `complemento` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `bairro` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `municipio` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `uf` varchar(2) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `cep` varchar(8) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `abertura` date DEFAULT NULL,
    `natureza_juridica` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `situacao` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `data_situacao` date DEFAULT NULL,
    `situacao_especial` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `motivo_situacao_especial` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `data_situacao_especial` date DEFAULT NULL,
    `capital_social` decimal(30,2) DEFAULT NULL,
    `efr` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
    `ultima_atualizacao` datetime NOT NULL,
    `extra` JSON COLLATE utf8_bin DEFAULT NULL,
    `rand` float DEFAULT NULL,
   SHARD KEY `id` (`id`) /*!90619 USING CLUSTERED COLUMNSTORE */,
   KEY `company_idx` (`id`) USING HASH
) /*!90623 AUTOSTATS_CARDINALITY_MODE=INCREMENTAL, AUTOSTATS_HISTOGRAM_MODE=OFF, AUTOSTATS_SAMPLING=OFF */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */;

For the rowstore table, is the same structure, with the sparse flag for all columns. If you want I can determine the percentage of NULLs for each column, but as I did not saw any compression, I might be doing something wrong.

Delete use the id field in the where clause, and updates too, but all fields are updated.

This table have 42M rows, but in the same transaction we do similar operation to other two tables, one with 54M and other with 24M rows. Using the profiler I see more CPU usage on the update of the company table).

evan · December 12, 2019, 12:48am

Hey Vinicius,

The show table status memory command may be useful here. With 90% nulls on all eligible sparse columns, and 0 as the nonnull value inserted, I’m seeing

memsql> show table status memory\G
*************************** 1. row ***************************
                          Name: s
                          Rows: 1000000
              Total Memory Use: 198198832
                   Row Structs: 132382720
Allocated Offrow Struct Memory: 17816112
Requested Offrow Struct Memory: 14400000
   Deleted Version Row Structs: 0
                       HTNodes: 0
                       Indexes: 32000000
            Variable Allocator: 33816112
*************************** 2. row ***************************
                          Name: t
                          Rows: 1000000
              Total Memory Use: 313027584
                   Row Structs: 265027584
Allocated Offrow Struct Memory: NULL
Requested Offrow Struct Memory: NULL
   Deleted Version Row Structs: 0
                       HTNodes: 0
                       Indexes: 32000000
            Variable Allocator: 16000000

which is about a 1.58 compression ratio. I’m interested what the command reports to you. Inserting larger varchar values will move this ratio closer to 1.

CREATE TABLE `t` (
  `id` varchar(14) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `tipo` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `nome` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `fantasia` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `email` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `telefone` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `logradouro` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `numero` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `complemento` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `bairro` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `municipio` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `uf` varchar(2) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `cep` varchar(8) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `abertura` date DEFAULT NULL,
  `natureza_juridica` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `situacao` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `data_situacao` date DEFAULT NULL,
  `situacao_especial` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `motivo_situacao_especial` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `data_situacao_especial` date DEFAULT NULL,
  `capital_social` decimal(30,2) DEFAULT NULL,
  `efr` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `ultima_atualizacao` datetime NOT NULL,
  `extra` JSON COLLATE utf8_bin DEFAULT NULL,
  `rand` float DEFAULT NULL
  /*!90618 , SHARD KEY () */ 
) /*!90623 AUTOSTATS_CARDINALITY_MODE=OFF, AUTOSTATS_HISTOGRAM_MODE=OFF */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */

CREATE TABLE `s` (
  `id` varchar(14) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `tipo` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `nome` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `fantasia` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `email` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `telefone` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `logradouro` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `numero` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `complemento` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `bairro` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `municipio` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `uf` varchar(2) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `cep` varchar(8) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `abertura` date DEFAULT NULL SPARSE,
  `natureza_juridica` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `situacao` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `data_situacao` date DEFAULT NULL SPARSE,
  `situacao_especial` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `motivo_situacao_especial` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `data_situacao_especial` date DEFAULT NULL SPARSE,
  `capital_social` decimal(30,2) DEFAULT NULL,
  `efr` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL SPARSE,
  `ultima_atualizacao` datetime NOT NULL,
  `extra` JSON COLLATE utf8_bin DEFAULT NULL,
  `rand` float DEFAULT NULL SPARSE
  /*!90618 , SHARD KEY () */ 
) /*!90623 AUTOSTATS_CARDINALITY_MODE=OFF, AUTOSTATS_HISTOGRAM_MODE=OFF */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */

evan · December 12, 2019, 1:29am

Ah, so if I insert a 255-length string for all non null varchar(255), I’m seeing

*************************** 1. row ***************************
                          Name: s
                          Rows: 1000000
              Total Memory Use: 947124328
                   Row Structs: 132513792
Allocated Offrow Struct Memory: 17810536
Requested Offrow Struct Memory: 14400000
   Deleted Version Row Structs: 0
                       HTNodes: 0
                       Indexes: 32000000
            Variable Allocator: 782610536
*************************** 2. row ***************************
                          Name: t
                          Rows: 1000000
              Total Memory Use: 1061696512
                   Row Structs: 264896512
Allocated Offrow Struct Memory: NULL
Requested Offrow Struct Memory: NULL
   Deleted Version Row Structs: 0
                       HTNodes: 0
                       Indexes: 32000000
            Variable Allocator: 764800000

which is a compression ratio of about 1.12. Looks like those varchars use about 80% of s's memory.

>>> (782610536 - 17810536) / 947124328.0
0.8074969435269326

+----------+
| count(*) |
+----------+
|   100000 |
+----------+
1 row in set (0.91 sec)

memsql> select count(*) from s where nome is null;
+----------+
| count(*) |
+----------+
|   900000 |
+----------+
1 row in set (0.19 sec)

memsql> select nome from s where nome is not null limit 1;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nome                                                                                                                                                                                                                                                            |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Put differently, with this dataset, the VariableAllocator allocations go from constituting about 72% of table memory to 80% once sparse is enabled.

0.720356515591529
>>> 782610536 - 17810536
764800000
>>> _ / 947124328.0
0.8074969435269326
>>>

nikita · December 12, 2019, 4:15am

As for the CPU on hash indexes for columnstore… Deletes need to move the row into rowstore and you have a reasonably wide row with lots of strings so its going to use CPU.

We will continue working on singlestore performance.

vkruoso · December 24, 2019, 1:58pm

Thanks @nikita and @evan for the answers. Sorry for taking too long to reply.

I did update our production cluster to 7.0 and saw a slight smaller memory footprint (somewhere around 96Gb to 92Gb). I was expecting a bigger reduction due to the number of NULLs we have on the bigger tables.

Does the compression works better when there are multiple NULLs in a sequence of columns, or that does not influences it?

Also looking forward to the next releases towards the SingleStore vision. Columnstore updates are much better compared to 6.8, but for us at a 100/s delete/updates the CPU usage is just to high right now.

nikita · December 24, 2019, 8:44pm

Can you share your benchmark for Singlestore and what performance would be acceptable for you. We will work on it for the next release

evan · December 27, 2019, 7:17pm

Sparse compression works per-row, so the order of rows in the table does not affect compression.

I’m interested to see what show table status memory gives you for both sparse and nonsparse versions of your table

vkruoso · February 14, 2020, 6:11pm

Really sorry for taking that long to reply.

This is the current show table status memory for that table.

*************************** 10. row ***************************
                          Name: company
                          Rows: 43223903
              Total Memory Use: 35945223152
                   Row Structs: 8868593664
Allocated Offrow Struct Memory: 1882843232
Requested Offrow Struct Memory: 1870821816
   Deleted Version Row Structs: 12713984
                       HTNodes: 0
                       Indexes: 5532659584
            Variable Allocator: 21531255920

If you want I can create the table again without the sparse columns to compare.

Im really not able to read that output properly, so I dont have a clue if sparse columns are being beneficial at this moment.

I have now changed the cluster to be non-HA (or HA-1) and that improved the performance, and we have a lot more memory to work with that we did not had with HA-2. I was able to get 500/s+ transactions in those memory tables. I did not had the chance to re-run the workload using Columnstore.