New Environment Validation Checks in MemSQL Tools

Roxanna Pourzand

You can now validate the hardware and software environment for MemSQL before you install the software. As you may already know, the MemSQL-Report module within MemSQL Tools lets you run health checks on an existing database. Its new functionality runs environment validation checks on hosts where MemSQL is not yet installed. These checks confirm that machines are configured for the best database performance and the lowest likelihood of future problems.

MemSQL is a high-performance database: it ingests data rapidly and supports workloads that require fast transaction processing, low query latency, and high concurrency. (High concurrency means many simultaneous users, such as individual SQL queries, business intelligence tools, applications, and machine learning models.)

You may have heard the analogy that compares a fast database to a race car. At MemSQL, we like this comparison. To confirm that your race car is ready to perform, you need to make sure its foundation, the configuration of the environment your database will run in, is in good shape.

The Importance of Configuration for a Distributed Database

The database world includes a large number of databases that run best on a single machine, or that require very careful configuration and management to “scale out” in a limited fashion. There is also a small number of newer, distributed relational databases, referred to as NewSQL databases; MemSQL is one of them. And there is a wide, and growing, range of NoSQL databases. (We have our own take on NoSQL.)

Any truly distributed database, whether NewSQL or NoSQL, depends on the cooperation of many separate nodes to function. Such a database is only as fast as its slowest node, and the more reliably every node stays up and running, the faster and more reliable the whole database is.

Because performance issues can often be traced directly to the configuration of the operating system, network, or disk, it is important to have a reliable way to check that your entire system is set up properly, down to the last node, so you can unleash the full power of the database.

Configuring MemSQL

What does this mean for the MemSQL database? Optimal configuration will lead to the best possible performance, with the fewest possible problems. This translates to obtaining more value out of your data faster, and spending less time on tuning and troubleshooting.

Below we review some examples of configuration recommendations that can affect the database. Our system requirements documentation contains the full list.

At a high level, it is essential to confirm that your machines have enough resources to operate the database, and that your operating system is configured properly. Here are three examples:

  • From a hardware perspective, we require a minimum of 4 cores and 8 GB of RAM per server.
  • Operating system recommendations include checks for settings such as Transparent Huge Pages (THP); if THP is not disabled, you may experience inconsistent query performance.
  • On machines that can benefit from it, configuring Non-Uniform Memory Access (NUMA) can significantly improve performance, depending on your workload.
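For a quick manual look at the three items above on a single host, a minimal bash sketch follows. It is not part of MemSQL Tools and does not reproduce what memsql-report checks; it assumes a Linux host with /proc/meminfo and the standard sysfs paths, and the function names meets_minimums and thp_mode are hypothetical. Two caveats: nproc counts logical CPUs rather than physical cores, and MemTotal is slightly lower than installed RAM, so a host with exactly 8 GB may fall just short of the threshold.

```shell
#!/usr/bin/env bash
# Sketch: manual spot-check of the three items above on one Linux host.
# Not part of MemSQL Tools; memsql-report --validate-env does the real checks.

MIN_CORES=4                        # minimum stated in this post
MIN_MEM_KB=$((8 * 1024 * 1024))    # 8 GB in kB, the unit /proc/meminfo uses

# meets_minimums CORES MEM_KB: succeed if both values meet the minimums
meets_minimums() {
  [ "$1" -ge "$MIN_CORES" ] && [ "$2" -ge "$MIN_MEM_KB" ]
}

# thp_mode CONTENTS: print the active (bracketed) value of a THP sysfs file,
# e.g. "always madvise [never]" -> "never"
thp_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

main() {
  local cores mem_kb f
  cores=$(nproc)   # logical CPUs, including hyperthreads
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  if meets_minimums "$cores" "$mem_kb"; then
    echo "resources: OK ($cores CPUs, $mem_kb kB RAM)"
  else
    echo "resources: below minimum ($cores CPUs, $mem_kb kB RAM)"
  fi

  # Report the active THP mode for both knobs; the recommendation is "never".
  for f in /sys/kernel/mm/transparent_hugepage/enabled \
           /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
      echo "$f: $(thp_mode "$(cat "$f")") (expected: never)"
    fi
  done

  # More than one NUMA node means NUMA configuration is worth reviewing.
  echo "NUMA nodes: $(ls -d /sys/devices/system/node/node[0-9]* 2>/dev/null | wc -l)"
}

# Run main only when the file is executed directly, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

Treat the output as a hint only; the report from memsql-report remains the authoritative result.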

MemSQL recommends checking more than a dozen specific system configuration settings, and changing them where needed, before you install the MemSQL database. Checking and changing each one by hand, across every host in your cluster, is tedious, and any tedious manual effort opens the door to errors.

To avoid that manual effort, let MemSQL-Report do this work. It summarizes all the information in one place, through an easy-to-use interface.

Pre-Installation Validation in MemSQL Tools

The MemSQL-Report module collects a report that covers the MemSQL cluster, the databases within it, and the system hosting it. It also outputs a set of pass/fail checks on settings, based on MemSQL-recommended best practices.

In previous versions, the Report module required an existing MemSQL cluster. A recently released version adds the ability to run pre-installation environment checks, which report only on components that apply to host machines that do not yet have the MemSQL software installed.

This feature lets you confirm that the environment is valid before you install the database and load data. Because the pre-check is built into MemSQL Tools, you have a clear-cut way to identify any problems before you proceed with the installation.

How Does It Work?

The first step of a MemSQL installation is to download MemSQL Tools, which manages the software. (Later in the process, Tools also deploys the database for you.) After you register with Tools the machines you plan to install MemSQL on, don’t proceed straight to the MemSQL installation. Instead, first run the following command to check your environment:

memsql-report collect --validate-env 

This collects a report with pre-installation environment checks, without installing anything. After the report has been collected, you can run: 

memsql-report check --validate-env --report-path </path/to/report>

This outputs every pre-installation environment check with a PASS, WARN, or FAIL result, and alerts you to any configuration changes you need to make before proceeding with the installation.
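If you script your provisioning, you might want the pipeline to stop when the check reports a FAIL. The bash sketch below does this by parsing the summary line; it assumes, based only on the sample output in this post, that the line looks like "Some checks failed: 11 PASS, 2 WARN, 3 FAIL". The real format, and whether memsql-report check already signals failures through its exit code, may differ by version. count_status and gate_on_summary are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: stop a provisioning pipeline when the environment check reports FAIL.
# ASSUMPTION: the check ends with a summary line shaped like the sample in this
# post, e.g. "Some checks failed: 11 PASS, 2 WARN, 3 FAIL". Verify the format
# (and whether the exit code already signals failures) for your version.

# count_status STATUS LINE: print the count in front of STATUS, or 0 if absent
count_status() {
  local status="$1" line="$2" n
  n=$(printf '%s\n' "$line" | grep -oE "[0-9]+ ${status}" | grep -oE '^[0-9]+')
  printf '%s\n' "${n:-0}"
}

# gate_on_summary LINE: succeed only if the summary reports zero FAIL results
gate_on_summary() {
  [ "$(count_status FAIL "$1")" -eq 0 ]
}

main() {
  local report="$1" output summary
  if [ -z "$report" ]; then
    echo "usage: $0 /path/to/report" >&2
    return 2
  fi
  output=$(memsql-report check --validate-env --report-path "$report")
  printf '%s\n' "$output"
  # In the sample output, per-check lines end in "[PASS]", so only the
  # summary line matches "<n> PASS".
  summary=$(printf '%s\n' "$output" | grep -E '[0-9]+ PASS' | tail -n 1)
  if [ -z "$summary" ]; then
    echo "No summary line found; treating the check as failed." >&2
    return 1
  fi
  if gate_on_summary "$summary"; then
    echo "No FAIL results; review any WARN items before installing."
  else
    echo "Resolve the FAIL items above before installing MemSQL." >&2
    return 1
  fi
}

# Run main only when the file is executed directly, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

Pass the path of the collected report as the only argument. A missing summary line is treated as a failure, so an unexpected output format stops the pipeline instead of silently passing.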

Below is sample output from this check, followed by the actions we recommend if you get this report in your own environment.

$ memsql-report check --validate-env --report-path report-2020-05-05T000204.tar.gz

✘ minFreeKbytes ................................. [FAIL]

FAIL vm.min_free_kbytes = 67584 too low on 172.31.68.57

NOTE https://docs.memsql.com/memsql-report-redir/configure-linux-vm-settings

✓ validateSsd ................................... [PASS]

✘ partitionsConsistency ......................... [WARN]

WARN Some partitions start sector on nvme0n1 are inconsistent (should be a multiple of 4096): [nvme0n1p1]

✓ diskUsage ..................................... [PASS]

✓ chronydDisabled ............................... [PASS]

✓ cpuHyperThreading ............................. [PASS]

✓ cpuModel ...................................... [PASS]

NOTE AMD EPYC 7571 on all

✓ orchestratorProcesses ......................... [PASS]

✓ cpuFeatures ................................... [PASS]

✓ vmOvercommit .................................. [PASS]

✓ defunctProcesses .............................. [PASS]

✓ kernelVersions ................................ [PASS]

NOTE 4.18 on all

✓ cpuFreqPolicy ................................. [PASS]

✘ maxMapCount ................................... [FAIL]

FAIL vm.max_map_count = 65530 too low on 172.31.68.57

NOTE https://docs.memsql.com/memsql-report-redir/configure-linux-vm-settings

✓ collectionErrors .............................. [PASS]

✘ transparentHugepage ........................... [FAIL]

FAIL /sys/kernel/mm/transparent_hugepage/enabled is [always] on 172.31.68.57

FAIL /sys/kernel/mm/transparent_hugepage/defrag is [madvise] on 172.31.68.57

NOTE https://docs.memsql.com/memsql-report-redir/transparent-hugepage

Some checks failed: 11 PASS, 2 WARN, 3 FAIL

Seeing this report, as a user, I would do the following: 

  • Increase the vm settings vm.max_map_count and vm.min_free_kbytes to the values recommended in the linked documentation, which decreases the risk of memory errors.
  • Check the start sector of the flagged partition (nvme0n1p1) on this host, so that disk performance across the cluster falls within a similar range.
  • Disable Transparent Huge Pages so that query performance is consistent.
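The vm settings flagged in the sample report (vm.max_map_count and vm.min_free_kbytes) and Transparent Huge Pages can be changed with standard Linux tools. Below is a minimal bash sketch, assuming it runs as root on the flagged host. This post does not state the target values, so the script takes them as arguments; use the values from the documentation linked in the report output. raise_sysctl, disable_thp, and needs_increase are hypothetical names. The partition warning is not covered, because fixing it usually means repartitioning the disk.

```shell
#!/usr/bin/env bash
# Sketch only: raise the vm settings flagged by the report and disable
# Transparent Huge Pages on one host. Run as root.
# ASSUMPTION: target values come from the MemSQL docs linked in the report
# output; this script intentionally has no built-in defaults.

# needs_increase CURRENT TARGET: succeed when CURRENT is below TARGET
needs_increase() {
  [ "$1" -lt "$2" ]
}

# raise_sysctl KEY TARGET: raise KEY to TARGET, never lower it
raise_sysctl() {
  local key="$1" target="$2" current
  current=$(sysctl -n "$key") || return 1
  if needs_increase "$current" "$target"; then
    sysctl -w "$key=$target"
  else
    echo "$key is already $current (target $target)"
  fi
}

# disable_thp: set both THP knobs flagged in the sample report to "never"
disable_thp() {
  local f
  for f in /sys/kernel/mm/transparent_hugepage/enabled \
           /sys/kernel/mm/transparent_hugepage/defrag; do
    echo never > "$f" || return 1
  done
}

main() {
  if [ "$#" -ne 2 ]; then
    echo "usage: $0 MAX_MAP_COUNT MIN_FREE_KBYTES" >&2
    return 2
  fi
  raise_sysctl vm.max_map_count "$1" || return 1
  raise_sysctl vm.min_free_kbytes "$2" || return 1
  disable_thp
  # These changes last only until reboot. Persist the sysctl values in a file
  # under /etc/sysctl.d/ and disable THP at boot (for example with the
  # transparent_hugepage=never kernel parameter).
}

# Run main only when the file is executed directly, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main "$@"
fi
```

Afterward, rerun memsql-report collect --validate-env and the check to confirm that the FAIL items are resolved.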

For more information on these commands, please see the documentation on memsql-report collect and memsql-report check.

Conclusion and What’s Next

The ability to check your system before installation, against a vetted set of MemSQL best practices, helps ensure that your database is production-ready to serve your critical applications.

Stay tuned for additional pre-install check functionality, including the ability for the tool to run performance benchmarks on your hardware. We also plan to incorporate the validation check directly into the installation process, so you won’t have to run it separately.

If you have a MemSQL cluster that’s managed by MemSQL Tools, you can try this today: check whether your servers are configured appropriately.

If you are not yet using MemSQL, you can try MemSQL for free or contact MemSQL.
