How MemSQL Works – At a Glance

Floyd Smith

What makes MemSQL unique? Its ability to combine the scalability of NoSQL databases and the familiar structure, transaction capabilities, and SQL support of traditional relational databases. MemSQL does all this while offering high performance on ingest, fast query response, and high concurrency – support for many active users at the same time.

Because of its SQL support, MemSQL is “plug and play” with existing database systems and with all the people and tools that are part of the SQL ecosystem. The combination of capabilities that are offered in MemSQL are not found together anywhere else.

In this blog post we show the underlying features of MemSQL that allow it to offer all of these capabilities at the same time. After you read this blog post, you will be able to assess MemSQL as a solution for your own database-related issues.

Introduction to MemSQL

MemSQL describes its database software as the No-Limits Database™ because it’s able to scale massively. MemSQL is a distributed data store that allows for no limits in terms of the number of concurrent queries you have on the environment. MemSQL can handle transactions and analytics, simultaneously, all in one system.

Ultimately what you get with MemSQL is a distributed datastore with relational SQL that gives you a lot of the efficiencies that distributed systems provide, all running on commodity standard hardware, as with NoSQL databases. But with MemSQL, you also get the SQL structure that so many organizations depend on.

MemSQL is the no-limits database, with massive scalability and high concurrency

The architecture of MemSQL has many similarities to existing NoSQL databases, while also having a relational database structure and SQL support, among other important differences:

  • Scale-out architecture. True distributed processing across arbitrary numbers of servers, easily scaling CPU capacity, RAM, and disk storage.
  • The ability to run on commodity hardware and in the cloud. For MemSQL, any machine or cloud instance that can run Linux can run MemSQL.
  • High-speed ingest. For a typical NoSQL database, ingest is only in batch mode, whereas ingest with MemSQL can handle both batch and streaming data and in-line data transformations via MemSQL Pipelines.
  • Flexible data formats. MemSQL supports structured, relational data that fully supports SQL queries, as well as semi-structured data such as JSON, AVRO, text, and spatial data; NoSQL databases can also handle unstructured data.
  • Tiered storage. Most NoSQL databases are disk-first, with some support for in-memory operations. MemSQL includes both and allows you to flexibly manage memory vs. disk usage.

MemSQL uses Pipelines to streamline the old extract, transform, and load (ETL) process

MemSQL, Feature by Feature

MemSQL’s unique architecture allows it to offer a combination of capabilities that isn’t available elsewhere. Here’s a quick summary:

  • Tiered storage across memory and disk. Each uses a separate table type.
  • A query optimizer that works across table types.
  • A memory and CPU manager with the ability to segregate user groups.
  • Distributed ingest which is lock-free across reads and writes, for bulk and streaming ingest, using a skip list index.
  • A management console with a GUI (MemSQL Studio) and command-line tools.
  • Multi-cluster management.
  • Enterprise security with RBAC, encryption, auditing, and strict mode to isolate data from administrators.
  • A SQL API.

MemSQL's lock-free architecture supports scalability using a lock-free index for simultaneous reads and writes

Following are highlights of these capabilities.

Tiered Storage, Memory to Disk

MemSQL takes advantage of tiered storage, which means it can spill data to memory and/or to disk. So you can have the choice of taking advantage of MemSQL’s memory-optimized rowstore tables for very fast, efficient query processing or ingestion.

However, MemSQL also runs queries on disk, using columnstore tables. For both table types, everything is distributed – so MemSQL is highly elastic, and allows for scale-out processing of that data.

Query Optimizer across Rowstore and Columnstore Tables

MemSQL has a SQL query optimizer that runs on the two different table types, row-based and column-based. This gives you the ability to do transactional processing, analytic processing, or both at the same time, using the best table structure for each workload.

MemSQL is able to store incoming data flexibly in rowstore and columnstore tables

Workload Management for CPU and RAM

On top of the query optimizer, MemSQL has a memory and CPU management governor. Part of that is a workload manager. It allows you to identify how to segment processes or users so that ultimately your query processing can prevent outages.

If you’re going to run out of memory, for example, MemSQL can help ensure that that doesn’t happen. Or you can segment users so that you can have one class of user or application get a certain amount of resources. It’s a very sophisticated workload manager that lets you get the best out of the system.

Distributed Ingest, Bulk or Streaming, Lock-free

MemSQL offers a lock-free architecture. This is based on the skip list index, which does a very efficient job of doing transactions and updates without necessarily locking or blocking other reads and/or writes. That allows MemSQL to deliver bulk and/or streaming ingestion.

So transactional support on MemSQL is similar to any NoSQL system, in that both MemSQL and a NoSQL system can do continuous loading. The difference is that MemSQL is ACID compliant. MemSQL carries every transaction all the way to the disk, so there’s no risk of data loss. MemSQL has a classic logging mechanism, so we can do HA configurations and replication of transactions to other nodes or clusters to ensure availability.

Multi-Cluster Management

We have a full management console called MemSQL Studio. It’s a GUI-based environment. We also have command-line utilities that gives you functions that can deploy, manage, repartition, rebalance nodes, all sort of built into the system.

The Studio product and command-line tools make it very easy to identify exactly where bottlenecks are. You can do things like query planning analysis, where you can inspect where query bottlenecks might be and troubleshoot those. All through a built-in, web-based monitoring environment.

MemSQL Studio is a multi-cluster management console

Enterprise Security with RBAC, Encryption, Auditing, and Strict Mode

MemSQL is very stringent in its support for security. As a company, we have a number of financial services and public sector customers that require the highest levels of security. So MemSQL security includes role-based access control (RBAC), encryption, and support for auditing.

We have another component, called “strict mode”, which imposes a separation of concerns model. So your administrators cannot have visibility to data, but they can help administer the environment.

SQL API and Queries

Lastly, what you get with MemSQL is a SQL API. Other distributed databases, forthrightly called NoSQL databases, lack this essential component. (As well as several of the other features listed above.) The MemSQL database is relational SQL through and through.

MemSQL’s table structures are all in relational format. That is ultimately what gives you that easy to work with data set that a lot of data analysts and folks in the business community enjoy.

The MemSQL Ecosystem

This is a summary of the MemSQL ecosystem in terms of the type of ingestion mechanisms that MemSQL supports, notably including Kafka and Spark streaming.

MemSQL is fully interoperable with other databases, messaging systems, and BI tools

MemSQL supports an HDFS Pipeline connector. We have MemSQL Pipelines that allow you to essentially ingest data as data lands in the source system. Pipelines to Stored Procedures give Pipelines even more power. So the notion of continuous loading from a relational database, a Hadoop cluster, an Amazon S3 database or other sources is supported by MemSQL’s architecture.

In the diagram, you can see that we have the two different table types, rowstore and columnstore. Lastly, you can connect a number of off-the-shelf business intelligence (BI) tools, such as Tableau or Looker, to build your dashboards and reports.

MemSQL runs on any Linux-based system, whether that’s on-premises or AWS, Azure, or Google’s cloud service. You can run anywhere that there’s a Linux-based environment.

Conclusion

You can see a version of this blog post as a presentation within the recent webinar, Hadoop Acceleration Strategies for Broader Use and Simplicity, led by Mike Boyarski. To learn more, click here to view the webinar and download the slides.

memsql rainbow wave
Webinar
See What’s New in MemSQL 6.7:
The No-Limits Database