High Availability for In-Memory Databases

Architecting for Resiliency

Introduction

Critical business applications need performance and scalability to meet user demand, but they also need an architecture for high availability (HA) and resiliency to ensure uptime.

While traditional approaches have often involved more expensive "scale-up" servers, newer models allow for resilient and available architectures with commodity hardware.

This paper will outline the basics of architecting highly available and resilient configurations with MemSQL, an in-memory, distributed database that is fully relational.

MemSQL Cluster Configuration Basics

MemSQL is built to deploy on commodity hardware in your data center or the cloud. This approach allows customers to take advantage of the lowest possible costs to add memory, CPU, and storage to their configurations.

MemSQL uses a two-tiered architecture to simplify operations, increase resiliency, and allow for online cluster expansion. Aggregator nodes handle application requests and leaf nodes store data in memory as well as on flash or disk.

This architecture, shown in Figure 1, is fully redundant with a shared-nothing design, so the loss of a node, or of multiple nodes that do not hold both copies of the same data, does not impact system uptime.

Figure 1: MemSQL Cluster Architecture
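
The division of labor between the two tiers can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class names, the CRC32 hash, and the key format are hypothetical and are not MemSQL's actual routing logic. It shows that aggregators hold no data of their own, so any aggregator can serve any request by locating the leaf that owns the key.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """A leaf node: stores the rows assigned to it."""
    name: str
    rows: dict = field(default_factory=dict)


@dataclass
class Aggregator:
    """An aggregator node: stores no data, only routes requests to leaves."""
    leaves: list

    def leaf_for(self, key: str) -> Leaf:
        # Hypothetical routing rule: a deterministic hash, so every
        # aggregator picks the same leaf for the same key.
        index = zlib.crc32(key.encode("utf-8")) % len(self.leaves)
        return self.leaves[index]

    def write(self, key: str, value) -> None:
        self.leaf_for(key).rows[key] = value

    def read(self, key: str):
        return self.leaf_for(key).rows.get(key)


leaves = [Leaf("leaf-1"), Leaf("leaf-2"), Leaf("leaf-3"), Leaf("leaf-4")]
agg_a = Aggregator(leaves)
agg_b = Aggregator(leaves)

# Write through one aggregator, read through another.
agg_a.write("order:1001", {"total": 42})
result = agg_b.read("order:1001")
```

Because the aggregators in this model are stateless, losing one of them only removes a routing endpoint; the data stays on the leaves.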

Media Types within a MemSQL Cluster

Each node within a MemSQL cluster includes both memory in the form of RAM and I/O-based media such as flash or disk.

Choice of media types depends on workload requirements. For row-based tables in MemSQL, all of the data will reside in RAM with the transaction log and snapshots going to disk. For column-based tables, the data will reside on disk or flash with the most frequently accessed portions cached in RAM.

Figure 2: Media types in a MemSQL cluster
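
As a rough illustration of how table type drives RAM sizing, the following Python sketch adds up memory for a mix of row-based and column-based tables. The function name and the hot_fraction parameter are assumptions made for this example; the real share of a column-based table that stays cached depends on the workload.

```python
def ram_needed_gb(tables, hot_fraction=0.25):
    """Estimate RAM for a list of (size_gb, kind) tables.

    kind is "row" or "column". hot_fraction is a hypothetical guess at
    the share of column-based data that is accessed often enough to be
    cached in RAM.
    """
    total = 0.0
    for size_gb, kind in tables:
        if kind == "row":
            # Row-based tables reside entirely in RAM.
            total += size_gb
        elif kind == "column":
            # Column-based tables live on disk or flash; only the
            # frequently accessed portion is cached in RAM.
            total += size_gb * hot_fraction
        else:
            raise ValueError(f"unknown table kind: {kind}")
    return total


estimate = ram_needed_gb([(100, "row"), (400, "column")])
print(estimate)  # 200.0
```

The estimate excludes the transaction log and snapshots, which go to disk for row-based tables.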

Replication Across Data Centers

To protect against data center outages, MemSQL offers replication across data centers. This replication is asynchronous, and the secondary cluster can have a different number of aggregators and leaves than the primary.

The database replicated to the second cluster is read-only. However, active-active configurations are possible by having a second database where the master is on the second cluster, and its replica is on the original cluster.

A basic replication configuration is shown in Figure 3.

Figure 3: Basic replication between a primary and secondary (read-only) cluster
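
The active-active arrangement described above can be modeled in a few lines of Python. The database and cluster names here are hypothetical; the sketch only captures the rule that each database is writable on the cluster holding its master and read-only on the cluster holding its replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicatedDatabase:
    name: str
    master_cluster: str   # cluster that accepts writes
    replica_cluster: str  # cluster that holds the read-only copy

    def writable_on(self, cluster: str) -> bool:
        return cluster == self.master_cluster

    def readable_on(self, cluster: str) -> bool:
        return cluster in (self.master_cluster, self.replica_cluster)


# Two databases replicated in opposite directions (names are made up).
databases = [
    ReplicatedDatabase("orders", master_cluster="dc-east", replica_cluster="dc-west"),
    ReplicatedDatabase("sessions", master_cluster="dc-west", replica_cluster="dc-east"),
]


def writable_databases(cluster: str) -> list:
    """Names of the databases that accept writes on the given cluster."""
    return [db.name for db in databases if db.writable_on(cluster)]


print(writable_databases("dc-east"))  # ['orders']
print(writable_databases("dc-west"))  # ['sessions']
```

Each cluster takes writes for one database and serves reads for both, so both data centers do useful work while still protecting each other.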

Considerations for Shared Storage

MemSQL was designed for local storage, offering users the lowest-cost options for memory, flash, and disk by avoiding expensive proprietary storage arrays. Coupled with a distributed systems architecture, the use of commodity servers provides the least expensive and easiest way to scale.

Shared storage options with MemSQL are possible, but require high availability and resiliency expertise. Improperly configured clusters with shared storage can compromise overall availability.

Shared storage does offer benefits for traditional, non-distributed applications. However, distributed applications like MemSQL do not require shared storage and therefore avoid the complexity and costs associated with large storage arrays. Further, distributed systems can provide a more resilient architecture than one based on a single array or a redundant set of arrays.

Working with virtual machines and containers adds another twist to the configuration picture. Some companies have IT policies under which only VMs, not physical servers, are provisioned for business users. Since VMs and containers often rely on shared storage for their own needs, we will explore these configurations for MemSQL.

Deploying MemSQL in Virtual Machines

MemSQL can easily be deployed in a VM or container. However, the best practice is to deploy a single MemSQL instance, or a very small number of instances, per physical machine. It is also critical to allocate memory adequately on a per-VM basis. Over-allocating memory results in an improperly configured cluster.

Running too many MemSQL instances per physical machine often leads to inadequate redundancy and over-provisioned memory resources.

Figure 4: Best practices for MemSQL deployment are one VM instance per physical machine
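
A simple pre-deployment check can catch both problems described above. The following Python sketch is illustrative only: the function name, the default limit of one instance per host, and the 8 GB hypervisor reserve are assumptions, not MemSQL settings.

```python
def check_host(physical_ram_gb, vm_ram_gb, max_instances=1, host_reserve_gb=8):
    """Return a list of problems found for one physical machine.

    physical_ram_gb: RAM installed in the physical server.
    vm_ram_gb: list of RAM sizes assigned to each MemSQL VM on that server.
    max_instances: hypothetical policy limit on MemSQL instances per host.
    host_reserve_gb: hypothetical RAM held back for the hypervisor.
    """
    problems = []
    if len(vm_ram_gb) > max_instances:
        problems.append(
            f"{len(vm_ram_gb)} MemSQL instances on one host (limit {max_instances})"
        )
    usable_gb = physical_ram_gb - host_reserve_gb
    if sum(vm_ram_gb) > usable_gb:
        problems.append(
            f"memory over-allocated: {sum(vm_ram_gb)} GB assigned, {usable_gb} GB usable"
        )
    return problems


print(check_host(256, [240]))            # no problems: []
print(check_host(256, [128, 128, 128]))  # too many instances and over-allocated
```

Running a check like this for every physical server before provisioning makes over-allocation visible before it affects the cluster.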

Shared Storage Configurations

Even with the advantages of a distributed configuration, there may be reasons to deploy MemSQL with shared storage. First, we will review the pros and cons of a few approaches, and then cover best-practice deployments.

Single Shared Storage Deployment

The advantage of this approach is the use of existing or provisioned corporate shared storage. However, a single array becomes a single point of failure, as shown in Figure 5. Further, storage costs within a shared array are typically 7-15x the cost of direct-attached storage internal to a server.

Figure 5: Configurations with a single SAN present a single point of failure
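
To put the 7-15x multiple in concrete terms, the short Python calculation below compares the two options for the same capacity. The price per terabyte and the capacity are made-up inputs for illustration; only the 7x and 15x multiples come from the text above.

```python
def shared_storage_cost_range(das_cost_per_tb, capacity_tb,
                              low_multiple=7, high_multiple=15):
    """Return (low, high) shared-array cost for the given capacity.

    das_cost_per_tb is the cost of direct-attached storage per terabyte
    (a hypothetical input; use your own figures).
    """
    das_total = das_cost_per_tb * capacity_tb
    return das_total * low_multiple, das_total * high_multiple


# Hypothetical example: 20 TB at 100 currency units per TB of local storage.
low, high = shared_storage_cost_range(das_cost_per_tb=100, capacity_tb=20)
print(f"direct-attached: {100 * 20}, shared array: {low} to {high}")
# direct-attached: 2000, shared array: 14000 to 30000
```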

Split Shared Storage Deployment

Adding a second storage array helps; however, an array failure now impacts half of the cluster, as shown in Figure 6.

Figure 6: Split shared storage still leaves 50% of the cluster exposed in event of failure

Redundant Networked Storage Deployment

The final shared storage configuration uses networked storage, shown in Figure 7, so that an array failure no longer takes down half of the cluster. However, this setup can be extremely costly and complex to maintain.

Figure 7: Networked storage minimizes impact of array failure but remains complex and costly
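
The three shared storage layouts above can be compared with a small Python model that counts how many nodes lose all of their storage when one array fails. The node and array names are hypothetical, and the networked layout assumes every node can reach both arrays, which is a simplification of a real storage network.

```python
def fraction_of_nodes_down(node_arrays, failed_array):
    """Share of nodes left with no reachable array after one array fails.

    node_arrays maps each node name to the list of arrays it can use.
    """
    down = [
        node
        for node, arrays in node_arrays.items()
        if all(array == failed_array for array in arrays)
    ]
    return len(down) / len(node_arrays)


nodes = ["leaf-1", "leaf-2", "leaf-3", "leaf-4"]

# Single array: every node depends on san-1.
single = {node: ["san-1"] for node in nodes}
# Split: half of the nodes on each array.
split = {"leaf-1": ["san-1"], "leaf-2": ["san-1"],
         "leaf-3": ["san-2"], "leaf-4": ["san-2"]}
# Networked (assumption): every node can reach both arrays.
networked = {node: ["san-1", "san-2"] for node in nodes}

for name, layout in [("single", single), ("split", split), ("networked", networked)]:
    print(name, fraction_of_nodes_down(layout, "san-1"))
# single 1.0
# split 0.5
# networked 0.0
```

The model ignores cost and operational complexity, which are the main drawbacks of the networked layout.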

Best Practice Deployments with Shared Storage

Even with the challenges presented by shared storage, there are best practice configurations for deployments.

Intra-Datacenter High Availability

The key to deploying a successful cluster with virtual machines is the establishment of availability zones. In this case, one instance of MemSQL is deployed per physical server, with static CPU and RAM bindings to the MemSQL VM. In addition, TCP/IP performance should be equivalent within and between VM availability zones.

MemSQL instances are then evenly distributed between the two VM high availability zones and configured with a redundancy level of two. Data, plancache, and tracelog are written to shared storage. This configuration is outlined in Figure 8.

Figure 8: Best practices deployment for MemSQL with VMs and shared storage
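
With a redundancy level of two, the important property is that the two copies of any data never sit in the same availability zone. The Python sketch below pairs leaves across two zones and checks that property. The leaf naming scheme (zone prefix before the first hyphen) and the helper names are assumptions made for this example.

```python
def build_leaf_pairs(zone_1_leaves, zone_2_leaves):
    """Pair each leaf in zone 1 with a partner leaf in zone 2."""
    if len(zone_1_leaves) != len(zone_2_leaves):
        raise ValueError("both availability zones need the same number of leaves")
    return list(zip(zone_1_leaves, zone_2_leaves))


def zone_of(leaf_name):
    # Hypothetical naming convention: "az1-leaf-3" lives in zone "az1".
    return leaf_name.split("-")[0]


pairs = build_leaf_pairs(
    ["az1-leaf-1", "az1-leaf-2"],
    ["az2-leaf-1", "az2-leaf-2"],
)

# Every pair must span both zones, so no zone holds both copies.
spans_both_zones = all(zone_of(a) != zone_of(b) for a, b in pairs)
print(pairs)
print(spans_both_zones)  # True
```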

Resiliency Example with One Storage Array

The best-practice deployment keeps the overall cluster operational even with a storage failure. In this example, if storage availability zone 1 becomes unavailable, half of the MemSQL VMs will be down and performance will drop to half of the usual level, but the service remains fully operational, as shown in Figure 9.

Figure 9: With Storage Availability Zone 1 offline the MemSQL service remains operational
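
This failure case can be checked with a short Python simulation. It is a simplified model with hypothetical leaf names: the cluster counts as operational while every leaf pair still has at least one member online, and capacity is approximated as the share of leaves still running.

```python
def cluster_state(pairs, offline):
    """Return (operational, capacity) after the given leaves go offline.

    pairs: list of (leaf_a, leaf_b) tuples holding the two copies of the data.
    offline: iterable of leaf names that are down.
    """
    offline = set(offline)
    all_leaves = [leaf for pair in pairs for leaf in pair]
    online = [leaf for leaf in all_leaves if leaf not in offline]
    # Operational as long as no pair has lost both of its members.
    operational = all(any(leaf not in offline for leaf in pair) for pair in pairs)
    return operational, len(online) / len(all_leaves)


pairs = [("az1-leaf-1", "az2-leaf-1"), ("az1-leaf-2", "az2-leaf-2")]

# Storage availability zone 1 fails: every az1 leaf goes down.
zone_1_down = [leaf for pair in pairs for leaf in pair if leaf.startswith("az1")]
print(cluster_state(pairs, zone_1_down))  # (True, 0.5)
```

Half of the leaves are gone, yet each pair keeps its zone 2 member, which matches the outcome described above.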

Resiliency Example with Dual Failure

In the unlikely event that both physical servers in a leaf pair become unavailable, the MemSQL cluster will become unavailable, as shown in Figure 10. Users then have two options. If some loss of existing data is acceptable, then users can manually rebalance the partitions, sacrificing the data on the unavailable nodes.

More likely, data loss is not an option, and users can switch to the surviving replicated cluster to maintain availability.

Figure 10: If a leaf pair becomes unavailable, failover to a replicated cluster keeps services online
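
The choice between the two options can be written down as a small decision function. The Python sketch below is an illustration, not an operational runbook: the enum and function names are hypothetical, and the final fallback (restoring a failed leaf when no healthy replicated cluster exists) is an assumption added to make the function total.

```python
from enum import Enum


class Recovery(Enum):
    NONE_NEEDED = "cluster still operational"
    REBALANCE = "manually rebalance partitions, accepting data loss"
    FAILOVER = "switch to the replicated cluster"
    RESTORE_NODES = "bring a failed leaf back online"  # assumption, not from the text


def choose_recovery(pairs, offline, data_loss_acceptable, replica_cluster_healthy):
    """Pick a recovery action after the given leaves go offline."""
    offline = set(offline)
    # A pair is lost only when both of its members are offline.
    pair_lost = any(all(leaf in offline for leaf in pair) for pair in pairs)
    if not pair_lost:
        return Recovery.NONE_NEEDED
    if data_loss_acceptable:
        return Recovery.REBALANCE
    if replica_cluster_healthy:
        return Recovery.FAILOVER
    return Recovery.RESTORE_NODES


pairs = [("az1-leaf-1", "az2-leaf-1"), ("az1-leaf-2", "az2-leaf-2")]
both_down = ["az1-leaf-1", "az2-leaf-1"]

action = choose_recovery(pairs, both_down,
                         data_loss_acceptable=False,
                         replica_cluster_healthy=True)
print(action.value)  # switch to the replicated cluster
```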

Conclusion

In-memory databases allow enterprises to deliver real-time results on critical data. Companies can also maintain architectures for high availability and resiliency on bare metal or virtual infrastructure. By following the best practices outlined in this paper, customers can build best-in-class deployments for real-time applications.

Download MemSQL and MemSQL Ops and see why leading companies are managing their data with MemSQL.

To discuss your real-time analytics needs with a MemSQL expert, please contact us at 855-4-MEMSQL (463-6775) or info@memsql.com.

MemSQL Headquarters, 534 4th Street, San Francisco, CA 94107