One of the main themes at DockerCon 2017 was the challenge of migrating legacy applications to containers. At MemSQL, we’re early adopters. We are already into our third year of running Docker at scale in production for our distributed software testing regime, where the performance, isolation, and cost benefits of containers are very attractive.
Before I take you through our journey to containers, let me start by outlining some of the general challenges of testing a distributed database like MemSQL. Our solutions are built for real-time data warehousing. Databases are really difficult to test, especially when they are designed to be distributed, real-time, and scalable. At MemSQL, we have millions of unique queries to test and highly variable runtimes, and some tests run for hours at 100 percent CPU usage. We have over 10,000 unique tests, not to mention the test transformations, which can multiply that number by another factor of one hundred.
Our tests also require gigabytes of RAM and multiple cores. That’s the kind of scale you have to think about for testing a platform like MemSQL. We also ran into some interesting new testing challenges because our product can take advantage of Intel’s AVX technology and vectorization to process more data per CPU cycle and run SQL queries orders of magnitude faster. These modern architectures bring awesome advantages, but they also add to the number of testing scenarios. We started with off-the-shelf tools, but given everything we need to cover, we couldn’t just throw our tests onto a common test platform. Commercial testing solutions are not designed for distributed applications. We paid our dues with a lot of experimentation and DIY.
We started our first test cluster about five years ago, when I first arrived at MemSQL. We named it Psyduck after the Pokémon character. Like testing, Psyduck always has a headache. For me, it was a really fun effort to be a part of the test program because it’s key to how we continue to evolve MemSQL while maintaining its reliability.
Initially we had a mixture of homegrown, bare-metal boxes in the office. There were 25 boxes that were basically just Dell desktop machines, manually managed. Additionally, we had a bare-bones Amazon EC2 presence for bursty job requirements. That was our initial scaling strategy: manually managed VMs to take on additional load.
From there we looked at operationalizing the whole stack. First we invested in operationalizing VMs on bare metal. We took the 25 machines, cleared them, and built an OpenStack cluster. We scaled that up to about 60 machines on-premises. This allowed us to eliminate EC2, which saved us a huge amount of money every month. But as we scaled that cluster we experienced a lot of pain and complexity with OpenStack. So we took a portion of the cluster and ran Eucalyptus instead. That ended up being interesting, but not very mature compared to OpenStack, and by that point we were a little burned out on VM infrastructure.
We learned about Docker about three and a half years ago, when the company behind it was still called dotCloud. We tested it out and prototyped what Psyduck could look like with containers. Containers matched our use case really well: software testing is basically a bunch of short-lived, ephemeral jobs, and you need a way to run hundreds of thousands of tests per day, each in an isolated environment. Docker gave us the ability to do that and to spin up an environment in seconds, rather than the minutes of overhead that come with VMs. Basically we saw that Docker would give us a way to scale Psyduck in an operationally friendly way.
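To make the idea of short-lived, isolated test jobs a bit more concrete, here is a minimal sketch in Python of how a single throwaway test container could be launched. This is not Psyduck's actual code. The image name and test command are hypothetical, and the CPU and memory limits are placeholder values. The only real pieces are the standard `docker run` options `--rm`, `--cpus`, and `--memory`.

```python
from typing import List


def ephemeral_test_command(
    image: str,
    test_cmd: List[str],
    cpus: float = 2.0,
    memory: str = "4g",
) -> List[str]:
    """Build the argv for one throwaway test container.

    --rm removes the container as soon as the test process exits, so no
    state is left behind on the machine. --cpus and --memory cap what
    the job can take from its neighbors.
    """
    return [
        "docker", "run",
        "--rm",
        f"--cpus={cpus}",
        f"--memory={memory}",
        image,
        *test_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image and test command, for illustration only.
    argv = ephemeral_test_command(
        "example/test-runner:latest",
        ["./run_test", "--name", "example_test"],
    )
    print(" ".join(argv))
```

A scheduler could hand a list like this to `subprocess.run` on whichever machine has free capacity. The sketch only builds the command, so it can be inspected without Docker installed.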
From there, we took the plunge, and over a couple of weeks we rebuilt Psyduck from the ground up using Docker. We’ve been running on that base architecture ever since, for the last three years. We also built a homegrown scheduling system, because at that time Kubernetes and Apache Mesos didn’t exist.
We also wrote our own key-value storage system that we call Pokedex. Think bare-bones S3 or bare-bones HDFS. We actually took the on-premises version of the Docker registry and wrote an adapter for Pokedex to provide the parallelism we required during image distribution. We have over 150 test machines, physical Dell servers, each running many containers. We have to deliver 5 GB of data to every machine per test run, so there’s a huge amount of data to send around all at once. Pokedex runs on a number of machines and takes advantage of leading performance optimizations for delivering files at scale. With Pokedex backing the registry, we can deliver Docker images on the order of minutes, whereas with the old VM setup we had to build out an arcane architecture based on torrents and other crazy technologies to deliver large VM image files.
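The numbers above give a feel for why image distribution needs parallelism. Here is a rough back-of-the-envelope calculation in Python. The machine count and per-machine payload come from the figures above (and 150 is a lower bound). The aggregate bandwidth is purely an assumed value for illustration, not a measurement of Pokedex.

```python
def total_payload_gb(machines: int, gb_per_machine: float) -> float:
    """Total data that has to leave the storage layer for one test run."""
    return machines * gb_per_machine


def transfer_minutes(total_gb: float, aggregate_gbit_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    total_gbit = total_gb * 8  # gigabytes to gigabits
    return total_gbit / aggregate_gbit_per_s / 60


if __name__ == "__main__":
    total = total_payload_gb(machines=150, gb_per_machine=5)
    print(f"payload per test run: {total:.0f} GB")

    # Assumed aggregate bandwidths in Gbit/s, for illustration only.
    for bandwidth in (1, 10, 40):
        minutes = transfer_minutes(total, bandwidth)
        print(f"{bandwidth:>3} Gbit/s aggregate: {minutes:.1f} minutes")
```

At 150 machines and 5 GB each, that is at least 750 GB per test run. A single 1 Gbit/s source would need well over an hour even in the ideal case, which is why the serving side has to be spread across many machines to land in the range of minutes.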
We’re also running appliances from a startup called Diamanti to help run the Psyduck control plane. Today, Diamanti makes use of Kubernetes, and over time we plan to expand the use of Kubernetes across our entire cluster. We expect Kubernetes to be a better fit for orchestrating our container environment than our initial homegrown scheduler.
What We Learned
We’re very happy with the outcome of this journey. The container abstraction is solid for performance, isolation, ephemerality — everything that matters in software unit testing. We don’t care about persistence: the test runners come up as containers, do a bunch of work, write out to a distributed file system, and disappear. This abstraction allows us to not worry about what happens to containers, only that they complete their job and then go away.
On the downside, tooling for a lot of container production scenarios is still very DIY today. In the early days, we hit issues with isolation across networking and storage. Over time, those things have gotten better, and we have learned how to configure Docker to eliminate noisy neighbors on the machines.
A fundamental challenge that remains with containers is related to the Linux namespace abstraction: if you run a container as privileged, that container can do things on a machine that affect every other container on it. We want to give our engineers the ability to do what they want with our machines, but with containers we have to remove some of that functionality. With VMs they could do anything, such as detaching or adding network devices, because the test is completely isolated and we have very strong controls. With Docker, it’s the opposite: you have to tell engineers what they can and can’t do. Linux namespaces are gaining better semantics over time, but it’s still a challenge.
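One common way to avoid fully privileged containers is to drop every Linux capability and add back only the ones a test needs. The Python sketch below builds such a command. It is a generic illustration, not a description of how Psyduck is configured. `--cap-drop` and `--cap-add` are standard `docker run` options and `NET_ADMIN` is a real Linux capability, but the image name is hypothetical, and whether a given capability is enough for a given test is something you would have to verify yourself.

```python
from typing import Iterable, List


def restricted_run_args(image: str, extra_caps: Iterable[str] = ()) -> List[str]:
    """Build a docker run argv that never uses --privileged.

    All capabilities are dropped first, then only the explicitly
    requested ones are added back.
    """
    args = ["docker", "run", "--rm", "--cap-drop=ALL"]
    for cap in extra_caps:
        args.append(f"--cap-add={cap}")
    args.append(image)
    return args


if __name__ == "__main__":
    # Hypothetical image name. NET_ADMIN lets the test reconfigure
    # networking inside its own network namespace.
    print(" ".join(restricted_run_args("example/test-runner:latest", ["NET_ADMIN"])))
```

The trade-off is the one described above: engineers have to state up front what their test is allowed to do, instead of getting a whole machine to themselves.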
Debugging and testing software running in containers is also tricky. Because of PID mapping, normal Linux tools like perf or GDB don’t know how to work properly with the containerized software. This results in complex or impossible debugging scenarios that can make engineers very frustrated.
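To illustrate what PID mapping means in practice: on Linux 4.1 and later, `/proc/<pid>/status` has an `NSpid` line that lists a process's PID in each nested PID namespace, outermost first. The Python sketch below parses that line. The sample status text, including the process name and PID values, is made up for the example.

```python
from typing import List


def namespace_pids(status_text: str) -> List[int]:
    """Return the PIDs from the NSpid line of a /proc/<pid>/status dump.

    The first value is the PID as seen from the namespace that mounted
    procfs (usually the host), and the last is the PID inside the
    innermost namespace (the container). An empty list means the line
    is missing, for example on kernels older than 4.1.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


if __name__ == "__main__":
    # Made-up sample: one process seen as PID 48213 on the host
    # and as PID 7 inside its container.
    sample = "Name:\texample_server\nPid:\t48213\nNSpid:\t48213\t7\n"
    host_pid, container_pid = namespace_pids(sample)
    print(f"host PID {host_pid} is PID {container_pid} inside the container")
```

A tool running on the host has to use the first number, while anything the containerized process reports about itself uses the last, and that mismatch is one source of the confusion described above.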
In the end, Psyduck helped us alleviate most of the headaches around testing software at our scale. Running containers on bare metal, we can afford to put our own products through the most demanding testing regime in our industry. After all, we’re the real-time data warehouse for Akamai, Kellogg, Pinterest, Uber, and other leading enterprises. Our customers run some of the largest distributed systems in the world. They know they can rely on us to keep their own customers happy.