ANALYTICS

“MemSQL’s Columnstore Blows All of the Free and Open Source Solutions Out of the Water” — Actual User

Jacky Liang

A columnstore database takes all the values in a given column – the zip code column in a customer database, for instance – and stores all the zip code values in a single row, with the column number as the first entry. So the start of a columnstore database’s ZIP code record might look like this: 5, 94063, 20474, 38654… The “5” at the beginning means that the ZIP code data is stored in the fifth column in the rowstore database of customer names and addresses that the original data comes from.

Columnstore databases make it fast and easy to execute reporting and querying functions. For instance, you can easily count how many customers you have living in each US zip code – or combine your customer data with a zip code marketing database.

MemSQL combines rowstore and columnstore data tables in a single, scalable, powerful database that features native SQL support. (See our blog post comparing rowstore and columnstore.) And, in addition to its fast-growing, paid enterprise offering, MemSQL also has a highly capable free option.

You can use MemSQL for free, with community support from our busy message board. MemSQL is free up to the point where you reach four nodes, or four separate server instances, with up to 32GB of RAM each – 128GB of RAM total. This large, free capacity is particularly useful for columnstore tables, where 128GB of RAM is likely to be enough to support a terabyte or so of data on disk, with excellent performance.

We have several existing customers doing important work using MemSQL for free. And when you need more nodes, or paid support, simply contact MemSQL to move to an enterprise license.

Why Use Columnstore?

The columnstore is used primarily for analytical applications where the queries mainly involve aggregations over datasets that are too large to fit in memory. In these cases, the columnstore performs much better than the rowstore.

A column-oriented store, or “columnstore,” treats each column as a unit and stores segments of data for each column together in the same physical location. This enables two important capabilities. The first is to scan each column individually – in essence, being able to scan only the columns needed for the query, with good cache locality during the scan. These features of columnstore get you excellent performance and low resource utilization – an important factor in the cloud, particularly, where every additional operational step adds to your cloud services bill.

The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together. MemSQL compresses data up to about 90% in many cases, with very fast compression and decompression as needed. As with the data design of columnstore tables, compression delivers cache locality, excellent performance, low resource utilization, and cost savings.

In summary, you should use a columnstore database if you need great analytics performance. It also helps that MemSQL, as a scalable SQL database with built-in support for the MySQL wire protocol, natively supports popular analytic tools like Tableau, Looker, and Zoomdata.

MemSQL ingests data from multiple sources and supports BI tools, SQL queries, and more.
MemSQL combines fast ingest, in-memory rowstore tables in memory, disk-based columnstore tables, and inherent SQL support for maximum power and usability.

A big advantage with MemSQL is that you get both rowstore and columnstore tables in a single database, with built-in SQL support. This gives you a number of advantages:

  • If you need to have rowstore table data duplicated and, in many cases, augmented in one or more columnstore tables, you can do this in a single database.
  • You can run queries that join, or otherwise operate on, data spread across multiple rowstore and columnstore tables.
  • You can make “game-time” price-performance decisions between storing your data in super-fast, in-memory rowstore tables vs. large, disk-based columnstore tables, then modify your decision as your business needs change.
  • The training and experience you gain in using MemSQL for one use case extends automatically to many other use cases, whether rowstore or columnstore. For some of our customers, MemSQL is the last new database they’ll ever need – and ends up replacing one or more competing database options.

You can also read more about the difference between rowstore and columnstore in our documentation.

Some Current Columnstore Options

There are several columnstore options out in the market. Here are a few of the more popular ones that we see.

Note. Most of these options are not fully comparable to MemSQL because they don’t support both rowstore and columnstore, in-memory and disk-based tables, as MemSQL does. (See the list of benefits to this converged capability above.) However, you should consider a range of options before choosing any database provider, including MemSQL.

ClickHouse

ClickHouse is an open source columnstore database developed by Yandex specifically for online analytical processing (OLAP). ClickHouse allows for parallel processing of queries using multiple cores and very fast scanning of rows, while offering good data compression.

However, there are disadvantages to using ClickHouse. There is no real DELETE/UPDATE support and no support for transactions. ClickHouse also uses its own query protocol, which means limited SQL support. This also means your favorite SQL tools may not be supported if you choose to use ClickHouse. Also, if you are migrating from a SQL database, you will likely have to re-write all your queries which have joins – a common operation in SQL.

MariaDB Columnstore

MariaDB is an open source fork of MySQL. This fork was done by Michael “Monty” Widenius, the author of MySQL, after Oracle purchased Sun Microsystems.

MariaDB supports an open and vibrant community that has frequent updates and excellent community support. Additionally, MariaDB maintains high compatibility with MySQL, so it can be used as a drop-in replacement which supports library binary parity and exact matching with MySQL APIs. MariaDB also offers a columnstore engine for analytical use cases.

However, since MariaDB only supports storing data on disk, if query speed and latency is a priority, then you may not be too happy with the performance. Additionally, MariaDB’s columnstore product is also still quite new, so there is likely to still have work to be done.

Pivotal Greenplum

Greenplum is a columnar data warehouse based on PostgreSQL. Greenplum uses massively parallel processing (MPP) techniques, with each database cluster containing different nodes, such as the master node and the segment node. This allows for parallel processing of queries and storage of data. Greenplum is also fully SQL-compliant and ACID-compliant. Finally, unlike most columnstore databases – but like MemSQL – Greenplum also supports both row and columnstore data storage.

However, customers sometimes complain about the performance and usability of Greenplum. Many customers found the product to be difficult to tune, as Greenplum tends to use all the available system resources for every single query, which can lead to performance degradation when multiple queries are executed at the same time. Also, under high write load conditions, Greenplum would cause something called a Linux journaling error. Errors of this type may require rebuilding the entire database, which might take many hours to complete.

SAP HANA

HANA is an entirely in-memory columnstore database developed by SAP. A major strength with SAP HANA is that it’s built as a data platform — there are multiple “engines” that sit inside HANA columnstore. There are specialty engines built for calculations, spatial use cases, predictive algorithms, and more, allowing users to pick and choose the right engine for their specific use case without having to use materialized views.

However, common complaints among SAP HANA users are the specialized skills one may need to work with the product. Furthermore, since SAP HANA is entirely in-memory-only with no disk component, it can get fairly expensive just for the RAM to contain all your data. Finally, the licensing costs of SAP HANA can get fairly high as well.

Where MemSQL Shines

In November 2018, we launched MemSQL 6.7. With MemSQL 6.7, as well as later MemSQL releases, you can use MemSQL for free, within fairly robust limits. When using MemSQL for free, you can create clusters with up to four nodes, with no limit on the amount of data stored on disk. You also receive community support via online forums, rather than direct, paid support.

Since launching MemSQL 6.7, we have been listening to how people using our software for free use MemSQL. Consistently in these conversations, our users – including those that run production workloads on the free tier – have consistently praised our columnstore.

Purcado's analytics benefit from MemSQL's storage efficiency across a wide range of hardware.
Purcado helps people get the best deals, on the best shoes,
from the best retailers, quickly and easily.

What makes it so good?

  • You can use it for free up to four nodes with unlimited disk. But that’s not all — it also consistently outperforms other free, open source, and even enterprise-grade columnstore databases. This conclusion came from a number of people that have tested many rival databases – sometimes, even ten or more – before finally arriving at MemSQL.
  • MemSQL has built-in support for ANSI SQL, so the query language is very familiar. We also support the MySQL wire protocol, meaning we support a wide range of tools in the data ecosystem.
  • MemSQL offers incredible compression in disk-space columnstore, allowing you to store more data and save precious storage space at the same time. Real customers like Pandora are able to reliably achieve 85–90% on-disk compression for columnar data.
  • MemSQL’s fully distributed nature means you can simply add affordable commodity hardware to increase query performance, concurrency, and ingest speed, and reduce data size.
  • Finally, unique to MemSQL, the ability to combine rowstore and columnstore data in one query means you get the benefits of real-time and historical data unified in one query! This means simplicity for your data engineering stack, lower maintenance costs, and improved performance, as MemSQL can, in many cases, be the one database to rule them all.

Did we mention you can use all this for free?

What People Are Saying

We can tout the benefits of MemSQL all we want, but we think it’s even better to let people who are using MemSQL for free do the talking for us. These are testimonials we have received directly from software developers and data engineers, company founders, and others using MemSQL to run their applications, answer their queries, and drive their businesses forward.

Paul Moss, E-commerce Startup in the United Kingdom

“I use MemSQL primarily for its columnstore. Your columnstore blows all of the free / open source columnstores in the market currently out of the water — it’s just so fast. PostgreSQL and CitusDB are inferior to your product. It’s not even close, especially since I’m running MemSQL on a single CentOS workstation machine. Additionally, as a business owner, you want the simplest engineering stack possible.

MemSQL is one database to rule them all, replacing three to four different databases. It does it all well.”

Hajime Sano, Nikkei in Tokyo, Japan

“The performance of MemSQL free tier is just as good as the enterprise version, which means performance for each query is really fast, the fastest in the columnstore databases out there. That is the greatest thing. MemSQL also supports both rowstore and columnstore in one query. We’re now able to balance real-time query performance (in rowstore) with lower hardware cost (in columnstore). 24/7 operational data goes in in-memory, while archival data goes to disk.”

Nikkei Asian Review uses MemSQL to help with analytics for their multinational media presence.
Nikkei Asian Review, with scores of bureaus and more than 1000 journalists throughout
Asia, delivers both business-focused and general coverage across the region.

Software Developer in Publishing Company in Germany

“The incredible compression and speed of the columnstore engine really is something, querying gigabytes of data in seconds was amazing to see. Also the possibility of combining rowstore and columnstore in one query is a very nice feature.”

Elad Levy, Entrepreneur in the Mobile Games Industry

“MemSQL in particular has columnstore, which is free, and it’s amazing. If you want to analyze data and get business insight, just go with MemSQL’s columnstore. You also get the ability to mix and match transactions (OLTP) and analytics (OLAP) in a single query, which saves us from deploying and querying another database. It’s a 2-in-1 solution.”

Peter Baylies, Purcado in Durham, NC

“I appreciate MemSQL’s speed even on modest, single-box hardware, as well as its storage efficiency on disk.”

Next Steps

Don’t take our word for it — you can find out yourselves why our customers are saying such positive things about our columnstore and choose to run their businesses on MemSQL, both for free and paid, with support. We have a tutorial on loading data into MemSQL and a webinar for building an analytics app using MemSQL’s columnstore. These resources show just how fast and easy it is to set up and use MemSQL.

We have a tutorial on loading data into MemSQL and a webinar for building an analytics app using MemSQL’s columnstore. These resources show just how fast and easy it is to set up and use MemSQL.

To sum up, when using MemSQL for free, you can:

  • Use up to 4 nodes, with no specific limit on disk storage
  • Get rich community support at forums.memsql.com
  • Deploy to production
  • Not face any time limits

Want 24/7 support and even more nodes? You can contact us to begin the conversation.

memsql ribbon
Live Webinar
See a Demo of MemSQL & Kubernetes