ANNOUNCEMENT

We Spent a Bunch of Money on AWS And All We Got Was a Bunch of Experience and Some Great Benchmark Results

John Sherwood

Editor’s note: Running these benchmarks and documenting the results is truly a team effort. In addition to John Sherwood, Eric Hanson and Nick Kline authored this blog post.

If you’re anything like us, you probably roll your eyes at “company benchmarks its own software, writes a post about it.” This of course raises a question: if we’re cynical industry veterans, numb from the constant deluge of benchmarketing … why are we writing this? Simple: we wanted to prove to ourselves that we’re the only modern, scalable database that can do a great job on the three well-known database benchmarks, TPC-C, TPC-H, and TPC-DS, which cover both OLTP and data warehousing. Now, we hope to get you to suspend your skepticism long enough for us to prove our capabilities to you.

MemSQL’s primary focus is on what we call operational analytics (you might have heard this referred to as HTAP, translytical, or HOAP), running analytical queries across large, constantly changing datasets with consistently high performance. This performance is provided through the use of scale-out, compilation of queries to machine code, vectorized query execution, and use of single instruction, multiple data (SIMD) instructions. Our operational analytics capability blurs OLTP and data warehouse (DW) functionality, and different use cases take advantage of different slices of our spectrum of capabilities.

To test some of the new features in our upcoming 7.0 release, we used multiple TPC benchmarks to push MemSQL beyond what our normal everyday stress tests can reach. Our results show that we can do both transaction processing and data warehousing well, and we scale well as workload size increases. No other scale-out database running on industry-standard hardware can do this.

Playing Fair

TPC benchmarks have a certification process, and we did our best to follow the specifications, but we did not work with TPC to certify the benchmarks, so these are informal results. But we hope it goes without saying that we didn’t cheat. In fact, MemSQL does not have any “benchmark specials” (features designed just to make the benchmark result better, but that nobody would ever use for a real application) built into the code.

The TL;DR

This is a long post because we wanted to write an in-depth overview of how we ran the benchmarks so you can see how we achieved all these results. But if you want just a quick summary, here’s how we did on the benchmarks.

  • TPC-C: MemSQL scaled performance nearly linearly over a scale factor of 50x.
  • TPC-H: MemSQL’s performance against other databases varied on this benchmark, but was faster than multiple modern scale-out database products that only support data warehousing.
  • TPC-DS: MemSQL’s performance ranged from similar, to as much as two times faster than other databases. Expressed as a geometric mean, as is often done for benchmarking results, our performance was excellent.

What we accomplished was solid performance across three distinct benchmarks, establishing MemSQL as a top choice database for operational analytics and cloud-native applications. Now let’s move on to the details of the benchmarks.

TPC-C: Scale-out OLTP

On the massively transactional side of the spectrum, we have the TPC-C benchmark. To quote the 1997 SIGMOD presentation announcing it, a benchmark is a “distillation of the essential attributes of a workload,” and TPC-C distills the absolute hell out of a sharded transactional workflow. As the scale (defined by the number of warehouse facilities being emulated) increases, the potential parallelism increases as well. From our perspective, this is ideal for discovering any bottlenecks in our own code.

Our TPC-C database design used single tables for the large data sets, allowing us to use an existing driver targeting MySQL. Unlike some other official results published by major database vendors, we did not artificially split the data into multiple separate tables to reduce locking and contention. The way we ran the benchmark is much simpler, and shows how our scale-out architecture can make application development easier.

While experimenting with smaller datasets, we quickly discovered that driving significant load required us to nearly match the CPU of the aggregators with the leaves in the cluster. This configuration, strong all over (“very Schwarzenegger”, as one of our engineers put it) is quite unusual; it’s rare that customers require such a ratio of aggregators to leaves.

In part, our internal benchmarking service colocates drivers on aggregators, which required us to size up boxes to have the extra CPU. Additionally, under the pure OLTP TPC-C workload, aggregators are primarily coordinating transaction state and are effectively concurrency bound.

As we would recommend for a real workload of this nature, we had redundancy and synchronous replication enabled. This means that there are two copies of every data partition on two different nodes for high availability (HA), and transaction commits are not acknowledged to the client until both the primary and secondary replicas are updated. This of course requires twice the memory compared to running without HA, and adds to transaction overhead, since row updates must be applied on the replica node before the commit. This cost is to be expected if HA with fast failover is required. If you want HA, you need to pay.

After experimenting with smaller datasets, we drove just over 800,000 transactions per minute (tpmC) against 10,000 warehouses, using a cluster with 18 leaves and 18 aggregators, all r3.8xlarge instances. This led us to scale up, moving onto TPC-C at the 50,000 scale. This required what we would term a formidable cluster; storing a single replica of the data (including indexes) required more than 9 TB of RAM.

For HA, we stored two copies of the data, which, after some fairly straightforward math, took us to almost 20 TB. We decided to use r5.metal instances, using 2x the leaf instances, for a total of 6x the cores compared to the 10,000 warehouse cluster. Notably, the only change we made in running the benchmark itself was in tuning the partition count and hash index buckets; from the perspective of the open source benchmark driver we were using, originally written for MySQL, everything just worked.

The last notable thing about our 50,000 warehouse run was that we ran out of hardware from AWS while trying to reach the limit of the cluster; as we had taken 36 hosts for leaves and 21 for aggregators, we suppose that’s pretty reasonable.

With 1,728 physical cores on the leaf nodes for the megacluster (to use the scientific term), it’s hard to argue with calling this “throwing hardware at the problem.” However, as discussed above, we were able to easily meet the same per-core performance as at smaller scales. We feel that our overall results are a strong validation of the performance improvements to our new intra-cluster replication implementation in MemSQL 7.0.

TPC-C results show MemSQL scalability.

 

Figure 1 – TPC-C results for MemSQL.

Scale vCPUs
1,000 48
10,000 576
50,000 3,456

Table 1 – Leaf Cores, normalized to vCPUs. This table only includes leaf vCPUs for MemSQL, as we also had to use aggregator CPU to run the benchmark driver instances. Full details are included in Appendix A.

In Table 1, for MemSQL, we show both the number of leaf cores and the total number of cores, including aggregator cores. The way our benchmark kit is designed, it was easiest for us to run the benchmark with this configuration. In addition, our benchmark setup ran the application threads driving the transaction work on the aggregator machines. Normally that is not accounted for in the benchmark results.

The aggregator nodes were not fully utilized, and significantly over-provisioned to drive this test, though we were not able to determine the minimum amount of equipment needed to obtain performance at the level shown in Figure 1. Despite the caveats, these results show the dramatic benefits of our distributed architecture, in-memory skip list and hash storage structures, and compilation of queries to machine code.

While the above data is a beautiful (in our eyes) illustration of our ability to consume an OLTP firehose, these results are only possible by violating the rate limiting requirement found in the TPC-C specification. TPC-C is intended to emulate a limited number of physical terminals being interacted with by humans, and so more faithful drivers are limited to only pushing a handful of tpmC per warehouse. In a great example of why the TPC and the concept of audited results exist, we explicitly state here that these results are not comparable to others on that basis.

Per-Core Throughput for OLTP

It’s comforting to know that you’re getting your money’s worth as you scale a cluster and workload to larger sizes. Figure 2 illustrates the ability to maintain throughput per core for a transaction processing workload as data and cluster size increase. This translates to roughly linear scalability in throughput as the size of the transaction processing workload increases. Yet you can still run analytical queries across all the data, unlike the common approach of manually sharding large data sets across multiple databases instances with a single-node DBMS, such as Postgres, MySQL, or SQL Server. The nature of MemSQL allows for running analytic-style queries on this data in the same instance that is also handling transaction processing.

 

Figure 2 — Per vCPU transaction throughput (includes aggregator vCPUs). When increasing the dataset size and the hardware to match, MemSQL TPC-C throughput per vCPU held within a 30% band. We had plenty of headroom on the leaves at the 50,000 scale, and could have driven more throughput, but AWS ran out of r5.metal hosts for us to use as benchmark drivers, which was a pretty cool consolation prize in our opinion.

TPC-H and TPC-DS: DW Scalability and Then Some

On the opposite end of the spectrum, we have TPC-H (H) and TPC-DS (DS), categorized by the TPC as “decision support” benchmarks. H and DS use similar datasets, and DS is effectively the next-generation version of H. While H generates fairly straightforward queries and tends to be shard-friendly, DS thrills in its use of advanced SQL features and functions and exhilarates in its lopsided filters. Running DS is notoriously, purposefully difficult, and MemSQL can run all 99 DS queries, while many decision support-oriented databases cannot.

TPC-DS

At scale factor 10,000, the largest TPC-DS table contains just shy of 29 billion rows, with some 24 billion others spread out across the rest of the tables. TPC-DS then runs a set of 99 queries using a  formidable array of operators and features against this large data set. Below, we show all published official TPC-DS benchmark results to date  [AL19, AL19b, Cis19] , comparing the total query runtimes with our unofficial MemSQL run. In addition, we sought out one more informal result on a well-known cloud data warehouse product, which we refer to as “Cloud DW Product.” Although MemSQL’s core strength is operational analytics, these comparisons show that we also are also competitive with data warehouse and big data products on a very challenging pure data warehouse workload.


Figure 3 — comparative TPC-DS runtimes at 10TB scale factor — lower is better.
 For TPC-DS, we performed a ‘power run,’ where the sequential runtime of each query is summed. Please see the differences in benchmark versions and hardware used, described below.

Product Physical Cores (not VCPUs) TPC-DS Version Is Certified Benchmark Result Power Run Time (s) Geometric Mean (s)
MemSQL 640 1.4.0 no 6,495 6.03
Cloud DW Product 32 nodes, unknown size 1.4.0 no 4,481 11.03
Alibaba AnalyticsDB 512 2.10.1 yes 3,935 9.81
Alibaba Hadoop 520 2.10.1 yes 16,207 81.96
Transwarp Data Hub 1024 2.7.0 yes 21,615 72.73

Table 2 – Hardware used for TPC-DS runs and results, Power Run times, and Geometric Mean times. We normalized to physical cores rather than vCPUs to match TPC usage.

The values displayed in Figure 3 and Table 2 don’t imply a strict ranking of the systems because of the hardware and version differences. Nevertheless, we believe that the overall results illustrate important relative strengths of the different systems, because the hardware for all the systems is of the same order of magnitude in scale, and the queries are largely equivalent between versions.

We’ve included the geometric mean of the results in Table 2. You can calculate the geometric mean as the nth root of the product of n numbers. (For instance, the 9th root of the product of 9 numbers.) The interesting aspect of a geometric mean, when used for benchmark results, is that it de-emphasizes outliers. As a result, many consider it the most useful single value when reporting results of this type.

For the result shown in Figure 3, where we spent about 6,500 seconds to run the queries, we used a cluster consisting of 19 i3.16xlarge leaves and a single i3.16xlarge aggregator instance. With over 600 leaf CPUs, this was certainly a sizable (and, we have been informed by our finance team, expensive) cluster, but still similar in scale to other clusters that successfully completed the benchmark.

TPC-H


Figure 4 — TPC-H total runtimes comparing MemSQL and SQL Server — smaller is better.
We performed the same ‘power run’ with TPC-H as for DS, where the result is the sum of the run times of the sequentially executed queries that constitute the benchmark. This figure compares MemSQL, SQL Server 2017 at scale factor 10,000, and SQL Server 2016 at scale factor 30,000 (10 TB and 30 TB).

Compared to TPC-DS, TPC-H is a much friendlier (and older) benchmark. We ran H at the 10 TB scale factor on a cluster of 8 i3.8xlarge leaf nodes, and then scaled to run it at 30 TB using 24 i3.8xlarge leaf nodes. The 30 TB dataset contains a table with 180 billion rows, compressed to about 11 TB on disk in our columnstore, and took about three and a half hours at an average of 2.4 GB/s to load from S3 using our Pipelines feature.

While researching the benchmark, we found that every run submitted to the TPC at these larger scales was for SQL Server, the best of which we’ve graphed next to our results above. As shown in Figure 4, at a lower scale, we are outperformed by a quad-socket host running SQL Server with 224 vCPUs versus our 256. But at a larger size, scale-out catches up to scale-up – and running on commodity hardware versus specially-crafted quad socket monsters is a virtue of its own, as far as data center operators are concerned. Details of the SQL Server results are given in benchmark results published by Microsoft [MS16, MS17].

Figure 5 – Data volume processed per second per core for TPC-H. This chart shows a normalized throughput measure (DB_size_MB/cores/sum_of_query_runtimes_in_seconds) for each of TPC-H 10000 and TPC-H 30000. This is to illustrate how MemSQL scales for the benchmark as data size increases.

What we’re most looking for here is validation of our scale-out architecture. A fantasy, “perfectly scalable” database would process 3x the data with 3x the hardware in the same amount of time; as illustrated in Figure 5, we took about 30% longer than that ideal. Given that queries often require a final aggregation step, some falloff is expected. Moreover, more hardware adds substantial marginal return and doesn’t hit a ceiling in these tests — per core throughput drops by 30% as total work done goes up by a factor of around 300%.


Figure 6 — comparative TPC-H results at 30 TB scale factor.

We again scoured the web for TPC-H results and found several for well-known cloud data warehouse products, including Amazon Redshift, Microsoft Azure SQL Data Warehouse, and Snowflake Data Warehouse in reports from GigaOm [GO18, GO19], as well as certified single-box results on the TPC web site. As shown in Figure 6, MemSQL provides competitive results compared to these products.

Cost and Value

At this point you might think the results are pretty impressive, but you’re probably also thinking there is some special-purpose hardware helping make these benchmarks run smoothly. Well, there isn’t. Just as we didn’t add any code tricks to make the benchmark run better, we didn’t use anything other than industry-standard hardware. We really wanted to make these benchmarks replicable by anyone who wanted to test them, and you can. We explain how to do that below.

The enterprise edition of MemSQL will cost you dramatically less than legacy vendors such as Oracle, plus we have a free version running on up to a 4 node cluster that supports all enterprise features. Beyond just price, we believe MemSQL offers new value to customers by enabling them to build operational analytics applications, mixing OLTP and analytics operations, in a way that has not been feasible before. This enables real-time dashboards and other analytics on just-born data, driving new business value.

By showing results for TPC-C, TPC-H, and TPC-DS running on one scale-out-capable SQL DBMS, we believe we demonstrate that MemSQL delivers new value. In addition, given the breadth MemSQL covers, once a group of application developers in an organization are trained to use it, they can apply their skills to a broad range of applications, giving flexibility to them and their managers. This of course is also valuable to the business.

Conclusion

We’ve shown that MemSQL can deliver strong results for the TPC-H and TPC-DS analytical benchmarks and the TPC-C transaction processing benchmark. To our knowledge, we’re the only SQL DBMS supporting true shared-nothing scale-out that can do this. Our hardware needs are comparable to competitors when running these benchmarks at similar scale factors, and our software licensing costs are significantly less than those of the established database vendors. MemSQL skills can be used to build a variety of applications, including systems for transaction processing, operational analytics, operational data stores, and data warehousing, without scale limits. This will give you and your team the ability to solve multiple problems at once, in less time, with one powerful tool.

This post was edited to remove a comparison to another company’s published data after we discovered that the comparison was technically inaccurate.

Appendix A: More Detailed Results

TPC-C

The hardware configurations we used for TPC-C for 1,000, 10,000, and 50,000 warehouse scale factors are given in the following table. The node types given are for AWS.

Warehouses Leaf Nodes Sum Leaf vCPUs Sum Leaf RAM Aggregator Nodes tpmC
1,000 3 x r3.4xlarge 48 366 GB 3 x r3.4xlarge 94,900
10,000 18 x r3.8xlarge 1,152 4.29 TB 18 x r3.8xlarge 800,400
50,000 36 x r5.metal 3,456 27 TB 21 x r5.metal 5,300,000

We used the open-source MySQL driver from Percona for TPC-C, with slight modifications to the DDL. Specifically, we made the item table into a reference table, the history table into a columnstore table, changed indexes where appropriate to be hashes instead of skip lists, and sized the bucket count appropriately for the size of the dataset. Our cluster was configured to use two replicas for data, replicated synchronously across paired hosts, with asynchronous writes to disk.

Aggregator nodes ran the benchmark driver alongside the MemSQL aggregator instances, where the driver consumed roughly half of the CPU used. Although we used the same instance type for leaves and aggregators out of convenience, in the TPC-C workload aggregators also required very little RAM, and it should be easily possible to save on costs by running on lower-end hosts.

We can’t speak highly enough of the driver from Percona and how useful it was in enabling us to run the benchmark. We generated our datasets using the utility from the same repo, albeit modified to generate pipe-separated value files. We then loaded these files in parallel with MemSQL Pipelines, instead of doing direct database inserts.

TPC-H

The hardware configurations we used on AWS for TPC-H are given in the following table.

Scale Factor Leaf Nodes Sum Leaf Phys. Cores Sum Leaf RAM Aggregator Nodes Power Run (s)
10 TB 8 x i3.8xlarge 128 1.9 TB 1 x m4.10xlarge 1,789
30 TB 24 x i3.8xlarge 384 5.72 TB 1 x m4.10xlarge 2,371

We generated our datasets using the open-source dataset generator available from tpc.org. In calculating the power run, we performed a warmup query, then averaged the result of 1-5 executions. The total number of runs was sometimes limited due to the total runtime of all the queries and the time available to finish the runs.

TPC-DS

The hardware configuration we used on AWS for TPC-DS is shown in the table below.

Scale Factor Leaf Nodes Sum Leaf Phys. Cores Sum Leaf RAM Aggregator Nodes Power Run (s)
10 TB 19 x i3.16xlarge 608 9.05 TB 1 x i3.16xlarge 6,495

We generated our datasets using the dataset generator available from tpc.org. In calculating the power run, we performed a warmup query, then averaged the result of 1-5 executions, depending on the runtime of query execution, in the same fashion as for our TPC-H runs.

Appendix B: How to Play Along at Home

If you’re interested in replicating our results, or experimenting with MemSQL and these benchmarks, we’ve included instructions on how we went about running C, H, and DS here.

TPC-C

To run the TPC-C benchmark, we relied heavily on Percona’s excellent work, which we’ve forked to github.com/memsql/tpcc-mysql. This driver allows both generating the dataset and driving load. As MemSQL is wire compatible with MySQL, we were able to use the driver binary in an effectively unaltered format; in fact, the only changes we made to the repository were as follows:

  • Fixed a units bug that caused queries to be judged as failing if they took 5 milliseconds instead of 5 seconds.
  • Changed the dataset generator to create pipe-separated value files rather than inserting directly into the database.
  • Altered the schema to have shard keys and to take advantage of MemSQL features such as reference tables and hash indices.

If you’d like to use this driver to verify our results, the README covers how to both generate data and run the benchmark.

Cluster Setup

For the purposes of the TPC-C benchmark, there are several configuration settings that will drastically impact your results. To match what we used, do the following:

  • MemSQL 7.0: While we did recently make our latest and greatest 6.x release available, one of the major improvements in MemSQL 7.0 (download here) is an entirely new system for handling intra-cluster replication more efficiently. While 6.x releases support replication, performance is dramatically improved in 7.0.
  • Enable synchronous replication.
  • Enable asynchronous durability.
  • Once MemSQL has the correct configuration, you will need to create a database, per the Percona README.

Benchmark Execution

While running the benchmark is exceedingly simple, to fully stress a large MemSQL cluster you will need to spread the load across multiple aggregators. To get relevant results, the driver instances need to run in parallel, and only data from when all drivers are running is a valid representation of the cluster’s performance. We wrote a harness that used the database to coordinate across hosts; however, as it hooks in pretty deeply to our internal benchmarking system, we’re unable to meaningfully open source this extension.

TPC-H & DS

While the TPC-C specification is highly descriptive and gives a wide latitude in how results are achieved, TPC-H and DS are explicitly about executing SQL queries. As such, we were able to leverage the dataset and query generation tools available for free download from tpc.org: [H], [DS].

At a high level, running both benchmarks on MemSQL consists of the following steps:

  1. Data generation
  2. Create the table schema using our script (so that you use properly sized MemSQL data types and our shard key definitions and indexes) using create-tables.sql
  3. Load the data, using MemSQL Pipelines or a tool of your choice
  4. After loading the data, create statistics and optimize the tables; the commands are found in after-load.sql
  5. Run the queries by executing queries.sql

Sizing the MemSQL database cluster

We ran the benchmarks on AWS. For the DS benchmark at the 10 TB size we used a 20 node cluster with 1 MA, and 19 leaves, in total 20 nodes; for the H benchmark at 30 TB we used a cluster with an m4.10xlarge aggregator and 24 i3.8xlarge leaves.

Generating the data

Use the standard data generator for each benchmark found at http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp. To accomplish this, look under H & DS and download the tools for that benchmark. Register, download the latest version, compile the C-based generator. Data generation can take a long time, especially at a larger scale like the 10 & 30TB sizes we used. The data size of the csvs will unexpectedly be quite close to the promised value, and we highly recommend asynchronously compressing and uploading to cloud storage; we stored the generated data files in AWS s3.

Creating the schema

You can find the table schema and queries at the MemSQL github repo at memsql/benchmarks-tpc, looking under the ‘h’ or ‘ds’ subdirectories as appropriate. You can clone or copy them to your local machine by using the command

git clone git@github.com:memsql/benchmarks-tpc.git

In the benchmarks-tpc/tpcds and benchmarks-tpc/tpch  directories you can find the files we used to create and run the benchmark, respectively H and DS versions of create-tables.sql, after-load.sql and queries.sql. Two common ways to submit queries to MemSQL are to use the command line MySQL client or the MemSQL tools. The MemSQL Studio tools, including the SQL editor, are described at https://docs.memsql.com/memsql-studio/latest/memsql-studio-overview/.

Loading the data

Depending on the system your test cluster is running on, there are various alternatives for loading data. The MemSQL Pipelines feature loads data in parallel; see https://docs.memsql.com/memsql-pipelines/v6.7/pipelines-overview/. We used MemSQL Pipelines to load data from S3 for our tests. An example pipeline command follows.

CREATE PIPELINE catalog_sales AS LOAD DATA S3
"<memsqlpath>/sf_10000/catalog_sales."
CONFIG '{"region":"us-east-1","disable_gunzip":false}'
CREDENTIALS '{ "aws_access_key_id": "<key>",
"aws_secret_access_key": "<secret>"}'
INTO TABLE `catalog_sales`
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '|\n';

After loading

The background flusher in MemSQL automatically and asynchronously reorganizes columnstore segment files (the internal files used to store columnstore tables) for compression and to create a total ordering across all segment files in a table. To make query execution faster and avoid asynchronous processes during query execution, we manually ran the command “optimize table”, as in OPTIMIZE TABLE `call_center`; see https://docs.memsql.com/sql-reference/v6.7/optimize-table/

MemSQL has manually created histogram statistics. Manual statistics can be created by running “analyze table” on each table, such as: ANALYZE TABLE `call_center` COLUMNS ALL ENABLE

We did this before running the queries.  These two commands are provided in after-load.sql.  

Running the queries

We ran the queries multiple times and use the average run time, not counting the initial run where the query was compiled. We used columnstore tables for this benchmark, which are stored on disk, not in memory. The difference between the warm-up run and subsequent runs is just the time to compile a query, which was generally less than a second, but might be as long as a couple of seconds for very large and complex queries. The average run time varies a little on multiple runs of a query in a shared system like AWS, but we never saw that much variation.

Many TPC benchmarks, including TPC-DS, have templatized queries that replace “placeholders” in an incomplete query text with specific values, such as numeric or string constants. We didn’t find much variation with different parameters, so we used a fixed parameterization for the ease of tracking performance across time. We include the query text with the parameters we used when running the queries in queries.sql.

Appendix C: TPC-H Raw Results

As described above, when running TPC-H and TPC-DS, there was an initial “cold” or warm-up run where MemSQL compiles (optimize and code-gen) the queries in addition to running them. (See our documentation for the compilation pipeline.)The warm run reuses that same query plan, but re-executes the entire query re-reading all data. The cold run time is a single run, while the final result is an average of warm runs as reported in Appendix A.

Generally the difference between final and cold times comes down to the compilation time as described above. However, as some results show, there can be other factors coming from the use of shared infrastructure.

Query 10T (final) 10T (cold) 30T (final) 30T (cold)
1 18.916 19.815 24.238 26.117
2 17.073 18.38 94.10 101.062
3 76.483 76.655 97.520 103.450
4 28.236 29.84 34.818 36.77
5 48.442 49.520 57.897 61.024
6 9.852 10.42 12.712 13.536
7 45.125 46.410 85.488 84.513
8 27.961 28.751 65.812 66.735
9 154.404 159.037 266.970 266.544
10 62.519 64.069 73.248 75.532
11 30.524 32.183 82.781 83.560
12 16.454 18.083 20.369 22.320
13 183.827 183.469 201.577 193.112
14 106.647 107.013 125.455 127.872
15 44.927 46.704 67.685 70.124
16 35.947 37.478 39.534 40.943
17 15.431 16.307 34.820 34.690
18 428.70 425.285 499.514 501.920
19 27.400 31.504 29.340 31.137
20 268.29 271.174 221.616 230.66
21 83.632 84.015 118.185 118.438
22 58.341 61.310 120.527 122.270

Appendix D: TPC-DS Raw Results

Refer to Appendix C for information on how to interpret these results.

Query 10T (final) 10T (cold)
1 7.933 9.232
2 16.367 17.324
3 0.774 1.633
4 105.199 106.541
5 23.133 27.435
6 149.987 150.346
7 2.033 3.293
8 2.624 3.558
9 9.375 17.654
10 8.231 11.261
11 46.006 45.239
12 0.836 3.395
13 2.327 4.435
14 321.497 329.853
15 7.681 5.129
16 60.115 58.161
17 2.421 6.716
18 10.289 13.983
19 2.059 4.071
20 0.955 3.061
21 1.292 2.353
22 1.509 3.311
23 590.080 598.040
24 1435.269 1530.016
25 2.390 7.031
26 1.510 2.930
27 2.245 4.688
28 27.215 34.892
29 120.356 124.340
30 8.656 10.561
31 18.996 19.429
32 0.541 2.212
33 7.459 9.128
34 2.730 4.494
35 21.369 25.379
36 1.017 2.043
37 10.826 12.178
38 79.474 81.756
39 0.466 1.686
40 1.526 4.418
41 0.946 1.544
42 0.385 0.972
43 3.816 5.118
44 3.320 5.727
45 5.890 7.475
46 4.012 5.758
47 10.873 12.565
48 1.554 3.780
49 3.285 9.361
50 1.942 4.884
51 3.006 5.080
52 0.894 1.508
53 0.741 1.818
54 4.990 8.529
55 0.724 1.403
56 5.036 7.167
57 6.109 8.491
58 3.430 4.521
59 26.509 27.612
60 7.257 9.501
61 3.056 6.141
62 2.779 4.038
63 1.879 4.765
64 11.704 28.918
65 3.920 5.362
66 1.629 8.578
67 1382.295 1379.958
68 3.045 5.198
69 7.111 10.689
70 5.992 6.805
71 3.560 6.693
72 38.113 39.607
73 1.799 2.840
74 32.279 33.930
75 21.565 29.151
76 1.550 4.514
77 3.917 8.667
78 1392.938 1431.461
79 18.727 21.594
80 6.433 13.057
81 9.889 11.282
82 28.270 29.588
83 4.068 5.807
84 1.807 3.201
85 2.433 5.647
86 0.684 1.599
87 63.438 64.373
88 24.798 27.863
89 0.927 1.940
90 0.648 2.492
91 1.976 3.140
92 0.414 2.424
93 0.372 1.183
94 32.121 35.912
95 152.090 152.829
96 0.765 1.766
97 8.628 9.905
98 1.443 3.888
99 6.189 7.854

References

[AL19] TPC Benchmark DS Full Disclosure Report, Scale Factor 10,000, Alibaba, http://www.tpc.org/results/fdr/tpcds/alibaba~tpcds~alibaba_cloud_e-mapreduce~fdr~2019-03-19~v01.pdf, March 19, 2019.
[AL19b] TPC Benchmark DS Full Disclosure Report, Scale Factor 10,000, Alibaba Cloud AnalyticDB, http://www.tpc.org/results/individual_results/alibaba/alibaba~tpcds~alibaba_cloud_analyticdb~es~2019-04-26~v01.pdf, April 26, 2019.
[DB17] J. Sompolski and R. Xin, Databricks, Benchmarking Big Data SQL Platforms in the Cloud, https://databricks.com/blog/2017/07/12/benchmarking-big-data-sql-platforms-in-the-cloud.html, 2017.
[Cis19] TPC Benchmark DS Full Disclosure Report, Scale Factor 10,000, Cisco, http://www.tpc.org/results/fdr/tpcds/cisco~tpcds~cisco_ucs_integrated_infrastructure_for_big_data_~fdr~2018-03-05~v01.pdf, March 5, 2018.
[Clo17] Apache Impala Leads Traditional Analytic Database, Cloudera, https://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/, April 25, 2017.
[GO18] Data Warehouse in the Cloud Benchmark, https://gigaom.com/report/data-warehouse-in-the-cloud-benchmark/, 2018.
[GO19] Data Warehouse Cloud Benchmark, GigaOm, https://gigaom.com/report/data-warehouse-cloud-benchmark/, 2019.
[MS16] SQL Server 30T TPC-H Result, http://c970058.r58.cf2.rackcdn.com/individual_results/Lenovo/lenovo~tpch~30000~lenovo_system_x3950_x6~es~2016-04-28~v01.pdf, 2016.
[MS17] SQL Server 10T TPC-H Result, http://c970058.r58.cf2.rackcdn.com/individual_results/Lenovo/lenovo~tpch~10000~lenovo_thinksystem_sr950~es~2017-07-09~v01.pdf, 2017.

memsql ribbon
Live Webinar
See a Demo of MemSQL & Kubernetes