In a recent webcast, we shared tips and tricks for understanding MemSQL, covered best practices for implementation, and demoed MemSQL with a real-world use case.
Here are some of the top tips and tricks we shared:
When moving an application to MemSQL, or creating one on it, start by thinking about whether rowstore storage, columnstore storage, or both is the best fit
If you have high throughput requirements, consider using MemSQL Pipelines
Take advantage of reference tables to eliminate distributed joins
Using and monitoring Management Views within the information schema is a great way to profile your cluster workload
We also took questions from webcast attendees, and here’s what we said:
Question: What if my use case doesn’t fit into rowstore or columnstore? What if it has some transactional elements and some analytical?
Answer: Some customers have a notion of “active” data that fits in memory and lives in a rowstore table. Once that data has expired or is no longer being updated, it is moved to a columnstore table with the same schema. Ultimately, this is workload dependent. With our customers, we scope test workloads and look at the performance of queries on each model. Our sales engineering team members are experts in this domain.
Question: What are the benefits of the MemSQL columnstore compared to other Massively Parallel Processing columnar systems?
Answer: MemSQL columnstore supports real-time streaming data loading through in-memory optimized data structures. It also caches segment metadata, which improves performance through segment elimination: during query processing, segments that cannot contain matching rows are skipped. We also use query vectorization techniques that take advantage of encodings and hardware instruction sets to get performance gains.
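To make segment elimination concrete, here is a minimal Python sketch. It is a toy model, not MemSQL's implementation: the `Segment` class, the `build_segments` and `range_scan` helpers, and the segment size are all hypothetical names and values chosen for illustration. The idea it shows is that if each segment caches the minimum and maximum value of a column, a range filter can skip whole segments without reading them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Toy columnstore segment: one column's values plus cached min/max metadata."""
    values: List[int]

    def __post_init__(self) -> None:
        # Metadata is computed once when the segment is built, then reused by every query.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def build_segments(sorted_values: List[int], segment_size: int) -> List[Segment]:
    """Split already-sorted values into fixed-size segments (hypothetical helper)."""
    return [
        Segment(sorted_values[i:i + segment_size])
        for i in range(0, len(sorted_values), segment_size)
    ]


def range_scan(segments: List[Segment], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of segments actually read."""
    matches: List[int] = []
    segments_read = 0
    for segment in segments:
        # Segment elimination: the cached metadata proves no row here can match.
        if segment.max_value < low or segment.min_value > high:
            continue
        segments_read += 1
        matches.extend(v for v in segment.values if low <= v <= high)
    return matches, segments_read


segments = build_segments(list(range(100)), segment_size=10)
matches, segments_read = range_scan(segments, low=42, high=57)
print(f"{len(matches)} matching rows, {segments_read} of {len(segments)} segments read")
```

The benefit depends on the data being ordered: if values were scattered randomly across segments, every segment's min/max range would overlap the filter and nothing could be skipped.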
Question: Why are Pipelines the best way to ingest data into MemSQL?
Answer: With MemSQL Pipelines, each leaf node partition has its own “worker” process that ingests data in parallel, instead of every insert passing through an aggregator node. This allows the ingest to leverage the distributed computing power of the cluster and to run concurrently, in contrast to traditional serialized ingest methods.
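The shape of that idea can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions, not how Pipelines are implemented: the `ingest_slice` and `parallel_ingest` functions, the comma-separated input format, and the use of threads as stand-ins for per-partition workers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[int, str]


def ingest_slice(partition_id: int, raw_lines: List[str]) -> Tuple[int, List[Row]]:
    """Hypothetical per-partition worker: parse its own slice of the source data."""
    rows: List[Row] = []
    for line in raw_lines:
        raw_id, name = line.strip().split(",", 1)
        rows.append((int(raw_id), name))
    return partition_id, rows


def parallel_ingest(slices: Dict[int, List[str]]) -> Dict[int, List[Row]]:
    """Run one worker per partition concurrently, with no single funnel in the middle."""
    if not slices:
        return {}
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        futures = [
            pool.submit(ingest_slice, partition_id, lines)
            for partition_id, lines in slices.items()
        ]
        return dict(future.result() for future in futures)


# Each partition gets its own slice of the incoming data (assumed format: "id,name").
slices = {0: ["1,alice", "2,bob"], 1: ["3,carol"], 2: ["4,dave", "5,erin"]}
loaded = parallel_ingest(slices)
print({partition_id: len(rows) for partition_id, rows in loaded.items()})
```

In a real cluster the workers run on separate leaf nodes, so the parallelism is across machines rather than threads in one process.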
Question: How do the shard key and sort key differ in practice?
Answer: The shard key applies to every MemSQL table and refers to the distribution of data among the leaf nodes in a cluster. The sort key only applies to columnstore tables, and facilitates better performance by aligning the ordering of data on disk with the ordering in which it is likely to be accessed by queries. Hence, it is necessary to look at the most frequently run queries on your table when determining the ideal ordering of fields in the sort key.
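The two keys answer different questions, which a small Python sketch can make visible. It is a toy model under stated assumptions: the `partition_for`, `distribute`, and `apply_sort_key` helpers are hypothetical, and CRC32 is used only as a convenient deterministic hash, not because MemSQL uses it. The shard key decides which partition a row lands on; the sort key decides the order of rows within that partition.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


def partition_for(shard_key_value: object, num_partitions: int) -> int:
    """Map a shard key value to a partition with a deterministic hash (assumed: CRC32)."""
    return zlib.crc32(str(shard_key_value).encode("utf-8")) % num_partitions


def distribute(rows: List[Row], shard_key: str, num_partitions: int) -> Dict[int, List[Row]]:
    """Shard key: decides WHERE each row lives."""
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[partition_for(row[shard_key], num_partitions)].append(row)
    return partitions


def apply_sort_key(partition_rows: List[Row], sort_key: List[str]) -> List[Row]:
    """Sort key: decides the ORDER of rows inside one partition."""
    return sorted(partition_rows, key=lambda row: tuple(row[col] for col in sort_key))


rows = [{"user_id": f"user-{i % 5}", "ts": 100 - i} for i in range(20)]
partitions = distribute(rows, shard_key="user_id", num_partitions=4)
sorted_partitions = {
    pid: apply_sort_key(partition_rows, sort_key=["ts"])
    for pid, partition_rows in partitions.items()
}
for pid, partition_rows in sorted(sorted_partitions.items()):
    print(pid, [(row["user_id"], row["ts"]) for row in partition_rows])
```

All rows for a given `user_id` end up on the same partition, and within each partition the rows are ordered by `ts`, so a query filtering on a time range touches a contiguous run of rows.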
Question: What is the difference between a broadcast and a reshuffle operation?
Answer: A broadcast operation sends a set of rows across the network to each of the other nodes to complete a distributed join. In a reshuffle (or repartition) operation, one side of the join is reshuffled (has its shard key recalculated with a different set of key columns) to match the shard key of the table on the other side of the join. Each operation is useful in different contexts when optimizing the join performance of a workload.
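Here is a minimal Python sketch of the two strategies, assuming a hypothetical `orders` table sharded on `order_id` and a `customers` table sharded on `customer_id`, joined on `customer_id`. The function names, the CRC32 hash, and the row counts are illustrative assumptions, not MemSQL internals. Both strategies produce the same join result; what differs is which rows cross the network.

```python
import zlib
from typing import Dict, List, Tuple

Row = Dict[str, object]


def node_for(key: object, num_nodes: int) -> int:
    """Deterministic hash placement (assumed: CRC32) of a key onto a node."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def shard(rows: List[Row], key: str, num_nodes: int) -> List[List[Row]]:
    nodes: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[key], num_nodes)].append(row)
    return nodes


def local_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Hash join between rows that are already on the same node."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def broadcast_join(left_nodes, right_nodes, key) -> Tuple[List[Row], int]:
    """Copy every right-side row to every node, then join locally."""
    all_right = [row for node in right_nodes for row in node]
    rows_moved = len(all_right) * (len(left_nodes) - 1)  # each row goes to every other node
    result: List[Row] = []
    for left in left_nodes:
        result.extend(local_join(left, all_right, key))
    return result, rows_moved


def reshuffle_join(left_nodes, right_nodes, key) -> Tuple[List[Row], int]:
    """Repartition the left side on the join key so it lines up with the right side.

    Assumes right_nodes is already sharded on `key` using node_for.
    """
    num_nodes = len(right_nodes)
    reshuffled: List[List[Row]] = [[] for _ in range(num_nodes)]
    rows_moved = 0
    for current, node in enumerate(left_nodes):
        for row in node:
            target = node_for(row[key], num_nodes)
            rows_moved += int(target != current)  # only rows that change nodes travel
            reshuffled[target].append(row)
    result: List[Row] = []
    for left, right in zip(reshuffled, right_nodes):
        result.extend(local_join(left, right, key))
    return result, rows_moved


customers = [{"customer_id": c, "name": f"customer-{c}"} for c in range(4)]
orders = [{"order_id": o, "customer_id": o % 4} for o in range(12)]
order_nodes = shard(orders, "order_id", 3)
customer_nodes = shard(customers, "customer_id", 3)

b_rows, b_moved = broadcast_join(order_nodes, customer_nodes, "customer_id")
r_rows, r_moved = reshuffle_join(order_nodes, customer_nodes, "customer_id")
print(f"broadcast moved {b_moved} rows, reshuffle moved {r_moved} rows")
```

In this model, broadcast cost grows with the size of the broadcast table times the number of nodes, so it suits small tables, while reshuffle cost is bounded by the size of the side being repartitioned.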
Question: Why would I want to run PROFILE?
Answer: The query profiler will give you runtime statistics on your query, including operators, execution time per operator, and amount of data flowing through the network.
Question: How can I monitor the MemSQL SHOW STATUS EXTENDED or Management Views metrics?
Answer: Come to our next webinar, Operationalizing MemSQL, for tips on how to do so!
Missed the webcast? Get a copy of the recording to watch on demand here.
To learn more or ask specific product questions, reach out to firstname.lastname@example.org.