We are thrilled to have Carlos Bueno, Product Manager at MemSQL and an expert in performance tuning, as the leader of many of our training efforts. Prior to working at MemSQL, Carlos held critical engineering roles at Facebook, Yahoo, and several startups. He is the author of “Lauren Ipsum”, a popular children’s novel that introduces kids to computer science, as well as “Mature Optimization”, Facebook’s manual on performance measurement and optimization.
We sat down with Carlos to talk about his two major passions: finding new ways to make use of data, and building easy-to-use technical training modules.
Q: Tell us about your experience working with big data across various roles.
Carlos: Some industries stumble upon future realities sooner than others. In the early 2000s, I was in charge of operations for a large (for that time) search advertising platform. 21st century advertising evolved very quickly into a cutthroat game of who could ingest, process, and act on relevant data faster than anyone else. This opened my eyes early on to what Peter Norvig calls the “unreasonable effectiveness of data”. Since then, I have gone where the data is, which is what led me to Facebook, to help the company build a sustainable business with the most efficient code and datacenter operations possible. Now I’m basically an arms dealer for data scientists.
Q: How did you develop an interest in technical training?
Carlos: Years ago I started writing longform technical articles for the engineering blog at Yahoo, and then I did the same for Facebook, A List Apart, Hacker Monthly, and others. At Facebook I was “volunteered” to teach a regular bootcamp class on performance profiling, and discovered that I really liked the energy of a live audience. Later I helped design their week-long “Datacamp” program. I wrote a book form of my class called “Mature Optimization”, mostly as an excuse to give talks outside of the company.
Q: What makes teaching people how to use MemSQL fun?
Carlos: MemSQL represents a new kind of database, one that allows you to do things that had been so difficult for so long that people think they are impossible. The design of the system is a mix of new algorithms (e.g. skiplists) and almost retro ideas like SQL that turn out to be superior. It’s fun to dig into the mechanisms and implications of all that, how it all fits together.
Q: What do you think is the most interesting aspect of MemSQL?
Carlos: I love teaching people about small, powerful ideas that are not widely understood. Currently that’s geospatial intelligence. I didn’t know anything about how geospatial worked a couple of years ago. I’m still amazed at how just a few simple operators can enable things like our Supercar demo, which shows real-time cab rides and fun statistics around those rides.
Q: What is your favorite question to receive in a training session?
Carlos: Every class is different, but usually there’s a question about performance. Too often people treat performance problems like code bugs. They try to reason them away with a little mental model of what the system is doing. This can lead you astray if you don’t test your intuitions. So if someone asks “what is the performance of such-and-such kind of disk?” the correct answer is “I don’t know. But if you try this or that and measure these things, you’ll quickly find out the answer for your hardware and your particular data”. I like putting the science back into computer science.
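To make the "measure it, don't guess" advice concrete, here is a minimal Python sketch that times sequential writes to a temporary file at a few block sizes. It is an illustration only and not part of the training material: the function name `measure_write_throughput`, the block sizes, and the 8 MB test size are assumptions chosen for the example. The numbers it prints apply only to the machine, filesystem, and directory it runs on.

```python
import os
import tempfile
import time


def measure_write_throughput(total_bytes=8 * 1024 * 1024,
                             block_size=64 * 1024,
                             directory=None):
    """Write about total_bytes to a temp file in block_size chunks; return MB/s.

    Hypothetical helper for illustration. Pass `directory` to test a
    specific disk or mount point.
    """
    block = os.urandom(block_size)
    blocks = total_bytes // block_size
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        # Ask the OS to push the data to the device, so the timing is not
        # only a measurement of the in-memory page cache.
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return (blocks * block_size) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Repeat each measurement and report the median to damp out noise.
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):
        runs = sorted(measure_write_throughput(block_size=size) for _ in range(3))
        print(f"block={size:>8} bytes  median={runs[1]:.1f} MB/s")
```

Changing one variable at a time (block size, file size, target directory) and comparing medians is usually more informative than a single run.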
MemSQL 4 Training Video Series
Now it is time to introduce the first MemSQL video training series, led by Carlos! The MemSQL 201 playlist has 6 modules. Watch the full playlist or choose the modules that most appeal to you.
Module 1: MemSQL Architecture Overview
In the first module, Carlos describes MemSQL and its main database characteristics, including technical principles, architecture, design and implementation.
Module 2: Durability, Backups, Replication, and High Availability
What happens when an in-memory database reboots? How is data replicated? In module 2, Carlos walks through what happens when MemSQL is deployed on machines, and how MemSQL demonstrates the four essential qualities of any leading database: Atomicity, Consistency, Isolation and Durability.
Module 3: MemSQL Geospatial
MemSQL supports common data types like strings, dates, and numbers. MemSQL 4 also supports geospatial data types. In Module 3, Carlos explains how geospatial data is stored in MemSQL, how it works with relational functionality, and how spatial indexing makes queries fast.
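As a rough illustration of the kind of question a geospatial operator answers ("which points are within N meters of here?"), here is a small sketch in plain Python. It is not MemSQL's implementation or its SQL syntax: the function names `distance_m` and `within_distance`, the sample coordinates, and the use of the haversine formula on a spherical Earth are all assumptions made for the example. A database would use a spatial index rather than scanning every point.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def distance_m(a, b):
    """Great-circle distance in meters between two (longitude, latitude) points in degrees."""
    lon1, lat1 = math.radians(a[0]), math.radians(a[1])
    lon2, lat2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_distance(a, b, meters):
    """True if point b lies within `meters` of point a."""
    return distance_m(a, b) <= meters


if __name__ == "__main__":
    # Hypothetical pickup locations as (longitude, latitude).
    pickups = {
        "ride-1": (-73.9857, 40.7484),
        "ride-2": (-73.9851, 40.7580),
        "ride-3": (-74.0445, 40.6892),
    }
    center = (-73.9855, 40.7488)
    nearby = [name for name, point in pickups.items()
              if within_distance(center, point, 500)]
    print(nearby)
```

The linear scan here is only to show the predicate; the spatial indexing covered in the module is what makes this kind of filter fast over large tables.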
Module 4: Columnstore
In Module 4, Carlos explains the structure of the MemSQL columnstore, which compresses data and processes it on disk. The MemSQL columnstore is ideal for storing data that exceeds your memory capacity. Unlimited disk storage on MemSQL is provided for free.
Module 5: Moving Data
In Module 5, Carlos describes ways to move data in and out of a MemSQL cluster. He covers Hadoop/HDFS, Amazon S3, MySQL, flat files, and ODBC data sources.
Module 6: Schema Design and Capacity Planning
In Module 6, Carlos dives into the MemSQL in-memory rowstore, explaining data types, indexes, sparseness and JSON, online ALTER TABLE operations, and sharding strategies. He concludes with advice for planning a cluster based on various data workloads.