This case study was first published as a video with a brief description. Noah Zucker, Senior Vice President for Technology at Novus Partners, talks about how they use MemSQL for portfolio intelligence, applied to their $2 trillion – yes, two trillion dollars – in assets under management. The case study has proven popular, and the content is still highly relevant today, so we’re releasing a transcript of the content as a blog post.
I’m here today to talk about Novus Partners and how we’re using MemSQL to change how the world invests.
What is Novus Partners? We are a portfolio intelligence company. We provide a platform used by over 100 investment managers – hedge funds, pension funds, large allocators, family offices – to gain better insights into their investments and to better understand risk. Our platform currently encompasses over two trillion dollars in assets under management.
In addition, we have a research platform that our clients use to explore investments using publicly sourced data. Any hedge fund that’s large enough has to file a Form 13F disclosing its holdings. Our users can log in to explore and understand the other hedge funds out there, see where the risk is, and get trading ideas from that.
How Novus Partners Helps Investors
Essentially, our mission is to help investors discover their true investment acumen – where their real strengths are – and also understand their risk. Our users log in to the platform, the Alpha platform, and are presented with a series of pages of interactive graphs, charts, and other tools they can use to explore their investments and get deeper insights than they had when they were just looking at spreadsheets, or maybe at a graph of simple returns over the last 10 years.
You know, lots of these hedge funds have glossy brochures showing how they beat the market over the last 10 years, but those brochures don’t show the deeper picture of where the returns came from. Maybe they had high returns last quarter, but they also carried a large risk, like an illiquid position, or they were invested in some area they probably didn’t truly understand.
So our users are able to gain deeper insights and bring a kind of Moneyball approach to investing, whereas the hedge fund industry has traditionally relied more on gut-instinct investing.
Our clients using Novus today include some of the top investment managers in the world.
Before MemSQL: ETL Headaches
Let’s talk about the story before MemSQL, because Novus Partners didn’t always use MemSQL as the main database backing our investment analytics platform. Before MemSQL, we used MongoDB, and when I joined Novus in 2013 I immediately saw that we had some problems.
You know, we have a client data team. That’s the team that works with our investor customers. Its members are very skilled portfolio analysts themselves. They understand investments. They understand the data.
But they were spending most of their time not doing that, but managing our ETL (extract, transform, and load – Ed.) pipeline. That meant running a 24/7 operation: they had to babysit the ETL process that loaded the metrics into our platform.
If a job failed, they’d have to spring into action and shuffle around their data load schedule. In the worst case, they’d have to run a large load during the day, and that would mean an application slowdown for all of our users while the database was under strain.
So, being the new guy, I asked: why do we have only 12 of these compute nodes implemented in Scala? Why can’t we just put them out on the cloud, scale out, and have 100 of them blasting through all that data?
The answer I got back was a little bit interesting. What I was told was that they had tried doing that, they had tried scaling up, but they couldn’t go much higher because the database we were using, MongoDB, just couldn’t keep up. So it got to the point where we had to seriously consider making a change.
We either had to do the work to scale out our existing database, or we had to investigate using something else. Of course, that was an opportunity to look at other technologies, and MemSQL was one of those.
One reason we decided to make a change was that, while there are well-understood ways to scale out Mongo, doing so would basically have meant a full rewrite of our application. We’d have to revisit our data model and introduce sharding, and as an application developer you’re now having to think about scalability alongside your business logic. That’s a big undertaking.
MemSQL Cuts Load Times by 98%
So this is where MemSQL comes in. Here’s what our actual data pipeline looks like. Our clients provide us with data in all sorts of formats: flat text files, Excel spreadsheets, even PDFs. We scrape data out of the PDFs and load everything through a pretty standard ETL process, basically data cleanup, and it’s stored in a persistent store of record.
Then our platform takes that data out of the store of record and sends it into our Scala-based distributed compute layer, which computes the portfolio analytics and metrics I referred to earlier. The results are cached in MemSQL so that, when our customers log in, all that data is available at their fingertips.
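The compute-then-cache pattern described here can be sketched in a few lines: an expensive metric is computed once, outside the request path, and written to a table, so that serving a user becomes a keyed lookup. This is a minimal, hypothetical sketch, not Novus’s code: Python’s built-in `sqlite3` stands in for MemSQL, `cumulative_return` is a toy metric, and the table, column, and function names are invented for illustration.

```python
import sqlite3
from math import prod

# Hypothetical stand-in: sqlite3 plays the role of the MemSQL cache here.


def cumulative_return(periodic_returns):
    """Toy metric (hypothetical): compound a series of periodic returns."""
    return prod(1 + r for r in periodic_returns) - 1


def compute_and_cache(conn, portfolios):
    """'Compute layer': calculate each metric once and upsert it into the cache table."""
    rows = [(pid, cumulative_return(rets)) for pid, rets in portfolios.items()]
    conn.executemany(
        "INSERT OR REPLACE INTO metric_cache (portfolio_id, cum_return) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def read_metric(conn, portfolio_id):
    """'Application layer': serving a user request is a primary-key lookup, no recomputation."""
    row = conn.execute(
        "SELECT cum_return FROM metric_cache WHERE portfolio_id = ?", (portfolio_id,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metric_cache (portfolio_id TEXT PRIMARY KEY, cum_return REAL)")

# Hypothetical store-of-record data: periodic returns per portfolio.
portfolios = {"fund_a": [0.10, -0.05, 0.02], "fund_b": [0.01, 0.01]}
compute_and_cache(conn, portfolios)
compute_and_cache(conn, portfolios)  # a rerun overwrites rows instead of duplicating them

print(round(read_metric(conn, "fund_b"), 4))  # prints 0.0201
```

Because the cache write is keyed on the portfolio ID, rerunning a failed or repeated load simply overwrites the earlier results rather than duplicating them. `INSERT OR REPLACE` is SQLite syntax; MySQL itself offers `REPLACE` and `INSERT ... ON DUPLICATE KEY UPDATE` for the same purpose.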
From their perspective, these high-intensity computations happen instantly, something they can’t do without us; they might wait minutes or hours if they tried to crunch those numbers in a spreadsheet or on a traditional database. So that’s of great value to them. And the bottom line for our ETL team was that a typical hedge fund data load went down from 90 minutes to two minutes.
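As a quick check, the 98% in this section’s heading follows from those two load times:

$$
\text{reduction} = \frac{90\ \text{min} - 2\ \text{min}}{90\ \text{min}} = \frac{88}{90} \approx 0.978 \approx 98\%
$$

Put the other way, $90 / 2 = 45$, so a typical load runs about 45 times faster.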
So even if there’s a failure, we can just rerun it, and it doesn’t put a load on our system intraday. And from a developer’s perspective – and I’m actually a Scala developer myself, not a data engineer – MemSQL brought a lot of value.
“The Learning Curve is Basically Non-Existent”
Now, when you hear that you’re moving to a new database, the first thing you want to know is: what’s its interface? And the answer we got back was that it just uses the MySQL interface. I think that’s perhaps overlooked in the MemSQL buzz, but as a developer, that’s a huge win.
The learning curve is basically non-existent. You have the whole MySQL toolchain available to you, and there’s a lot of documentation online. So that’s of great value as a developer.
In addition, MemSQL has first-class JSON support. As a Mongo shop, that was really important to us, because not all of our schema was stable at the time we were doing the migration. We were able to map a lot of our data to a relational schema, but there were parts we had to leave in JSON, or that we were still iterating on quickly.
Where we want to do more rapid development and the data is still changing, we can leave it in JSON format, which means there’s a whole lot of code we don’t have to rewrite; we can just leave it as is.
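The hybrid approach described here, with stable fields mapped to relational columns and still-evolving parts kept as JSON, can be sketched as follows. This is a minimal, hypothetical illustration rather than Novus’s actual schema: Python’s built-in `sqlite3` stands in for MemSQL, and because the sketch stores JSON as plain text and parses it in Python, it does not use MemSQL’s native JSON column type or operators. The `positions` table, its fields, and the helper functions are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: stable, well-understood fields become relational columns;
# the still-evolving remainder of each document is kept as JSON text.
conn.execute("""
    CREATE TABLE positions (
        fund_id TEXT NOT NULL,
        ticker  TEXT NOT NULL,
        weight  REAL NOT NULL,
        extra   TEXT,  -- JSON for fields that are still in flux
        PRIMARY KEY (fund_id, ticker)
    )
""")

STABLE_FIELDS = ("fund_id", "ticker", "weight")


def save_position(doc):
    """Split a Mongo-style document into relational columns plus a JSON remainder."""
    rest = {k: v for k, v in doc.items() if k not in STABLE_FIELDS}
    conn.execute(
        "INSERT INTO positions (fund_id, ticker, weight, extra) VALUES (?, ?, ?, ?)",
        (doc["fund_id"], doc["ticker"], doc["weight"], json.dumps(rest)),
    )


def load_position(fund_id, ticker):
    """Reassemble the original document shape so existing document-handling code keeps working."""
    row = conn.execute(
        "SELECT fund_id, ticker, weight, extra FROM positions WHERE fund_id = ? AND ticker = ?",
        (fund_id, ticker),
    ).fetchone()
    doc = dict(zip(STABLE_FIELDS, row[:3]))
    doc.update(json.loads(row[3]))
    return doc


original = {"fund_id": "fund_a", "ticker": "XYZ", "weight": 0.25,
            "liquidity": {"days_to_exit": 12}, "tags": ["illiquid"]}
save_position(original)

# Relational queries still work on the stable columns.
total = conn.execute(
    "SELECT SUM(weight) FROM positions WHERE fund_id = ?", ("fund_a",)
).fetchone()[0]
print(total)
```

Because `load_position` hands back the same dictionary shape a document store would return, code that consumes those documents does not have to change, which is the kind of saving described above.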
Novus is also known for Salat, an open source Scala case class serialization library for Mongo that was developed at Novus, and we could leave a lot of that code in place. Moving from Mongo to MemSQL didn’t mean we had to scrap a lot of code, so that was a big win for us.
So the bottom line in terms of MemSQL’s impact on our business: the client data team I mentioned earlier is now focusing more on delivering value to our customers, helping them understand their data and their investments during the data integration process.
When a job fails intraday, we no longer get that application slowdown, which is important. Our end users don’t have those bad days where things are slow because we had to run a big job in the middle of the day. Our architecture is no longer limited by the database.
10X the Workers – with No Code Changes
We were able to scale up from just 12 workers to 126, which is more than a 10X improvement. And if we want to scale even further, with MemSQL we can just add more servers; we’re not limited by our database architecture.
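For reference, the scale-out figure works out as:

$$
\frac{126\ \text{workers}}{12\ \text{workers}} = 10.5
$$

That is 10.5 times as many workers as before, which is where “over a 10X improvement” comes from.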
As an application developer, I don’t even have to think about changing my code for this increase in scale. I don’t have to revisit the data model. I just have a set of well-written, well-indexed SQL queries.
I just ask the operations team, “We need to add more servers,” and they get on provisioning those and now we have more scale. So it’s really convenient.
Now with Half the Operations Workload
One more thing to mention, from a system administration perspective. Before, on Mongo, we had two full-time sysadmins, plus an architect putting in a significant amount of time each week, just on the care and feeding of that Mongo operation.
If we had scaled out Mongo, that would potentially have meant even more of that type of work. But with MemSQL, we now have just one DBA and an architect spending maybe a few hours a week, if that, on MemSQL.
For a small company like Novus, with fewer than 100 employees, that really fits our operational model. We don’t have to devote a whole team just to the care and feeding of the data platform. It pretty much takes care of itself once it’s configured and set up.
You can try MemSQL for free today.