Enterprise Data Lake & Backups

Currently, MemSQL does not shard databases consistently such that cross-database queries are not optimized. (Is this an existing feature request?) Therefore, tables that will be queried together should coexist in the same database. With an enterprise data lake use case, where various data sources across the enterprise need to be queried together, a single database can grow quite large. This can become problematic given that there are currently only full database-level backups, as they will grow increasingly large and slow. Are there any ways to mitigate this problem?

EDIT: I just saw a post indicating that MemSQL 7.0 will support incremental backups. Are there any other ways this problem?

We’re not currently considering a feature to allow two different databases to be sharded the same way, because our architecture relies on databases being physically independent of each other. E.g. they can have different numbers of partitions (shards). If you have experienced a significant performance problem related to inability to get co-located joins across databases, I’d be interested to hear the specifics. We have feature requests open for incremental backup (coming in 7.0), and ability to back up individual tables separately.

I can see why databases with different partition counts wouldn’t be sharded the same, but when they do have the same partition counts, why not? It would allow developers to better organize their data into more databases and have smaller backups while maintaining the same performance. Alternatively, it’d be nice to have schemas or some way of subdividing databases and being able to perform backups and restores at the level.

I haven’t tried large-scale cross-database joins specifically because I’m trying to create designs that avoid the problem altogether. I can’t take the risk of building a design like that only to find that it doesn’t perform well or consumes too many resources, which would undermine the benefit of using MemSQL.

Makes sense. I recorded a feature request for allowing co-located joins across DBs if the number of shards are the same and the tables are shard-aligned. Seems feasible.

1 Like