What is the recommended way to stream database changes to a message queue?


#1

Does MemSQL support Change Data Capture (CDC) out of the box?

In many cases it is useful to stream database changes to another system with consistenty in mind. Apperantly, Pipelines does only support loading into MemSQL.

What is the best way to stream database changes to Kafka, RabbitMQ, ML, Pub/Sub etc.? With PostgreSQL this is possible using the Logical Replication feature. Does MemSQL have a similar feature or plans to support this?


#2

We are currently working on adding CDC functionality to allow changes from another system to be synced into MemSQL. Are you interested in streaming database changes from another system into MemSQL, or out of MemSQL into another database?

Let me know what your specific use-case is and I’ll see if I can provide some more information to help.


#3

We do actually not have that use case an longer as we migrated more PostgreSQL tables to MemSQL.

Because of some feature limitations in MemSQL we used PostgreSQL as our primary database to ensure strong consistency and synced changes to MemSQL and vice versa with some deamon scripts.

By refactoring the database design we could workaround the limitations and use MemSQL as the primary database in more cases.

I don’t remember the specific case, but I’m pretty sure we wanted to stream changes from MemSQL into external systems. I believe the idea was meant like a event/notify solution. A change in MemSQL should trigger some job in an external tool.


#4

I just remembered another case where we wanted to use CDC to ensure consistency. This use case is also related to an event or pub/sub solution.

When inserting some data into the table in a transcation, we want to post-process each record afterwards because its a slow step and would benefits of scaling.

With CDC we could feed a queue system like the way PostgreSQL does it with Kafka.

Our currently solution is having a state column indicating that the record needs post-processing. A daemon scripts fetches these records, processes them and updates the state. However, this doesn’t scale very well because we cannot lock the records (we could update the record with some time lock column to prevent other processes from fetching the same records - even better would be the SELECT ... FOR UPDATE SKIP LOCKED syntax).

Perhaps we actually doesn’t need a real CDC feature but more an event/notification feature. Unless, there exists some better best practies?


#5

Thanks for sharing the use case. We can definitely look into this type of event/notification and see if we can prioritize for future inclusion in our tools.