Initial bulk load to maintain a replica in MemSQL, plus incremental changes

sandeep · August 22, 2020, 8:33pm

I'm a beginner looking for guidance on MemSQL's capabilities. MemSQL has been proposed as a fast, in-memory data access layer, with the same data still maintained in the respective source databases. The proposed solution is to send update events to MemSQL through microservices after an initial load from the Oracle database. There cannot be any direct integration between the source database and MemSQL. I have explored the whitepapers and blog posts but could not find the exact details, so I need assistance with the following:

Initial data sourcing from Oracle Exadata to MemSQL: is it done through the LOAD DATA utility with .csv files? If yes, what options are available for each entity, given that an entity is a combination of multiple Oracle source tables?
Incremental data sourcing when a new attribute/column is added to the model: how do we populate the new column for data that already exists in the MemSQL database?
How do we reflect data fixes applied at the source database in MemSQL?

hanson · August 27, 2020, 7:03pm

For the initial bulk load into MemSQL, if you are moving fewer than 3-4 tables, it's usually easiest to put the data in CSV files and bulk load it into MemSQL, or use MemSQL Pipelines to get it in.
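
As a rough illustration of the CSV route, here is a minimal Python sketch that builds a MySQL-style LOAD DATA statement for one exported file. The function name, file path, table name and column names are all made up for the example, and it assumes each entity has already been exported from Oracle as one joined CSV per target table. Check the LOAD DATA options against the documentation for your version before relying on it.

```python
def build_load_data_sql(csv_path, table, columns, skip_header=True):
    """Build a LOAD DATA statement for one CSV file.

    Hypothetical helper: the path, table and column names are supplied
    by the caller and are not taken from any real schema.
    """
    # Identifiers cannot be passed as query parameters, so only allow
    # plain alphanumeric/underscore names.
    for name in [table, *columns]:
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")

    # Escape backslashes and single quotes inside the quoted file path.
    escaped_path = csv_path.replace("\\", "\\\\").replace("'", "\\'")

    parts = [
        f"LOAD DATA LOCAL INFILE '{escaped_path}'",
        f"INTO TABLE {table}",
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'",
        "LINES TERMINATED BY '\\n'",
    ]
    if skip_header:
        # Skip the header row that most CSV exports include.
        parts.append("IGNORE 1 LINES")
    # Explicit column list, so the CSV column order is spelled out.
    parts.append("(" + ", ".join(columns) + ")")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_load_data_sql("/tmp/customer.csv", "customer", ["id", "name"]))
```

The generated statement can then be run from any MySQL-compatible client. For files that keep arriving, a pipeline is likely the better fit than repeated one-off loads.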

For larger numbers of tables and/or large data volumes, consider MemSQL Replicate (see "SingleStore Replicate" in the SingleStore Documentation). If you are a qualified sales prospect (and it sounds like you may well be), contact your sales engineer or account executive to get it. If you haven't been connected with them yet, send an email to team@memsql.com and they'll introduce you.

For longer-term, ongoing change data capture (CDC) into MemSQL, most people use custom code and queues defined in application software, sometimes with a tool like Kafka or another persistent queue product. MemSQL Replicate can also help with that in some scenarios (for paying customers; again, contact your sales rep). And there are several CDC products out there that can help.
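
To make the custom-code option a bit more concrete, here is a minimal Python sketch of the "apply a change event" step. The event shape (`op` plus `row`), the function name and the table and column names are assumptions made up for the example, not a format that MemSQL or Kafka prescribes. It also assumes the target table has a primary or unique key, so that INSERT ... ON DUPLICATE KEY UPDATE can act as an upsert.

```python
def event_to_sql(event, table, key_columns):
    """Translate one change event into (sql, params).

    Hypothetical event format:
        {"op": "insert" | "update" | "delete", "row": {column: value}}
    """
    op = event["op"]
    row = event["row"]

    if op == "delete":
        # Deletes only need the key columns to locate the row.
        where = " AND ".join(f"{k} = %s" for k in key_columns)
        return f"DELETE FROM {table} WHERE {where}", [row[k] for k in key_columns]

    if op in ("insert", "update"):
        cols = list(row)
        placeholders = ", ".join(["%s"] * len(cols))
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        # Treat inserts and updates the same way (upsert), so replaying
        # an event after a consumer restart stays harmless.
        updates = ", ".join(
            f"{c} = VALUES({c})" for c in cols if c not in key_columns
        )
        if updates:
            sql += f" ON DUPLICATE KEY UPDATE {updates}"
        return sql, [row[c] for c in cols]

    raise ValueError(f"unknown operation: {op!r}")


if __name__ == "__main__":
    sql, params = event_to_sql(
        {"op": "update", "row": {"id": 7, "name": "Ann"}}, "customer", ["id"]
    )
    print(sql, params)
```

A consumer reading from the queue would call this for each event and execute the result through a MySQL-compatible driver. A data fix made at the source would reach MemSQL the same way, as long as it is published as an update event.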

sandeep · August 28, 2020, 3:22pm

@hanson: Thanks for the details. As I understand it, Replicate can be used for continuous data ingestion, i.e. real-time CDC events from Oracle to MemSQL. We will explore Pipelines for the one-time data load and, after that, feed real-time events into the MemSQL database through enablers such as Kafka (via Spark if needed). We will get in touch with the team's experts if we need further assistance.