Yes, I should have added more information.
It happens intermittently, mainly when the batch size is large (1,000,000). When I reduce the batch size, performance degrades but the problem occurs less often.
Each of the two pipelines loads its data through a dedicated stored procedure.
Here is the process:
I have two Kafka topics containing denormalized data:
first topic => assays (name, description, metadata). I am using a stored procedure because I need to create tri-grams in another table for each assay name.
second topic => experiments (experiment_identifier, experiment_date, assay_name)
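A minimal sketch of what the tri-gram step for the assays topic could look like. It assumes plain lowercase character tri-grams with no padding; the exact scheme the stored procedure uses is not stated here, and `trigrams`, `trigram_rows` and the target table are hypothetical names:

```python
from typing import List, Tuple


def trigrams(name: str) -> List[str]:
    """Distinct lowercase character tri-grams of an assay name, in order of first appearance.

    Assumption: simple sliding window, no padding and no per-word splitting.
    """
    text = name.lower()
    result: List[str] = []
    for i in range(len(text) - 2):
        gram = text[i:i + 3]
        if gram not in result:
            result.append(gram)
    return result


def trigram_rows(assay_id: int, name: str) -> List[Tuple[int, str]]:
    """Rows for a hypothetical (assay_id, trigram) table, one per distinct tri-gram."""
    return [(assay_id, gram) for gram in trigrams(name)]


print(trigram_rows(1, "ELISA"))  # [(1, 'eli'), (1, 'lis'), (1, 'isa')]
```

Note that each tri-gram row carries the assay id, so this step can only run once the assay row exists.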
I would like to reference the assay by a proper id in the experiments table instead of repeating the assay name.
Also, because the two pipelines run concurrently, I don’t know when an assay will be available in the assay table.
My strategy is to allow both pipelines to create the record in the assay table.
If the experiment flow arrives first, its insert generates an id and a record that contains only the assay name; the assay pipeline later updates that record and fills in the other fields.
If the assay flow arrives first, it inserts the full record into the assay table, and the experiment pipeline simply picks up the existing id.
The locking happens when both pipelines insert or update the same assay record at the same time.
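A minimal sketch of this two-sided strategy, assuming the assay name is a UNIQUE natural key and using SQLite's `INSERT ... ON CONFLICT` (upsert) as a stand-in for the real stored procedures, which are not shown. The table layout and the `load_assay` / `load_experiment` helpers are hypothetical. It runs on a single connection, so it illustrates the insert/update logic only, not the concurrent locking:

```python
import sqlite3

# Hypothetical schema: the real tables and stored procedures are not shown.
SCHEMA = """
CREATE TABLE IF NOT EXISTS assay (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,  -- natural key both pipelines agree on
    description TEXT,
    metadata    TEXT
);
CREATE TABLE IF NOT EXISTS experiment (
    experiment_identifier TEXT PRIMARY KEY,
    experiment_date       TEXT,
    assay_id              INTEGER NOT NULL REFERENCES assay(id)
);
"""


def load_assay(conn, name, description, metadata):
    """Assay flow: insert the full record, or complete a name-only placeholder."""
    conn.execute(
        """
        INSERT INTO assay (name, description, metadata) VALUES (?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            description = excluded.description,
            metadata    = excluded.metadata
        """,
        (name, description, metadata),
    )


def load_experiment(conn, identifier, date, assay_name):
    """Experiment flow: create a name-only placeholder if needed, then store its id."""
    conn.execute(
        "INSERT INTO assay (name) VALUES (?) ON CONFLICT(name) DO NOTHING",
        (assay_name,),
    )
    (assay_id,) = conn.execute(
        "SELECT id FROM assay WHERE name = ?", (assay_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO experiment (experiment_identifier, experiment_date, assay_id) "
        "VALUES (?, ?, ?)",
        (identifier, date, assay_id),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Experiment flow first: a placeholder assay row gets an id.
    load_experiment(conn, "EXP-1", "2021-03-01", "ELISA")
    # Assay flow later: the same row is completed and keeps its id.
    load_assay(conn, "ELISA", "Enzyme-linked immunosorbent assay", "{}")
    # Assay flow first: the experiment flow just picks up the existing id.
    load_assay(conn, "PCR", "Polymerase chain reaction", "{}")
    load_experiment(conn, "EXP-2", "2021-03-02", "PCR")
    for row in conn.execute(
        "SELECT e.experiment_identifier, a.id, a.name, a.description "
        "FROM experiment e JOIN assay a ON a.id = e.assay_id ORDER BY 1"
    ):
        print(row)
```

Whatever upsert form the real database offers, both procedures still write the same assay row, so this pattern by itself does not remove the contention on that row.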