While using REPLACE, can we get the current value of that row?

wfaheem · September 15, 2020, 3:10pm

We are building a pipeline that ingests batch data into our tables, and it uses the REPLACE INTO syntax. For every row, we want to update one of its columns (for instance, a count for that row) while replacing it. Is there any way to get the row’s current column values before the replace? Since each batch is huge (30M rows) and goes into a huge table (>150B rows), what would be the solution for updating the count on each batch replace/insert? Thanks

hanson · September 16, 2020, 4:42pm

Consider using pipelines to stored procedures, which will let you use arbitrary logic during the load.
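
A rough sketch of what a pipeline into a stored procedure could look like for this kind of count update. It assumes a table called user_counts with a unique key on (area_code, user_id), CSV files on the local filesystem, and integer column types. The table, procedure, and pipeline names, the path, and the types are all made up for illustration, so check the exact syntax against the docs for your version.

```sql
-- Hypothetical names throughout: user_counts, upsert_counts, counts_via_proc.
-- Assumes user_counts has a unique key on (area_code, user_id).
DELIMITER //
CREATE OR REPLACE PROCEDURE upsert_counts(
    batch QUERY(area_code INT, user_id BIGINT)  -- column types are assumptions
)
AS
BEGIN
    -- New keys start at 1; existing keys keep their row and get count + 1.
    INSERT INTO user_counts (area_code, user_id, `count`)
    SELECT area_code, user_id, 1
    FROM batch
    ON DUPLICATE KEY UPDATE `count` = `count` + 1;
END //
DELIMITER ;

-- The pipeline hands each batch to the procedure instead of writing to a table.
CREATE PIPELINE counts_via_proc AS
LOAD DATA FS '/path/to/batches/*.csv'  -- placeholder source
INTO PROCEDURE upsert_counts
FIELDS TERMINATED BY ',';
```

Because the procedure body is ordinary SQL, any other per-batch logic (filtering, joining against the existing rows) could go in the same place.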

wfaheem · September 16, 2020, 5:01pm

We are using a pipeline, and in it we use a REPLACE INTO query. The table has the following columns:

area_code
user_id
count

Our pipeline ingests more than 20M rows with area_code and user_id, and we have to increase the count while replacing. The simple approach is to fetch the existing count and add one to it. What I was asking is whether we can replace the row and add 1 to its existing count value in the same step.

evan · September 16, 2020, 10:25pm

Maybe CREATE PIPELINE … ON DUPLICATE KEY UPDATE can help solve your issue?

REPLACE INTO deletes the conflicting row before inserting the new one; afaik there’s no way to read the old values with REPLACE.
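
For illustration, a pipeline using that clause might look roughly like the sketch below. It assumes a table named user_counts with a unique key on (area_code, user_id) and a CSV source with those two fields. The names and the path are placeholders, and the clause order is worth double-checking against the CREATE PIPELINE reference for your version.

```sql
-- Hypothetical names: count_pipeline, user_counts; the FS path is a placeholder.
-- Assumes user_counts has a unique key on (area_code, user_id).
CREATE PIPELINE count_pipeline AS
LOAD DATA FS '/path/to/batches/*.csv'
INTO TABLE user_counts
FIELDS TERMINATED BY ','
(area_code, user_id)
SET `count` = 1                                  -- value for keys seen for the first time
ON DUPLICATE KEY UPDATE `count` = `count` + 1;   -- existing row is kept and incremented
```

Unlike REPLACE, the existing row is not deleted here, so the update expression can refer to its current count.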

wfaheem · September 17, 2020, 6:31am

@evan What if we aren’t using batching? It’s just a procedure that runs against a table periodically.

evan · September 24, 2020, 12:12am

The ON DUPLICATE KEY UPDATE pipeline feature should be orthogonal to batching.
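
The same clause is also available on a plain INSERT, so a procedure or scheduled job that is not driven by a pipeline could do the increment directly. A minimal sketch, with a made-up user_counts table and assumed column types:

```sql
-- Hypothetical table; the key on (area_code, user_id) is what triggers the update path.
CREATE TABLE user_counts (
    area_code INT NOT NULL,
    user_id   BIGINT NOT NULL,
    `count`   BIGINT NOT NULL,
    PRIMARY KEY (area_code, user_id)
);

-- First run: no row with this key exists yet, so it is inserted with count = 1.
INSERT INTO user_counts (area_code, user_id, `count`)
VALUES (10, 42, 1)
ON DUPLICATE KEY UPDATE `count` = `count` + 1;

-- Second run: the key already exists, so the existing count is incremented.
INSERT INTO user_counts (area_code, user_id, `count`)
VALUES (10, 42, 1)
ON DUPLICATE KEY UPDATE `count` = `count` + 1;

-- The row for (10, 42) now has count = 2.
SELECT area_code, user_id, `count` FROM user_counts;
```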