Deadlocks / rollbacks

Hi there!

Our team is working through an issue where data from a transaction appears to be committed even after a deadlock has been encountered and the transaction has been rolled back. We’re using MemSQL 5.7 (update: sorry, typo’d this – we are on 6.7) and columnstore tables. The deadlocks occur after we’ve performed large batch inserts and then attempt a few delete operations using a join on a temporary table.

Is there any scenario you’re aware of where this could be possible? Any race condition, database configuration, etc. that could result in this?

We’ve been scouring our logic and the underlying JDBC connection logic, but there’s nothing that seems to stand out or that we can reproduce.

Thanks for any advice or guidance!

Hi Aaron,

Thanks for reaching out. Unfortunately, MemSQL 5.7 is a very old version. Our latest GA release is already at 7.0. Is there a reason you’re still on that very old version? If not, I would highly recommend upgrading to the latest version. If you still face the issue after that, we can investigate it for you.

Please let us know if the above suggestion works for you.

Thanks,

Sorry I mistyped earlier. We are on MemSQL 6.7.

Hi Aaron,

Do you have access to MemSQL support? It may be faster to gather more details there if so.

Inserts / LOAD DATA / INSERT ... SELECT / etc. will always roll back completely on a deadlock. I suspect what you’re seeing here is that deletes in MemSQL will do local commits on each partition as they complete if run outside of an explicit transaction (no BEGIN statement run before the DELETE). That means on a deadlock, some partitions could have already committed the delete. Does this seem like a possibility in your case?
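To make the distinction concrete, here is a sketch of the two cases (table and column names are hypothetical, not from the original post):

```sql
-- Run outside an explicit transaction, a multi-partition DELETE
-- commits locally on each partition as it finishes. If a deadlock
-- hits partway through, partitions that already finished keep
-- their deletes.
DELETE e FROM events e JOIN tmp_ids t ON e.id = t.id;

-- Wrapped in an explicit transaction, the same DELETE either
-- commits or rolls back atomically across all partitions.
BEGIN;
DELETE e FROM events e JOIN tmp_ids t ON e.id = t.id;
COMMIT;
```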

Thanks, Adam – I believe we have MemSQL support, so I’ll try that route.

In our case, everything is done in a single, large transaction – and we have multiple processes running those large transactions. For each individual transaction, we insert about 100k records (spread across multiple INSERT statements), create temporary tables, delete rows from existing tables, etc. That all happens within a single BEGIN...COMMIT. At the same time that’s happening, there are also other operations happening on those tables (selects, optimize commands, etc.).

I’ll follow up with support and will continue to dig, but if you have any thoughts in the meantime, that’d be helpful. I’ve been able to confirm in our data that the same record is being committed multiple times and our logging seems to indicate that we’re calling rollback when this deadlock is encountered. We seem to only see duplicate records when a deadlock is encountered.

For what it’s worth, the obvious explanation here is that the application logic isn’t rolling back and is accidentally committing the large transaction when it hits a deadlock – but I’ve been unable to see that happen or trace how it could happen in the underlying JDBC / scalikejdbc code :confused:

If everything is inside of a transaction, it should roll back cleanly on a deadlock error. The deadlock error will only roll back the changes made by the query that hit the deadlock (not any writes done by earlier queries), so you do have to be careful not to commit after a deadlock error, like you mentioned.

One thing to be careful with is that I believe CREATE TEMPORARY TABLE will automatically commit any open transaction in MemSQL versions before 7.0 (like other table DDL does). In 7.0, CREATE TEMPORARY TABLE is allowed inside a transaction.
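Concretely, the failure mode described above would look something like this (table names are hypothetical, condensed from the workflow described earlier in the thread):

```sql
BEGIN;
INSERT INTO events VALUES (...);   -- large batch inserts

-- Before 7.0, this DDL implicitly commits the open transaction,
-- so the inserts above become durable here:
CREATE TEMPORARY TABLE tmp_ids AS SELECT id FROM staging;

DELETE e FROM events e JOIN tmp_ids t ON e.id = t.id;

-- A deadlock after the CREATE, followed by ROLLBACK, only undoes
-- work done since the implicit commit -- the inserts stay committed,
-- and a retry of the whole batch inserts them again (duplicates).
ROLLBACK;
```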


I was just starting to look at that piece! That’s it – everything that’s inserted prior to the CREATE TEMPORARY TABLE command gets committed when we roll back from a deadlock that happens after the CREATE TEMPORARY TABLE command. Mystery solved!
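For anyone hitting this on a pre-7.0 version, one possible workaround (a sketch with hypothetical names, not an official recommendation) is to create the temporary table before opening the transaction, so no DDL runs while writes are pending:

```sql
-- DDL happens up front, outside the transaction:
CREATE TEMPORARY TABLE tmp_ids (id BIGINT);

BEGIN;
INSERT INTO events VALUES (...);             -- batch inserts
INSERT INTO tmp_ids SELECT id FROM staging;  -- populate with DML, not DDL
DELETE e FROM events e JOIN tmp_ids t ON e.id = t.id;
COMMIT;  -- or ROLLBACK on a deadlock; now the inserts roll back too
```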

Glad to hear that it’s actually transactional in 7.0.

Thanks for your help!


Awesome. That is a quirk we picked up from being MySQL compatible. We’re moving away from some of these error-prone behaviors (a better behavior would be to error out) – I made sure we have a task tracking this.