Lock wait timeout exceeded; try restarting transaction

Today, we experienced a consistent lock timeout with two daemon scripts after running smooth for multiple months.

Both scripts was execution bulk operations and restarting both services individually didn’t solve the issue. After troubleshooting, we found that at least one specific row was casing the troubles.

However, no other processes (show full processlist) was apparently locking the row. At times, no other processes was running at all.

Without any further knowledge on how to trace the locking process, we restarting (and upgraded from 6.8.6 to 6.8.9) the cluster and the issue resolved itself.

In MySQL, I’m used to get further details of failing transactions with show innodb engine status. Does MemSQL expose a similar feature?

I would appreciate any recommendations on how to troubleshoot this further if the issue should happen again. At least, I’ll be glad to gather further details for you guys to explore in case this might be a bug of some forever-hanging transaction.

We are considering adding a management view like mv_lock_waits to show what’s waiting for what, but don’t have a firm timetable for that. You can get some waiting information using a query like the following, though I’m not sure this would have helped you.

SELECT
substr(replace(replace(query_text,'\n',''), ' ',' '),1,15) as q_text,
database_name,
(cpu_time_ms/(run_count+success_count+failure_count)) as avg_cpu_t_ms,
(cpu_wait_time_ms/(run_count+success_count+failure_count)) AS avg_cpu_w_t_ms,
(elapsed_time_ms/(run_count+success_count+failure_count)) AS avg_elap_t_ms,
(network_time_ms/(run_count+success_count+failure_count)) AS avg_net_t_ms,
(lock_time_ms/(run_count+success_count+failure_count)) AS lock_time_ms,
(run_count+success_count+failure_count) AS t_execs
FROM information_schema.mv_activities_cumulative JOIN information_schema.mv_queries
ON mv_queries.activity_name = mv_activities_cumulative.activity_name
WHERE activity_type = 'Query'
order by
avg_elap_t_ms desc
limit 20;