MemSQL: duplicate generated keys for auto_increment under concurrent batch inserts

Hello everyone,

The problem occurs when I perform a batch INSERT with rewriteBatchedStatements=true.

If I have 2 threads that each insert a batch of 1000 rows, when I read the generated keys from the PreparedStatement (I’m using Java), there are a lot of duplicated ids. In theory, the union of the generated ids from both threads should be 2000 distinct numbers (from 1 to 2000 if the table is new), but the number of distinct ids received is never 2000; it is always somewhere between 1001 and 2000.

I tested this with MySQL and it works well there.
If rewriteBatchedStatements is set to false, it works well in MemSQL too, but it is much slower.

I’ve been struggling with this issue for a while. If you have any tips, or know why it behaves this way, please let me know.

MemSQL version: 6.8.2
My table: (id int primary key auto_increment, value varchar(50) NOT NULL)
My query: INSERT INTO test (value) VALUES (?)
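
To make it concrete, here is a trimmed-down sketch of what each thread does (the host, credentials, and value strings are placeholders, not my real setup):

import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

public class Repro {
    // Placeholder URL; the relevant part is rewriteBatchedStatements=true
    static final String URL =
            "jdbc:mysql://127.0.0.1:3306/db?rewriteBatchedStatements=true";

    static Set<Long> insertBatch() throws Exception {
        Set<Long> ids = new HashSet<>();
        try (Connection conn = DriverManager.getConnection(URL, "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO test (value) VALUES (?)",
                     Statement.RETURN_GENERATED_KEYS)) {
            for (int i = 0; i < 1000; i++) {
                ps.setString(1, "row-" + i);
                ps.addBatch();
            }
            ps.executeBatch();
            try (ResultSet rs = ps.getGeneratedKeys()) {
                while (rs.next()) {
                    ids.add(rs.getLong(1)); // ids reported for this thread's batch
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Set<Long>> task = Repro::insertBatch;
        Future<Set<Long>> f1 = pool.submit(task);
        Future<Set<Long>> f2 = pool.submit(task);
        Set<Long> union = new HashSet<>(f1.get());
        union.addAll(f2.get());
        pool.shutdown();
        // Expected: 2000 distinct ids; what I actually get is between 1001 and 2000
        System.out.println("distinct generated ids across both threads: " + union.size());
    }
}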

Do you mind sharing your MemSQL version, your Table DDL and the INSERT queries you run?

Sure, just edited the post

Having duplicate ids should be impossible, both because of the auto_increment and because of the primary key. Can you show an example of two rows with duplicate ids? Can you also check the number of rows inserted - perhaps it inserted fewer than 2000 rows for some reason?
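
For example, a quick sanity check along these lines would show whether the table itself contains any duplicate ids (just a sketch, assuming the usual java.sql imports and that url points at your cluster):

// Sketch of a duplicate check against the table itself, run after both batches finish
try (Connection conn = DriverManager.getConnection(url, "user", "pass");
     Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(
             "SELECT COUNT(*) AS total, COUNT(DISTINCT id) AS distinct_ids FROM test")) {
    rs.next();
    // Both numbers should be 2000 if every inserted row got a unique id
    System.out.println("rows: " + rs.getLong("total")
            + ", distinct ids: " + rs.getLong("distinct_ids"));
}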

Hi Jack. In MemSQL there are exactly 2000 rows, all with correct ids. The ‘duplicated’ ids come back while reading the ResultSet of generated keys. The way I read the ids is the following:

preparedStatement.executeBatch();
ResultSet rs = preparedStatement.getGeneratedKeys();
while (rs.next()) …
With 2 threads doing this in parallel, many of the ‘generated ids’ are shared across the threads. If id X is returned to thread1, X shouldn’t appear at all in the ResultSet of thread2.
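
Concretely, after both threads finish I compare the two collected sets roughly like this (thread1Ids and thread2Ids are just my names for the Set<Long> each thread built from its ResultSet):

// thread1Ids / thread2Ids: the Set<Long> each thread collected from getGeneratedKeys()
Set<Long> overlap = new HashSet<>(thread1Ids);
overlap.retainAll(thread2Ids); // ids that were reported to BOTH threads
System.out.println("ids shared across threads: " + overlap.size()); // should be 0, but isn't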

I changed to the MariaDB connector instead of the MySQL connector, and it seems to work fine now.

That sounds like potentially a client-side issue. If you reproduce the problem you observed and query the database for the actual data inserted, you should be able to confirm that there were no duplicates. Perhaps the 2 threads are actually reading from the same result set for some reason?

I think something in the driver was causing the problem.

After switching to the MariaDB driver, though, rewriteBatchedStatements=true no longer works. After inserting 2 batches of 1000 rows each, using insert into test (value) VALUES (?), I see that MemSQL ran 2000 queries, based on:

select query_text,success_count,failure_count from information_schema.mv_activities_cumulative join information_schema.mv_queries using (activity_name) where last_finished_timestamp > now() - interval '1' minute;

Instead, I should see only 2 (rewritten) queries.

The “MariaDB Java Connector Driver Performance” page mentions that when using rewriteBatchedStatements=true:

Auto increment ids cannot be retrieved using Statement.getGeneratedKeys().

But you said you are using getGeneratedKeys, so that also seems to indicate that the driver isn’t actually rewriting the batched statements for some reason.
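
In other words, with the MariaDB driver the two things you are combining appear to be mutually exclusive; roughly this combination is what the quoted docs say cannot be honored together (the URL is only for illustration):

// Illustration only: per the MariaDB docs quoted above, rewriteBatchedStatements=true
// and retrieving generated keys cannot both work at the same time, so requesting keys
// presumably makes the driver fall back to sending the statements one by one.
Connection conn = DriverManager.getConnection(
        "jdbc:mariadb://127.0.0.1:3306/db?rewriteBatchedStatements=true", "user", "pass");
PreparedStatement ps = conn.prepareStatement(
        "INSERT INTO test (value) VALUES (?)",
        Statement.RETURN_GENERATED_KEYS); // asking for keys here is the likely conflict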

FYI, you can log the queries that the driver runs against MemSQL with the general_log option: https://docs.memsql.com/v7.0/guides/cluster-management/troubleshooting/trace-log/#logging-queries.