Behavior Change Due to Unique Key Null Handling Bug Fix in 7.0

In MemSQL 6.8 and earlier, the code incorrectly handles NULL values in unique key columns for the case of a composite unique key, only in the case that the first column’s data type is defined to be NOT NULL but another column’s type is defined to be NULL.

In MemSQL 7.0 we found this bug and fixed it, changing the behavior to follow the SQL standard [SQL13]. Application software should not rely on the old behavior present in MemSQL 6.8 when upgrading to 7.0 or later.

To avoid confusion, MemSQL recommends that all columns of a unique key be defined as NOT NULL. With this setting, no NULL values will be present in unique keys. That will avoid any differences in behavior between versions 6.8 and 7.0 for unique key handling.

If you need to modify your application to accommodate this, you may need to use a zero, blank or the character string “empty” or some other value, instead of actual NULL values for appropriate unique key fields when you insert a record.

Behavior of MemSQL Before and After the Fix

Consider this table definition:

create table t(
  a int not null auto_increment,
  b int not null,
  c int null,
  d int not null,
  key(a),
  unique key uk (b, c, d),
  shard key sk (b, c, d));

Then run an insert into this table:

insert t(b,c,d) values(1, null, 1);

Then run it again:

insert t(b,c,d) values(1, null, 1);

When run on MemSQL 6.8, this second insert will result in an error like:

ERROR 1062 ER_DUP_ENTRY: Leaf Error (node-8d2448d6-09c8-490b-a072-dc34d38d846e-leaf-ag1-0.svc-8d2448d6-09c8-490b-a072-dc34d38d846e:3306): Duplicate entry '1-1' for key 'uk'

This behavior is not in accordance with the MemSQL design for unique key handling described later, which is based on the SQL standard [SQL13]. In MemSQL 7.0, no error will occur.

Multi-Column Unique Key Behavior With NULLS

Regarding unique keys, the SQL standard [SQL13] states:

A unique constraint specifies one or more columns of the table as unique columns. A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns.

Different database products have taken different approaches to the behavior of unique keys with nulls, and some have not followed the standard uniformly. We follow the approach where it is a unique key violation if all fields of a new key being inserted into a table have a non-null value and there exists another row with the same key already in the table. Otherwise the unique key constraint is satisfied.

The rationale for this is that with three-valued logic for comparisons (TRUE, FALSE, UNKNOWN), a comparison of two keys like

<x1,x2,...xK> = <y1,y2,...yK>

is expressed as

x1=y1 and x2=y2 and … xK=yK

If any of the comparisons of individual fields is UNKNOWN, the comparison result is UNKNOWN, and hence we do not signal a unique constraint violation. Comparing NULL to any value is UNKNOWN. So any NULL in any field of a key will cause it to not violate a unique constraint. MySQL uses a similar approach.

References

[SQL13] ISO/IEC 9075-1:2011(E), Information technology — Database languages — SQL — Part 1: Framework (SQL/Framework), 4.6.5.4 Table constraints, page 22, 2013.