Unexpected detrimental index usage

alex · February 3, 2020, 8:05pm

By way of background, I have a tuple schema à la Datomic, with one table per value type:

e = entity ID (UUID stored as binary(16))
a = attribute ID (UUID stored as binary(16))
v = value (type depends on the table)
t = transaction ID (UUID stored as binary(16))
p = partition ID (mediumint).

So I had a query that included this:

select e, a, v 
from datoms_string
where (    p = ? 
       and e in (  convert(?, binary)
                 , convert(?, binary) 
                 , ...) 
       and a = convert(?, binary))

This portion of the query ended up using the avt index which required a scan of ~500K rows. At a higher level, I was looking to get the titles of a particular set of products. MemSQL ended up scanning all the titles of every product (a) rather than just looking at the particular set of 50 products I was interested in (e). The overall query took ~90 seconds after compilation. I thought for a bit and tried this approach:

(select e, a, v 
 from datoms_string
 where (    p = ? 
        and e = convert(?, binary) 
        and a = convert(?, binary)))
union all
(select e, a, v 
 from datoms_string
 where (    p = ? 
        and e = convert(?, binary) 
        and a = convert(?, binary)))
union all
...

This portion of the query ended up using the eavt index which required a scan of only ~500 rows (IIRC, all the rows associated with the products in question).

This brings me to my question: is there a way for MemSQL to automatically optimize for this kind of query? I’m using MemSQL 7.0 and have enabled analysis for all the tables in the DB.

Thanks so much!

(Also I got some MemSQL swag in the mail and I really appreciate the gesture! I’ve worn the socks already )

adam · February 3, 2020, 10:16pm

Hi Alex,

Is it possible for you to share the output of show create table datoms_string with us?

-Adam

alex · February 4, 2020, 1:02am

Sure @adam!

CREATE TABLE `datoms_string` (
  `p` mediumint(8) unsigned NOT NULL,
  `i` mediumint(8) unsigned NOT NULL,
  `e` binary(16) NOT NULL,
  `a` binary(16) NOT NULL,
  `v` longtext CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `t` binary(16) NOT NULL,
  `x` tinyint(1) NOT NULL,
  KEY `ieavt` (`i` DESC,`e` DESC,`a` DESC,`v` DESC,`t` DESC),
  KEY `iavt` (`i` DESC,`a` DESC,`v` DESC,`t` DESC),
  KEY `ivt` (`i` DESC,`v` DESC,`t` DESC),
  KEY `eavt` (`e` DESC,`a` DESC,`v` DESC,`t` DESC),
  KEY `avt` (`a` DESC,`v` DESC,`t` DESC),
  KEY `vt` (`v` DESC,`t` DESC),
  KEY `tt` (`t` DESC),
  PRIMARY KEY (`p` DESC,`i` DESC,`e` DESC,`a` DESC,`v` DESC),
  /*!90618 SHARD */ KEY `p` (`p`)
) /*!90623 AUTOSTATS_CARDINALITY_MODE=OFF, AUTOSTATS_HISTOGRAM_MODE=OFF */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */

adam · February 4, 2020, 3:24am

Thanks Alex,

Can you send us a profile of these queries? I see a runtime decision between three possible indexes in the EXPLAIN in either case (evat, primary key, avt) and those decisions are usually pretty accurate.

In case you’ve never looked at profiles in the past run:
PROFILE ;
SHOW PROFILE;

alex · February 4, 2020, 4:35am

Perhaps the issue was transient. Not sure if the data statistics automatically updated to make better runtime decisions or what, but what I’m seeing now on MemSQL Studio is that the two queries now take the same to execute and both use the eavt index, rather than the first using avt and the second using eavt. Wish I could have taken a snapshot of the profiling I did yesterday in MemSQL Studio (this is where I found this runtime index choice anomaly).

Happy to give you a profile if you like, but it doesn’t seem especially useful now that the query performance is the same. Apologies for not being more helpful.

adam · February 10, 2020, 11:13pm

No problem Alex,

The runtime decision I was referring to is based on a quick probing of the indexes to get an idea of which one is more selective (each index is a skiplist so we can probe the higher towers to get an idea of how many rows the scan will return). Its not using histograms.

-Adam