Error 1749 (hy00)

nhas9102 · June 28, 2020, 6:32am

Hi, I’m continually having problems with running distributed queries from the Ch-Benchmark…

select  i_name, substr(i_data, 1, 3) as brand, i_price, count(distinct (mod((s_w_id * s_i_id),10000))) as supplier_cnt from  stock, item where  i_id = s_i_id and i_data not like 'zz%' and (mod((s_w_id * s_i_id),10000) not in (select su_suppkey from supplier where su_comment like '%bad%')) group by i_name, substr(i_data, 1, 3), i_price order by supplier_cnt desc;

This query raises ERROR 1749 (HY000) at line 1: Feature ‘Uncorrelated subselect IN-list that is not at bottom of distributed query tree’ is not supported by MemSQL Distributed.

However, if I run the same query with a different partitioning scheme, it works perfectly in a smaller test dataset.

I’m just really confused right now.

Only difference between the two schemas is that in my bigger dataset, the supplier relation is sharded but it’s replicated in the smaller one.

+---------------+--------------+------------+------------+--------------+------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | NON_UNIQUE | INDEX_SCHEMA | INDEX_NAME | SEQ_IN_INDEX | COLUMN_NAME | COLLATION | CARDINALITY | SUB_PART | PACKED | NULLABLE | INDEX_TYPE | COMMENT | INDEX_COMMENT |
+---------------+--------------+------------+------------+--------------+------------+--------------+-------------+-----------+-------------+----------+--------+----------+------------+---------+---------------+
| def           | tpcch        | customer   |          1 | tpcch        | c_d_id     |            1 | c_d_id      | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch        | neworder   |          1 | tpcch        | no_d_id    |            1 | no_d_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch        | neworder   |          1 | tpcch        | no_d_id    |            2 | no_w_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch        | orderline  |          1 | tpcch        | ol_d_id    |            1 | ol_d_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch        | orderline  |          1 | tpcch        | ol_d_id    |            2 | ol_w_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch        | orders     |          1 | tpcch        | o_d_id     |            1 | o_d_id      | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch        | supplier   |          1 | tpcch        | su_suppkey |            1 | su_suppkey  | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch_test   | orderline  |          1 | tpcch_test   | ol_o_id    |            1 | ol_o_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch_test   | orderline  |          1 | tpcch_test   | ol_o_id    |            2 | ol_w_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch_test   | orderline  |          1 | tpcch_test   | ol_o_id    |            3 | ol_d_id     | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch_test   | orders     |          1 | tpcch_test   | o_id       |            1 | o_id        | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch_test   | orders     |          1 | tpcch_test   | o_id       |            2 | o_w_id      | NULL      |           0 |     NULL | NULL   |          | SHARD      |         |               |
| def           | tpcch_test   | orders     |          1 | tpcch_test   | o_id       |            3 | o_d_id      | NULL      |           0 |     NULL | NULL   |          | SHARD      |

hanson · July 2, 2020, 12:57am

What version of MemSQL are you running?

Consider also a query rewrite to turn the subquery into just a regular join and use NOT LIKE instead of LIKE. Maybe that can be made to work.

E.g.

...
from stock, item, supplier
where ...
(mod((s_w_id * s_i_id),10000) 
    = su_suppkey and su_comment not like like '%bad%' 
...

jack · July 7, 2020, 6:58pm

We investigated this and found that a new optimization added in 7.1 accidentally prevented us from being able to run this query shape. The issue was for certain shapes of IN subselects. The query runs fine on 7.0 and earlier versions, and we have made a fix to enable running the query again which will be available in future versions of MemSQL.

nhas9102 · July 13, 2020, 9:42am

Was this a bug in the code?

jack · July 13, 2020, 5:29pm

No, not a bug per se, but a new optimization interfered with the ability to execute particular query shapes of IN subselects.

jack · July 13, 2020, 5:32pm

The fix will be in the next patch release (7.1.5).

nhas9102 · July 13, 2020, 5:33pm

Thanks for all the help.

omar.ramos · August 13, 2021, 11:02pm

Not sure if this question will revive this thread or not, but quite often for queries against our Oracle database queries like this are constructed:

select  sum(ssbsect_enrl * a.scbcrse_credit_hr_low) as "Credits"
         from    ssbsect,
                 scbcrse a
         where   ssbsect_term_code = :term_code
                 and ssbsect_subj_code = a.scbcrse_subj_code
                 and ssbsect_crse_numb = a.scbcrse_crse_numb
                 and a.scbcrse_eff_term = (select max (b.scbcrse_eff_term)
                                           from scbcrse b
                                           where 	a.scbcrse_subj_code = b.scbcrse_subj_code
                                                  and a.scbcrse_crse_numb = b.scbcrse_crse_numb
                                                  and b.scbcrse_eff_term <= ssbsect_term_code);

With that last nested query being the correlated subquery that makes sure only the actual rows needed get included.

Over the years, we’ve had a difficult time from not using this approach since it appears to work (although in certain cases it can be quite slow…in the example above running the query for a current term code results in a query time of about 4 seconds).

A few years ago I first started experimenting with MemSQL (now SingleStore) and came up with an (naive/simple) approach to sync over data from our Oracle database into a MemSQL database.

Even though I’ve kept it online (and recently updated it again to 7.5.7 and finally turned off the Table Name Case Sensitivity option, which was another item that prevented our existing Oracle queries from being used in a “drop-in” fashion) I was recently making some attempts to run a larger query that included the query above within it and noticed the Error 1749 listed (" Feature ‘Correlated subselect that can not be transformed and does not match on shard keys’ is not supported by SingleStore Distributed.") and it’s made me wonder if moving forward there will be anything enabled within SingleStore that would make this sort of issue go away in the future? (These tables are all using the “rowstore” option so they can be located in memory, rather than the new default which I think is to use the “columnstore” option instead since a lot of new efforts have gone into that option, so I’m not sure if that might be a factor as well).

One extra comment I can share about my naive Oracle syncing approach, which might factor into this issue occurring (although I hope it’s not actually the main reason for it), is that I’m not really marking off the full set of exact primary/composite keys (in our database, the tables in recent years had a new column added with an incrementing primary key type value added as an additional key so I used that to simplify the syncing approach I came up with, and just add that one field as a key, rather than the other primary / composite keys that might be present in a table).

nlello · October 20, 2023, 3:08pm

I’m hoping this re-opens this topic; we’ve just hit this exact problem in 8.1.24 … is the same issue back; and is it fixed in a point update ?

hanson · October 20, 2023, 8:40pm

This is a limitation in the query processor. I’ll raise it again with our query processing engineering team. Your best bet at this point is some kind of manual query rewrite to avoid it.

hanson · October 20, 2023, 9:48pm

@nlello What schema and query are you using?

There’s a good chance you can make the error go away by adjusting your sharding. Typically you should try to shard on the join columns.

nlello · November 7, 2023, 11:53am

@hanson Apologies for not responding… we reverted the table types to get this working; and have decided to just test when we upgrade.