FullText Match Function not pushed to leaf table


#1

We’re experiencing some inconsistent errors with the FULLTEXT functionality.

The same query with same parameters sometimes gives the error Feature 'FullText Match Function not pushed to leaf table' is not supported by MemSQL Distributed., but just a few seconds later the query works fine.

When the error persist for some time, we tried to connect directly to another aggregator and the error disappears. Switching back to the previous failing aggregator, everything is fine again until suddenly and the same error comes back.

What is triggering this error? In some cases we can just change one of the aliases in the query to solve the issue. Other times we need to remove one or more JOIN clauses.

Our first thought was the LEFT JOIN until we noticed many successful queries with LEFT JOIN.

This is one of the queries failing from time to time:

select keywords.id, keywords.keyword, keywords.volume, priority, coalesce(cannibalization.count, 0) as cannibalization_pages from `keywords` inner join `keywords-v2` as `k` on `keywords`.`id` = `k`.`id` left join `page_keyword` on `keywords`.`id` = `page_keyword`.`keyword_id` and `page_keyword`.`page_id` = 15850902669311766255 left join (select page_keyword.keyword_id, count(*) as count from `page_keyword` inner join `pages` on `page_keyword`.`page_id` = `pages`.`id` where `pages`.`id` != 15850902669311766255 and `pages`.`site_id` = 11 and `page_keyword`.`priority` is not null group by `page_keyword`.`keyword_id`) as `cannibalization` on `keywords`.`id` = `cannibalization`.`keyword_id` where `k`.`location` = DK and `k`.`language` = da and (match(k.`keyword`) against (eyeliner)) order by `volume` desc limit 51 offset 0;

#2

Can you share the DDL for the tables involved in the query so we can reproduce the error?


#3

Can you get explain result of the query?
Meanwhile you can try following rewrite to see if it works:

SELECT keywords.id,
keywords.keyword,
keywords.volume,
priority,
COALESCE(cannibalization.count, 0) AS cannibalization_pages
FROM keywords
INNER JOIN
(
SELECT *
FROM keywords-v2
WHERE match(k.keyword) against (eyeliner)) AS k
ON keywords.id = k.id
LEFT JOIN page_keyword
ON keywords.id = page_keyword.keyword_id
AND page_keyword.page_id = 15850902669311766255
LEFT JOIN
(
SELECT page_keyword.keyword_id,
count(*) AS count
FROM page_keyword
INNER JOIN pages
ON page_keyword.page_id = pages.id
WHERE pages.id != 15850902669311766255
AND pages.site_id = 11
AND page_keyword.priority IS NOT NULL
GROUP BY page_keyword.keyword_id) AS cannibalization
ON keywords.id = cannibalization.keyword_id
WHERE k.location = dk
AND k.language = da) ORDER BY volume DESC limit 51 offset 0;


#4

Here is our DDL for the tables involved. We join the rowstore keywords with the columnstore keywords to get the updated volume. Updating the volume in the columnstore was quite slow in our testing.

CREATE TABLE `keywords` (
  `id` char(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `keyword` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `location` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `language` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `volume` int(10) unsigned DEFAULT NULL,
  `indexed_at` datetime DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `keywords_location_language_volume_index` (`location`,`language`,`volume`),
  KEY `keywords_indexed_at_volume_index` (`indexed_at`,`volume`)
)
CREATE TABLE `keywords-v2` (
  `id` char(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `keyword` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `location` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `language` varchar(5) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `volume` int(10) unsigned DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  KEY `language` (`language`,`location`) /*!90619 USING CLUSTERED COLUMNSTORE */,
  /*!90618 SHARD */ KEY `id` (`id`),
  FULLTEXT KEY `keyword` (`keyword`)
) /*!90621 AUTOSTATS_ENABLED=TRUE */
CREATE TABLE `page_keyword` (
  `page_id` bigint(20) unsigned NOT NULL,
  `keyword_id` char(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `priority` tinyint(3) unsigned DEFAULT NULL,
  `suggested_at` datetime DEFAULT NULL,
  PRIMARY KEY (`page_id`,`keyword_id`)
)
CREATE TABLE `pages` (
  `id` bigint(20) unsigned NOT NULL,
  `project_id` bigint(20) unsigned NOT NULL,
  `site_id` bigint(20) unsigned NOT NULL,
  `user_id` bigint(20) unsigned DEFAULT NULL,
  `url_id` char(32) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `url` varchar(4096) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `body` longtext CHARACTER SET utf8 COLLATE utf8_general_ci,
  `meta_description` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `meta_keywords` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `meta_canonical` text CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `meta_index` tinyint(1) DEFAULT NULL,
  `meta_follow` tinyint(1) DEFAULT NULL,
  `keyphrases` JSON COLLATE utf8_bin,
  `status` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL,
  `note` text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `optimized_at` datetime DEFAULT NULL,
  `crawled_at` datetime DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `pages_site_id_url_id_index` (`site_id`,`url_id`),
  KEY `test` (`url_id`)
)

#5

The EXPLAIN fails with the query, but after 3 attempts with different search phrases (and nothing else) I got a successful execution.

Top offset:0 limit:51
GatherMerge [remote_0.volume DESC] partitions:all est_rows:51 query:[SELECT STRAIGHT_JOIN `keywords`.`id` AS `id`, `keywords`.`keyword` AS `keyword`, `keywords`.`volume` AS `volume`, `r1`.`priority` AS `priority`, COALESCE(`r2_2`.`count`,0) AS `cannibalization_pages` FROM (((`laravel_0`.`keywords-v2` as `k` STRAIGHT_JOIN `laravel_0`.`keywords` as `keywords` ) LEFT JOIN @ `laravel`.REMOTE(:: `laravel`.`r1`(0) AS SELECT (1!:>tinyint(3) unsigned NULL) AS `priority`,(1!:>char(32) CHARACTER SET utf8 COLLATE utf8_general_ci NULL) AS `keyword_id`) as `r1` WITH (table_convert_subselect = FALSE) ON (`keywords`.`id` = `r1`.`keyword_id`)) LEFT JOIN ( SELECT WITH(AGG_HINT_MERGE=true) `r2`.`keyword_id` AS `keyword_id`, CAST( COALESCE(SUM(`r2`.`count`),0) AS SIGNED) AS `count` FROM @ `laravel`.REMOTE(:: `laravel`.`r2`(0) AS SELECT (1!:>char(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL) AS `keyword_id`,(1!:>bigint(21) NOT NULL) AS `count`) as `r2` GROUP BY 1 ) AS `r2_2` ON (`keywords`.`id` = `r2_2`.`keyword_id`)) WHERE ((`k`.`location` = 'DK') AND (`k`.`language` = 'da') AND (MATCH (`k`.`keyword`) AGAINST ('test*')) AND (`keywords`.`id` = `k`.`id`)) ORDER BY 3 DESC LIMIT NULL OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)] alias:remote_0
Project [keywords.id, keywords.keyword, keywords.volume, r1.priority, COALESCE(r2_2.count,0) AS cannibalization_pages] est_rows:51 est_select_cost:706
TopSort limit:[?] [keywords.volume DESC]
HashJoin [r2_2.keyword_id = keywords.id] type:left
|---HashTableBuild alias:r2_2
| Project [r2.keyword_id, CAST(COALESCE($0,0) AS SIGNED) AS count] est_rows:352
| HashGroupBy [SUM(r2.count) AS $0] groups:[r2.keyword_id]
| TableScan r2 storage:list stream:yes est_table_rows:352
| Repartition [r0.keyword_id, count] AS r2 shard_key:[keyword_id] est_rows:352 est_select_cost:2,190 query:[CREATE RESULT TABLE r2 PARTITION BY (`keyword_id`) WITH (PARTITIONS=NULL, READERS=1, TAKEOVER_CONN=TRUE, BUFFERED_ROWS=NULL) AS SELECT WITH(LOCAL_SHUFFLE_GROUP_BY=1) STRAIGHT_JOIN `r0`.`keyword_id` AS `keyword_id`, COUNT(*) AS `count` FROM (@ `laravel`.REMOTE(:: `laravel`.`r0`(0) AS SELECT (1!:>char(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL) AS `keyword_id`,(1!:>bigint(20) unsigned NOT NULL) AS `page_id`) as `r0` STRAIGHT_JOIN `laravel_0`.`pages` as `pages` ) WHERE ((`pages`.`id` <> 15850902669311766255) AND (`pages`.`site_id` = 11) AND (`r0`.`page_id` = `pages`.`id`)) GROUP BY 1 OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)]
| ShuffleGroupBy [COUNT(*) AS count] groups:[r0.keyword_id]
| Filter [pages.id <> 15850902669311766255 AND pages.site_id = 11]
| NestedLoopJoin
| |---IndexSeek laravel.pages, PRIMARY KEY (id) scan:[id = r0.page_id] est_table_rows:470,067 est_filtered:31,176
| TableScan r0 storage:list stream:yes est_table_rows:365
| Repartition [page_keyword.keyword_id, page_keyword.page_id] AS r0 shard_key:[page_id] est_rows:365 query:[CREATE RESULT TABLE r0 PARTITION BY (`page_id`) WITH (PARTITIONS=NULL, READERS=1, TAKEOVER_CONN=TRUE, BUFFERED_ROWS=NULL) AS SELECT `page_keyword`.`keyword_id` AS `keyword_id`, `page_keyword`.`page_id` AS `page_id` FROM `laravel_0`.`page_keyword` as `page_keyword` WHERE ((`page_keyword`.`page_id` <> 15850902669311766255) AND (NOT ISNULL(`page_keyword`.`priority`))) OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)]
| Filter [page_keyword.page_id <> 15850902669311766255 AND page_keyword.priority IS NOT NULL]
| TableScan laravel.page_keyword, PRIMARY KEY (page_id, keyword_id) est_table_rows:105,937 est_filtered:365
HashJoin [r1.keyword_id = keywords.id] type:left
|---HashTableBuild alias:r1
| Repartition [page_keyword_1.priority, page_keyword_1.keyword_id] AS r1 shard_key:[keyword_id] est_rows:1 query:[CREATE RESULT TABLE r1 PARTITION BY (`keyword_id`) WITH (PARTITIONS=NULL, READERS=1, TAKEOVER_CONN=TRUE, BUFFERED_ROWS=NULL) AS SELECT `page_keyword_1`.`priority` AS `priority`, `page_keyword_1`.`keyword_id` AS `keyword_id` FROM `laravel_0`.`page_keyword` as `page_keyword_1` WHERE (`page_keyword_1`.`page_id` = 15850902669311766255) OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST)]
| IndexRangeScan laravel.page_keyword AS page_keyword_1, PRIMARY KEY (page_id, keyword_id) scan:[page_id = 15850902669311766255] est_table_rows:6,301 est_filtered:1
NestedLoopJoin
|---IndexSeek laravel.keywords, PRIMARY KEY (id) scan:[id = k.id] est_table_rows:6,379,362 est_filtered:6,379,362
Filter [k.location = 'DK' AND k.language = 'da' AND (MATCH (k.keyword) AGAINST ('test*'))]
ColumnStoreScan laravel.`keywords-v2` AS k, KEY language (language, location) USING CLUSTERED COLUMNSTORE est_table_rows:6,485,143 est_filtered:33,921

The query you posted seems to resolve the issue, however its a little hard to tell for sure as the issue is periodically. But after 10 executions it hasn’t failed a single time and I even tried to target both aggregators directly.

I’ll implement the changes in the application and run some more tests.

Thank you for your help so far.