MemSQL Kafka pipeline stuck in a failed state when metadata changes

I have a MemSQL Kafka pipeline that has been stuck in a failed state for over a day now.

Here’s some debug info:

  1. All of the Global Variables are still set to their defaults.
  2. Error message from PIPELINES_ERRORS table:
    Cannot get source metadata for pipeline <pipeline>. Stderr: 2019-09-06 08:52:08.970 Waiting for next batch, 255 batches left until expiration. 2019-09-06 08:52:08.971 Batch starting with new consumer. 2019-09-06 08:52:09.019 Failed to get watermark offsets from Kafka with error Broker: Not leader for partition
  3. Confirmed that the Kafka topic itself is healthy and the server is responsive; in other words, I'm certain that if I drop and recreate the MemSQL pipeline, things will work just fine.
  4. Looking at the PIPELINES_ERRORS & PIPELINES_BATCHES_SUMMARY tables, the pipeline still seems to be retrying and failing constantly, except that the NUM_PARTITIONS value is now 0 for every attempt (query sketch after this list).
  5. MemSQL Version: 6.8.7, Kafka Server Version: 2.2.0
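
For reference, this is roughly how I've been checking those two tables; the name my_pipeline is a placeholder for the actual pipeline, and the exact column set may vary slightly by MemSQL version:

    -- Recent extractor errors for the pipeline (placeholder name my_pipeline)
    SELECT *
    FROM information_schema.PIPELINES_ERRORS
    WHERE PIPELINE_NAME = 'my_pipeline';

    -- Per-batch summary; NUM_PARTITIONS shows up as 0 on every retry
    SELECT BATCH_ID, BATCH_STATE, NUM_PARTITIONS
    FROM information_schema.PIPELINES_BATCHES_SUMMARY
    WHERE PIPELINE_NAME = 'my_pipeline'
    ORDER BY BATCH_ID DESC;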

Questions:

  1. Why is the MemSQL pipeline not able to reconcile the Kafka topic metadata?
  2. What setting (if any) do I need to change to make sure it auto-corrects when there is a leader change? I really don't want to drop and recreate pipelines.

Thanks in advance.

Hello geet!

Can you provide the CREATE PIPELINE statement?

If possible, you can try CREATE OR REPLACE PIPELINE ..., which should keep the existing offsets for the Kafka topic. Another option we can try is specifying CONFIG '{"kafka_version":"2.2.0"}' to rule out any protocol mismatch.
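
To make that concrete, here is a minimal sketch, assuming a hypothetical broker kafka-host:9092, topic my_topic, and destination table my_table; substitute your actual endpoint, topic, table, and whatever field mapping your existing pipeline uses:

    CREATE OR REPLACE PIPELINE my_pipeline AS
    LOAD DATA KAFKA 'kafka-host:9092/my_topic'
    -- Pin the wire protocol version to match the 2.2.0 broker
    CONFIG '{"kafka_version":"2.2.0"}'
    INTO TABLE my_table;

Because this is OR REPLACE rather than a drop and re-create, the stored offsets for the topic are kept and ingestion resumes from where it left off.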

You can also try debugging the pipeline in the foreground. Let's try STOP PIPELINE ..., and then START PIPELINE <pipeline> FOREGROUND LIMIT 1 BATCHES. This should print the error to the command prompt rather than writing it to the PIPELINES_ERRORS table.
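
A minimal sketch of that sequence, again with my_pipeline as a placeholder name:

    STOP PIPELINE my_pipeline;

    -- Run a single batch synchronously; any error is returned
    -- to the client instead of only landing in PIPELINES_ERRORS.
    START PIPELINE my_pipeline FOREGROUND LIMIT 1 BATCHES;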

Another debugging option is SET GLOBAL pipelines_extractor_debug_logging = ON, followed by FLUSH EXTRACTOR POOLS. This will make the Kafka pipeline produce debug output. You can see the output after a failed command (e.g. after running the pipeline in the foreground) via SHOW WARNINGS;.
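
Roughly, with my_pipeline again as a placeholder:

    SET GLOBAL pipelines_extractor_debug_logging = ON;
    FLUSH EXTRACTOR POOLS;

    -- Reproduce the failure, then dump the extractor debug output.
    START PIPELINE my_pipeline FOREGROUND LIMIT 1 BATCHES;
    SHOW WARNINGS;

You may want to set pipelines_extractor_debug_logging back to OFF once you are done.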

> If possible, you can try CREATE OR REPLACE PIPELINE ..., which should keep the existing offsets for the Kafka topic. Another option we can try is specifying CONFIG '{"kafka_version":"2.2.0"}' to rule out any protocol mismatch.

Yes, this resolves the issue, but it involves manual intervention, and I was hoping for an automatic resolution. Note that once Kafka recovers and there are new messages to process, the pipeline does recover on its own. The issue is that it seems to stay in the Failed state until a new message arrives.
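
In case it helps, this is roughly how I am watching the pipeline while Kafka recovers; my_pipeline is a placeholder and the exact STATE values may differ by version:

    SELECT DATABASE_NAME, PIPELINE_NAME, STATE
    FROM information_schema.PIPELINES
    WHERE PIPELINE_NAME = 'my_pipeline';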

> Another debugging option is SET GLOBAL pipelines_extractor_debug_logging = ON, followed by FLUSH EXTRACTOR POOLS. This will make the Kafka pipeline produce debug output. You can see the output after a failed command (e.g. after running the pipeline in the foreground) via SHOW WARNINGS;.

Will give this a try. Thank you.