Is there a way to access the Kafka offset ID of each message read from Kafka and store it in a table?
Currently, pipelines do not provide the Kafka offset ID for each message. Pipelines can, however, set the batch ID on each record, and information_schema.pipelines_batches will tell you the start and end offset for each batch/partition. The batch/partition size is also configurable, although for high ingest rates it would need to be set to 100k+.
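To illustrate, a query along these lines can map each batch to its Kafka offset range. This is a sketch: the column names shown (BATCH_ID, BATCH_SOURCE_PARTITION_ID, BATCH_EARLIEST_OFFSET, BATCH_LATEST_OFFSET) and the pipeline name are assumptions, so check the information_schema reference for your version:

```sql
-- Sketch: look up the Kafka offset range covered by each batch/partition.
-- Column and pipeline names are illustrative; verify against your version's
-- information_schema.pipelines_batches definition.
SELECT batch_id,
       batch_source_partition_id,
       batch_earliest_offset,
       batch_latest_offset
FROM information_schema.pipelines_batches
WHERE pipeline_name = 'my_pipeline'
ORDER BY batch_id;
```

Joined with the batch ID stamped on each record, this narrows any given row down to an offset range, though not to a single per-message offset.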
Are you looking for that granularity on each row? What is the format of your data: CSV, JSON, Avro? How many records per message?
Thanks for trying out Kafka pipelines!
The input format is JSON. The thought process was that, instead of logging a problematic JSON document (validation errors) into a table, we could log only the partition, batch, and offset. This would help keep the error table light, since the data would remain available at that offset in Kafka until its x-day retention expires.
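Sketching the idea: if only the Kafka coordinates were logged, a hypothetical error table could be just a few narrow columns (all names here are illustrative, not an existing feature):

```sql
-- Hypothetical lightweight error table: store only the Kafka coordinates
-- of a bad record, not the JSON payload itself.
CREATE TABLE pipeline_errors (
    pipeline_name   VARCHAR(256) NOT NULL,
    kafka_partition INT NOT NULL,
    kafka_offset    BIGINT NOT NULL,
    error_time      DATETIME DEFAULT NOW(),
    error_reason    VARCHAR(512)
);
-- The raw message can be re-fetched from Kafka at (partition, offset)
-- for as long as it remains within the topic's retention window.
```

The trade-off is that the error detail becomes unrecoverable once the message ages out of Kafka retention, which is acceptable here given the stated x-day retention.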
That’s an interesting idea. We’ll keep it for consideration.