Have you watched the video How to handle message retries & failures in event driven-systems? Handling retires with Kafka? , but you need more information. Hopefully this post helps.
Let’s say you have a consumer, a retry consumer, subscribing to a retry topic. When the retry consumer consumes the retry topic and retrieves message/s, the consumer must check if 5 minutes have passed since the upstream processor published the event.
What you would do is to take advantage of the Consumption Flow Control, which allows you to manually control the flow (https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html). So you do the following steps
- Consume messages
- Check if the message published at timestamp is greater than the retry interval. If yes ( > 5 mins), then process, if not (< 5mins) continue to the next step
- Pause consumer
- Wait the time required
- Resume consumer (in some cases, you may need to close and start the consumer after resuming)
Note: you cannot pause processing without pausing the consumer or Kafka may think the client is in a faulted state and push the message to another consumer.
Also, remember the next message will always be published later, so if you are still waiting for the message at index 5, then index 6 will have a longer time to wait.
Regarding consumer lag, you would want a 5-minute consumer lag. Consumer lag showing that the consumer is processing events quicker than 5 minutes, shows the retry consumer is not waiting the designated time.
Lastly remember, you don’t want to introduce any delays upstream. So when the upstream event processor needs to retry, the processor should publish the problematic event without delay. All the pausing and waiting logic happens and lives in the retry consumer.