Kafka Consumer – Committing consumer group offset
Consumer group offset – an introduction
What is a consumer group offset?
It’s a way for a consumer to keep track of where it is in the topic. For each topic and partition the consumer group will have one number representing the latest consumed record from Kafka.
When we wrote about Kafka topics we mentioned how each record in a partition has an offset – a sequential ID that marks its position within the partition and uniquely identifies it. The same concept is used here: a consumer group has one offset for each partition it is consuming. So if we have a consumer group that is reading from a topic with two partitions (let's call them P1 and P2), the consumer group will have an offset for P1 – the last record read from the P1 partition – and an offset for P2 – again, the last record read from the P2 partition.
There are two ways to keep track of consumer group offsets:
- Storing them in Kafka, or
- Saving them in external storage (e.g. in a file or database)
This blog post covers the first approach – storing the offset in Kafka. The blog post describing the second approach is coming soon.
Where is the offset stored in Kafka?
When using Kafka to store offsets, these are saved in a topic called __consumer_offsets (yes, those are two underscores at the beginning). The Kafka client library provides built-in support for storing and retrieving offsets, ensuring that consumers resume from the last stored position after a restart.
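If you are curious what a group's stored offset currently is, the Java client can read it back with committed(). Below is a minimal sketch – the topic name orders, the partition number, and the already-configured consumer variable are illustrative assumptions:
TopicPartition p0 = new TopicPartition("orders", 0);
// ask the broker for this group's last committed offset for the partition
Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(Set.of(p0));
OffsetAndMetadata offset = committed.get(p0);
if (offset == null) {
    // no offset stored yet – auto.offset.reset (see below) decides where to start
} else {
    System.out.println("Committed offset for orders-0: " + offset.offset());
}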
What happens if there’s no stored offset for the consumer?
If no offset is found, the consumer follows the `auto.offset.reset` configuration to decide what to do next.
Consumer behaviour, depending on the auto.offset.reset value:
- latest (default) – jump to the end of the topic and wait for new messages; none of the existing messages will be read
- earliest – jump to the start of the topic (will read all existing messages)
- none – throw an exception if no existing offset is found
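The setting is an ordinary consumer property. For example, to read a topic from the beginning when the group has no stored offset (props being the Properties object the consumer is created with):
// illustrative: start from the beginning of the topic when no offset exists
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");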
Auto vs Manual Commit
What is an auto-commit?
This is a feature of the Kafka client that simplifies offset management by periodically committing the consumer offset to the Kafka cluster. The client does this in the background, committing the offset every 5 seconds by default, without any explicit action from us. This means you can simply start consuming records, and when your client restarts it will continue from the latest committed offset without you needing to do anything about it. This comes with some downsides we'll mention soon.
The use of auto-commit is controlled by the enable.auto.commit property, which defaults to true when the group.id property is set.
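As a concrete example, here is a minimal auto-commit consumer configuration – the broker address, group id and topic name are placeholders (classes come from org.apache.kafka.clients.consumer and org.apache.kafka.common.serialization). With this configuration, the consume loop that follows needs no commit calls at all:
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group"); // offsets are stored per group
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");      // the default
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000"); // commit every 5 seconds (the default)
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders"));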
while (true) {
    // with auto-commit enabled, offsets for polled records are
    // committed periodically in the background – no explicit commit needed
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    // process the records
}
Downsides of using auto-commit
Based on how auto-commit works, there is room for losing data when using it. Here's how that might happen.
Since the client commits offsets in the background, it does not track where your application is in processing the polled records. In addition, since the Kafka client often polls multiple records at a time, the following scenario can occur:
- The client polls 10 records
- You start processing the records and, while you are on e.g. the 3rd record, the background auto-commit kicks in. It commits the offset for the entire poll, which means it marks you as being on the 10th record.
- If your application now fails before processing all 10 records, you have effectively lost data: the next time it starts, it will be told it is on the 10th record, and the next record to consume is the 11th.
A simple way to avoid data loss is to commit offsets only after you have successfully processed the records, which is why we have manual commit.
Manual commit
Manual offset commit allows explicit control over when offsets are recorded. By disabling auto-commit (enable.auto.commit=false), the application must explicitly commit offsets using the commitSync() or commitAsync() methods. This level of control allows offsets to be committed only after specific conditions are met, such as successfully processing a batch of records.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    // process the records
    // when all the polled records are processed, commit the offset
    consumer.commitSync(); // or consumer.commitAsync()
}
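If committing once per poll is too coarse, the Java client also accepts explicit offsets via commitSync(Map). A sketch committing after every record – safe, but noticeably slower because of the extra round trips:
for (ConsumerRecord<String, String> record : records) {
    // process the record ...
    // note the +1: the committed value is the offset of the NEXT record to read
    consumer.commitSync(Map.of(
            new TopicPartition(record.topic(), record.partition()),
            new OffsetAndMetadata(record.offset() + 1)));
}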
When to rely on auto-commit?
If your application can tolerate:
- reprocessing the same data from time to time (because it’s designed as idempotent) and
- losing a few messages here and there
then auto-commit is a good feature to leave enabled – it reduces the complexity of your consumer code.
Otherwise, use manual commit.
Synchronous vs Asynchronous Offset Commit
After choosing manual commit, the next decision is whether to commit offsets synchronously or asynchronously.
If you want to be absolutely sure the offset is committed before continuing your work, use commitSync(). Be aware that this blocks your thread until the commit is confirmed by the Kafka broker or an exception is thrown.
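In practice that means wrapping the call, since commitSync() retries internally and only throws when the failure is unrecoverable. A sketch (the log variable is a placeholder for your logger):
try {
    consumer.commitSync();
} catch (CommitFailedException e) {
    // unrecoverable, e.g. the group rebalanced and this consumer lost its partitions
    log.error("Offset commit failed", e);
}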
Async offset commit has two variants, both of which are non-blocking:
- commitAsync(), which ignores any exceptions thrown, and
- commitAsync(OffsetCommitCallback callback), where you can provide a callback that handles success or failure (e.g. a timeout occurred, or a consumer group rebalance is in progress).
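For the second variant, OffsetCommitCallback is a functional interface, so a lambda works. A sketch (log is again a placeholder):
consumer.commitAsync((offsets, exception) -> {
    if (exception != null) {
        // don't blindly retry here: a later commit may already have succeeded,
        // and re-committing older offsets would move the group backwards
        log.warn("Async commit failed for " + offsets, exception);
    }
});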
Conclusion
The consumer group offset is a fundamental aspect of how Kafka ensures reliable message consumption. It determines where a consumer resumes reading after a restart and plays a critical role in data consistency.
While auto-commit simplifies offset management, it comes with the risk of data loss if records are not processed before a commit occurs. Manual offset management, on the other hand, provides greater control, ensuring that offsets are only committed after records are fully processed.
Choosing between synchronous and asynchronous commits depends on your application's requirements: commitSync() guarantees persistence at the cost of blocking, while commitAsync() offers non-blocking flexibility with potential trade-offs.
Understanding these offset management techniques allows developers to design Kafka consumers that balance performance, reliability, and fault tolerance, ensuring data integrity in real-world applications.
Would you like to learn more about Kafka?
I have created a Kafka mini-course that you can get absolutely free. Sign up below and I will send you lessons directly to your inbox.