When meeting other engineers, I always introduce myself as a software engineer who works a lot with Apache Kafka. Most of the time I get a response that sounds as if everyone is copying it from the same place: “I’ve heard about it, but I can’t tell you what it’s about”.

Honestly, I can’t blame those guys! If you look up “Apache Kafka” right now on any search engine, you will find something like:
Apache Kafka … bla-bla-bla … distributed streaming platform … yada-yada-yada … publish-subscribe … high-throughput … etc.


So what is Apache Kafka?

The best analogy that I can come up with for Apache Kafka is a special postal service! Yeah, Apache Kafka is a postal service with two extra twists:

  1. It does not offer home delivery; the person who wants the “deliverable” has to go to a post office to pick it up;
  2. All the letters, magazines, etc. are copied and stored ( let’s ignore the privacy concerns for now ) inside the post office for a while ( say, a week ).

Let me explain this by introducing Tom:

This is Tom!

Tom would like to send a letter to his friend Jim, who lives in another city. Pretty simple so far, right? To do that, he needs to go to the post office, hand over the letter, and that’s it! Also, he has no idea when Jim will receive the letter.

On the other side, Jim knows that the postal service does not offer home delivery, so every day he has to go to the post office near him and check whether any letter has shown up for him. Once it is delivered, Jim can pick up his letter and happily read it.
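The exchange above can be sketched in a few lines of Python. To be clear, this is a toy in-memory “post office”, not real Kafka; every name in it ( `PostOffice`, `send`, `pick_up` ) is invented just for the analogy:

```python
from collections import defaultdict

class PostOffice:
    """A toy post office: senders drop things off, readers come pick them up."""

    def __init__(self):
        # One shelf per category of deliverable (letters, magazines, ...).
        self.shelves = defaultdict(list)

    def send(self, category, item):
        """Tom hands over the letter and leaves; he never contacts Jim directly."""
        self.shelves[category].append(item)

    def pick_up(self, category, already_seen):
        """Jim stops by and takes everything that arrived since his last visit."""
        return self.shelves[category][already_seen:]

office = PostOffice()
office.send("letters", "Hi Jim! -- Tom")   # Tom drops the letter off
print(office.pick_up("letters", 0))        # Jim checks: ['Hi Jim! -- Tom']
```

Notice that Tom and Jim never talk to each other: the post office sits in the middle, which is exactly the decoupling Kafka gives you.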

The process of sending a “message” using  “Apache Kafka”

 

A postal service can handle more than letters…

The same principle applies to other types of “deliverables” ( newspapers, magazines, etc. ). Each type of deliverable is classified under a different category. Some companies create them and send them to a post office, where other parties pick them up.

 

I get this whole sending-letters thing, but copying and storing them doesn’t make any sense!

Well, let’s take the magazines example. Every day, Jim passes by the post office and picks up his favorite magazine. However, for the last couple of weeks he was on holiday, so he has forgotten which edition he last picked up. So how does he know which edition to pick now that he has returned from his holiday? Well, the post office keeps a ledger in which it records exactly which editions Jim has previously picked up, so he doesn’t end up with duplicates at home.
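Jim’s ledger is exactly what Kafka calls a consumer offset. Here is a toy sketch of it; again, all the names are invented for the analogy:

```python
# The shelf of magazines, in the order they arrived at the post office.
magazines = ["edition #1", "edition #2", "edition #3", "edition #4"]

# The post office's ledger: Jim had picked up editions 1 and 2 before his holiday.
ledger = {"Jim": 2}

def visit(reader):
    """Hand over every edition the reader hasn't seen yet, then update the ledger."""
    offset = ledger.get(reader, 0)
    new_items = magazines[offset:]
    ledger[reader] = len(magazines)
    return new_items

print(visit("Jim"))  # back from holiday: ['edition #3', 'edition #4']
print(visit("Jim"))  # nothing new since his last visit: []
```

Because the ledger ( the offset ) lives at the post office and the magazines stay on the shelf for a while, Jim can disappear for two weeks and still resume exactly where he left off, with no duplicates.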

 

What if multiple people are living in the same house?

Jim lives with Maria, and sometimes Maria picks up the magazine. The problem is that the ledger where the post office keeps track refers to a single person. The easiest solution is to extend the entry from one person to a group of people, say, everyone living in the same house. This way, even if both Maria and Jim pass by the post office on the same day, only one of them receives the magazine, since whoever walks into the office first is marked in the ledger.

Multiple consumers acting as a group

 

Connecting the dots

Apache Kafka – the postal service
Kafka Broker – a post office
Kafka Cluster – all the post offices from the same country ( multiple brokers form a cluster )
Record  ( Kafka specific, also called “message” or “event” ) – a letter, a newspaper, etc. 
Topic – a category of records ( letters, newspapers, magazines, etc. )
Kafka Producer – Tom, a newspaper company, etc.
Kafka Consumer – Jim, newspaper readers, etc.
Consumer Offset – the current position recorded in the ledger
Consumer Group – a group of people living in the same house
Streaming – the operation of processing each letter individually as it arrives in the post office
Distributed – the postal service is composed of multiple post offices scattered across the country, so letters, newspapers, magazines, etc. can be produced to different post offices