Kafka Fundamentals

What is Kafka

"Kafka is a distributed, partitioned, replicated commit log service"
  • Fault tolerant
  • Near linearly scalable (horizontally)
  • Durable
  • Often used as a publish-subscribe messaging system
We read a sequence of messages from a Kafka topic.
Topics can be consumed in batches.
A batch is bounded by either size or time: it is handed over when it reaches a size limit or a time limit, whichever comes first.
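The size-or-time rule can be sketched as a small buffer. This is a minimal illustration, assuming a hypothetical `BatchBuffer` class with `max_size` and `max_wait_s` parameters and an explicit `now` timestamp (so the timing is deterministic); it is not a Kafka client API. Kafka's own clients expose similar knobs, for example the producer's `batch.size` and `linger.ms`, or the consumer's `fetch.min.bytes` and `fetch.max.wait.ms`.

```python
from typing import List, Optional


class BatchBuffer:
    """Hypothetical buffer that emits a batch on a size OR time limit."""

    def __init__(self, max_size: int, max_wait_s: float) -> None:
        self.max_size = max_size      # flush once this many messages are buffered
        self.max_wait_s = max_wait_s  # flush once the oldest message has waited this long
        self._items: List[str] = []
        self._first_ts: Optional[float] = None

    def add(self, msg: str, now: float) -> Optional[List[str]]:
        """Buffer a message and return a batch if either limit is reached."""
        if not self._items:
            self._first_ts = now  # the timer starts with the first buffered message
        self._items.append(msg)
        return self.poll(now)

    def poll(self, now: float) -> Optional[List[str]]:
        """Return the buffered batch if the size or time limit is reached, else None."""
        if not self._items:
            return None
        size_hit = len(self._items) >= self.max_size
        time_hit = now - self._first_ts >= self.max_wait_s
        if size_hit or time_hit:
            batch, self._items, self._first_ts = self._items, [], None
            return batch
        return None


buf = BatchBuffer(max_size=3, max_wait_s=1.0)
buf.add("a", now=0.0)
buf.add("b", now=0.1)
print(buf.add("c", now=0.2))  # ['a', 'b', 'c'] (size limit reached)
buf.add("d", now=0.5)
print(buf.poll(now=1.6))      # ['d'] (time limit reached)
```

Whichever limit trips first wins, so small trickles of messages are not held forever and bursts do not grow the buffer without bound.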
(Figure: consumer offset)
The gap between a consumer's offset and the latest message offset in a partition is called lag.
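Lag is simple arithmetic per partition. A minimal sketch, assuming the common convention that the log-end offset is the offset the next produced message will receive and the committed offset is the next message the consumer will read; `partition_lag` and `total_lag` are hypothetical helpers, and treating an uncommitted partition as read from 0 is an assumption.

```python
from typing import Dict


def partition_lag(log_end_offset: int, committed_offset: int) -> int:
    """Lag for one partition: messages written but not yet consumed.

    log_end_offset: offset the next produced message will receive.
    committed_offset: offset of the next message the consumer will read.
    """
    return max(log_end_offset - committed_offset, 0)


def total_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> int:
    """Sum lag over partitions (partition id -> offset).

    Assumption: a partition with no committed offset is treated as read from 0.
    """
    return sum(
        partition_lag(end, committed.get(partition, 0))
        for partition, end in log_end.items()
    )


log_end = {0: 120, 1: 80, 2: 50}
committed = {0: 100, 1: 80}           # partition 2 has no commit yet
print(total_lag(log_end, committed))  # 70 = 20 + 0 + 50
```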
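Kafka guarantees ordering only within a partition, not across a topic.
If the publisher sets a key, Kafka hashes the key and takes the result modulo the number of partitions to find the partition the message is published to. Messages with the same key therefore land in the same partition and keep their order (as long as the partition count does not change).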
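A minimal sketch of key-based partitioning and the per-key ordering it gives. Assumptions: `choose_partition` and `publish` are hypothetical helpers, and `zlib.crc32` stands in for the hash. Kafka's Java default partitioner uses murmur2, so the actual partition numbers will differ; only the hash-then-modulo shape is the point here.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def choose_partition(key: bytes, num_partitions: int) -> int:
    """hash(key) mod num_partitions.

    Assumption: zlib.crc32 stands in for the hash. Kafka's Java default
    partitioner uses murmur2, so real partition numbers will differ.
    """
    return zlib.crc32(key) % num_partitions


def publish(
    messages: List[Tuple[bytes, str]], num_partitions: int
) -> Dict[int, List[Tuple[bytes, str]]]:
    """Append each (key, value) to its partition's log in send order."""
    partitions: Dict[int, List[Tuple[bytes, str]]] = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


msgs = [(b"user-1", "login"), (b"user-2", "login"),
        (b"user-1", "click"), (b"user-1", "logout")]
parts = publish(msgs, num_partitions=3)
p = choose_partition(b"user-1", 3)
# All user-1 events share one partition, so their relative order is kept.
print([v for k, v in parts[p] if k == b"user-1"])  # ['login', 'click', 'logout']
```

Messages with different keys may still share a partition; the guarantee is only that one key always maps to one partition.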
Each Kafka node is called a broker, and multiple brokers together form a cluster.
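ZooKeeper is not mandatory, since newer Kafka versions can run without it in KRaft mode. Where ZooKeeper is used, it coordinates failover: when a broker fails, it helps decide which broker becomes the next leader.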

The Name

"I thought that since Kafka was a system optimized for writing, using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project. So basically there is not much of a relationship." - Jay Kreps, Lead Developer at LinkedIn
Kafka is not ideal for P2P (point-to-point) messaging.
(Figure: publisher-subscriber illustrated)
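The publish-subscribe model can be sketched over a commit log: messages stay in the log, and each subscriber keeps its own read position. This is a minimal illustration assuming a hypothetical single-partition `Topic` class with `publish` and `poll` methods, where the `group` name plays the role of a consumer group; it is not Kafka's client API. In a point-to-point queue, by contrast, each message is typically delivered to a single consumer.

```python
from typing import Dict, List


class Topic:
    """Hypothetical single-partition topic: an append-only log plus
    one read offset per subscriber (consumer group)."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.offsets: Dict[str, int] = {}  # group name -> next offset to read

    def publish(self, msg: str) -> None:
        self.log.append(msg)  # reading never removes messages from the log

    def poll(self, group: str) -> List[str]:
        """Return everything this group has not read yet and advance its offset."""
        start = self.offsets.get(group, 0)
        self.offsets[group] = len(self.log)
        return self.log[start:]


topic = Topic()
topic.publish("order-1")
topic.publish("order-2")
print(topic.poll("billing"))  # ['order-1', 'order-2']
print(topic.poll("audit"))    # ['order-1', 'order-2'] (same messages, own offset)
topic.publish("order-3")
print(topic.poll("billing"))  # ['order-3']
```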

Why Kafka

  • Able to connect a large number of clients (producers and consumers)
  • Durable
    - Disk-based retention
    - Data is replicated across brokers
  • Scalable
    - Expansions can be performed while the cluster is online
  • High performance
    - Producers, consumers, and brokers can all be scaled to handle very large message streams
    - Sub-second message latency to consumers
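Why replicating across brokers helps durability can be sketched with a toy placement. Assumptions: `assign_replicas` and `surviving_replicas` are hypothetical helpers using a simplified round-robin placement (not Kafka's actual assignment algorithm), and the first broker in each list is treated as the leader.

```python
from typing import Dict, List


def assign_replicas(
    num_partitions: int, brokers: List[int], replication_factor: int
) -> Dict[int, List[int]]:
    """Place each partition's replicas on distinct brokers, round-robin.

    Simplified illustration, not Kafka's actual assignment algorithm.
    The first broker in each list is treated as the leader.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_replicas(assignment: Dict[int, List[int]], failed: int) -> Dict[int, List[int]]:
    """Replicas still available for each partition after one broker fails."""
    return {p: [b for b in reps if b != failed] for p, reps in assignment.items()}


plan = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=2)
print(plan)                                # {0: [1, 2], 1: [2, 3], 2: [3, 1]}
print(surviving_replicas(plan, failed=2))  # {0: [1], 1: [3], 2: [3, 1]}
```

With a replication factor of 2, losing any single broker still leaves at least one copy of every partition.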

Back to basics


Summary

Apache Kafka is an open source, distributed, partitioned, and replicated commit-log-based publish-subscribe messaging system:
  • Scalable
  • High Performance
  • Multiple Consumers
  • Multiple Producers
  • Disk-based Retention

Day 2