
Big data messaging with Kafka, Part 1

Sunil Patil | April 26, 2016
Build a continuous big data messaging system with Kafka.

Figure 1. Architecture of a Kafka message system

Kafka's architecture is very simple, which can result in better performance and throughput in some systems. Every topic in Kafka is like a simple log file. When a producer publishes a message, the Kafka server appends it to the end of the log file for its given topic. The server also assigns an offset, which is a number used to permanently identify each message. Each new message receives a higher offset than the one before it; for example, if the producer publishes three messages, the first one might get an offset of 1, the second an offset of 2, and the third an offset of 3.
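The append-and-assign-offset behavior described above can be sketched as a toy in-memory log. This is an illustration only, not Kafka's implementation: the class name TopicLog and its methods are hypothetical, and the sketch numbers offsets from 1 to match the example in the text, whereas a real Kafka partition numbers its offsets from 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of one topic's log (hypothetical, not a Kafka API). */
final class TopicLog {
    private final List<String> messages = new ArrayList<>();

    /** Appends a message to the end of the log and returns its offset (1-based in this sketch). */
    long append(String message) {
        messages.add(message);
        return messages.size();
    }

    /** Looks up a message by the offset it was assigned when appended. */
    String read(long offset) {
        return messages.get((int) (offset - 1));
    }

    public static void main(String[] args) {
        TopicLog log = new TopicLog();
        for (String m : new String[] {"first", "second", "third"}) {
            System.out.println("offset " + log.append(m) + " -> " + m);
        }
    }
}
```

Because messages are only ever added at the end, an offset never changes once assigned, which is what lets it serve as a permanent identifier.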

Kafka benchmarks

Production use by LinkedIn and other enterprises has shown that with proper configuration Kafka is capable of processing hundreds of gigabytes of data daily. In 2011, three LinkedIn engineers used benchmark testing to demonstrate that Kafka could achieve much higher throughput than ActiveMQ and RabbitMQ.

When the Kafka consumer first starts, it will send a pull request to the server, asking to retrieve any messages for a particular topic with an offset value higher than 0. The server will check the log file for that topic and return the three new messages. The consumer will process the messages, then send a request for messages with an offset higher than 3, and so on.
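The pull cycle described above can be sketched in plain Java. This is a simplified model under stated assumptions, not the Kafka consumer API: fetchAfter is a hypothetical stand-in for the server's side of a fetch, the log is an ordinary list, and offsets are 1-based to match the example in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a consumer pulling by offset (hypothetical, not a Kafka API). */
final class PullLoopSketch {
    /** Server side: returns every message whose 1-based offset is higher than afterOffset. */
    static List<String> fetchAfter(List<String> log, long afterOffset) {
        int from = (int) Math.min(Math.max(afterOffset, 0), log.size());
        return new ArrayList<>(log.subList(from, log.size()));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>(Arrays.asList("m1", "m2", "m3"));
        long lastOffset = 0; // the consumer, not the server, remembers this value

        for (int poll = 0; poll < 2; poll++) {
            List<String> batch = fetchAfter(log, lastOffset);
            for (String message : batch) {
                lastOffset++; // advance past each message as it is processed
                System.out.println("consumed offset " + lastOffset + ": " + message);
            }
            log.add("m" + (log.size() + 1)); // a producer publishes one more between polls
        }
    }
}
```

The first poll asks for everything after offset 0 and receives three messages; the second asks for everything after offset 3 and receives only the message published in the meantime.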

In Kafka, the client is responsible for remembering its current offset and retrieving messages. The Kafka server doesn't track or manage message consumption. By default, a Kafka server will keep a message for seven days. A background thread in the server checks for and deletes messages that are seven days old or older. A consumer can access messages as long as they are on the server. It can read a message multiple times, and even read messages in reverse order of receipt. But if the consumer fails to retrieve a message before the seven days are up, it will miss that message.
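The retention behavior above can be modeled with a small sketch. It is an illustration under stated assumptions, not Kafka's implementation: the class RetentionSketch and its methods are hypothetical, purge stands in for the server's background cleanup, and messages are removed one at a time for simplicity, whereas an actual broker deletes whole log segments. On a real broker the retention period is a configuration setting (log.retention.hours, which defaults to 168 hours).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.TreeMap;

/** Toy model of time-based retention (hypothetical, not a Kafka API). */
final class RetentionSketch {
    static final Duration RETENTION = Duration.ofDays(7);

    private static final class Entry {
        final String message;
        final Instant publishedAt;

        Entry(String message, Instant publishedAt) {
            this.message = message;
            this.publishedAt = publishedAt;
        }
    }

    private final TreeMap<Long, Entry> log = new TreeMap<>();
    private long nextOffset = 1;

    /** Appends a message with an explicit publish time and returns its offset. */
    long append(String message, Instant publishedAt) {
        long offset = nextOffset++;
        log.put(offset, new Entry(message, publishedAt));
        return offset;
    }

    /** Stand-in for the background cleanup: drops messages that are seven days old or older. */
    void purge(Instant now) {
        log.values().removeIf(e -> !e.publishedAt.plus(RETENTION).isAfter(now));
    }

    /** Returns the message at the given offset, or empty if it has been deleted. */
    Optional<String> read(long offset) {
        return Optional.ofNullable(log.get(offset)).map(e -> e.message);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-04-26T00:00:00Z");
        RetentionSketch topic = new RetentionSketch();
        long old = topic.append("old", now.minus(Duration.ofDays(8)));
        long fresh = topic.append("fresh", now.minus(Duration.ofDays(1)));

        // The client chooses which offsets to read: newest first, then a repeat read.
        System.out.println(topic.read(fresh).orElse("<gone>"));
        System.out.println(topic.read(old).orElse("<gone>"));
        System.out.println(topic.read(old).orElse("<gone>"));

        // After cleanup, the expired message is gone, but the newer one keeps its offset.
        topic.purge(now);
        System.out.println(topic.read(old).orElse("<gone>"));
        System.out.println(topic.read(fresh).orElse("<gone>"));
    }
}
```

Note that the sketch keeps no record of who has read what: reading never removes a message, and only the passage of time does.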

Quick setup and demo

We'll build a custom application in this tutorial, but let's start by installing and testing a Kafka instance with an out-of-the-box producer and consumer.

  1. Visit the Kafka download page to install the most recent version (0.9 as of this writing).
  2. Extract the binaries into a software/kafka folder. For the current version it's software/kafka_2.11-0.9.0.0.
  3. Change your current directory to point to the new folder.
  4. Start the Zookeeper server by executing the command: bin/zookeeper-server-start.sh config/zookeeper.properties.
  5. Start the Kafka server by executing: bin/kafka-server-start.sh config/server.properties.
  6. Create a topic named javaworld that you can use for testing: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic javaworld.
  7. Start a simple console consumer that can consume messages published to a given topic, in this case javaworld: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic javaworld --from-beginning.
  8. Start up a simple producer console that can publish messages to the javaworld topic: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic javaworld.
  9. Try typing one or two messages into the producer console. Your messages should show in the consumer console.

 
