Sunday 25 October 2015

Apache Kafka - Quick Start on Windows

In this post I will demonstrate how to set up and use Apache Kafka in a Windows environment. Before that, I will briefly describe Kafka and then move on to the practical steps. You can also refer to the accompanying video to set up Apache Kafka on Windows.



About Apache Kafka
Kafka is a distributed publish-subscribe messaging system. It is designed to be fast, scalable, and durable compared to traditional messaging systems. In a publish-subscribe messaging system, producers write messages to a topic, and consumers read messages from that topic. Kafka is designed so that topics can be partitioned and replicated across multiple servers.
If you would like more detail, please refer to this link.


I referred to this blog (source: http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/). It provides a good, easy explanation of Kafka and its concepts. The two quoted paragraphs below are taken from that blog.

"Messages are simply byte arrays and the developers can use them to store any object in any format – with String, JSON, and Avro the most common. It is possible to attach a key to each message, in which case the producer guarantees that all messages with the same key will arrive to the same partition. When consuming from a topic, it is possible to configure a consumer group with multiple consumers. Each consumer in a consumer group will read messages from a unique subset of partitions in each topic they subscribe to, so each message is delivered to one consumer in the group, and all messages with the same key arrive at the same consumer."

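To make keys and consumer groups a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: CRC32 stands in for the hash function, round-robin stands in for the group assignment, and partition_for and assign_partitions are hypothetical helper names, not Kafka APIs.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition so that the same key always lands in the same one."""
    # CRC32 is a stand-in hash for illustration, not Kafka's actual partitioner.
    return zlib.crc32(key) % num_partitions

def assign_partitions(consumers, num_partitions):
    """Give each consumer in a group a disjoint subset of partitions (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

num_partitions = 4
for key in [b"user-1", b"user-2", b"user-1"]:
    # Both b"user-1" lines print the same partition number.
    print(key, "->", partition_for(key, num_partitions))

print(assign_partitions(["consumer-a", "consumer-b"], num_partitions))
```

Because every partition has exactly one owner in the group, all messages with the same key end up at the same consumer, which is the behaviour the quote describes.
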
"What makes Kafka unique is that Kafka treats each topic partition as a log (an ordered set of messages). Each message in a partition is assigned a unique offset. Kafka does not attempt to track which messages were read by each consumer and only retain unread messages; rather, Kafka retains all messages for a set amount of time, and consumers are responsible to track their location in each log. Consequently, Kafka can support a large number of consumers and retain large amounts of data with very little overhead."

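The log-and-offset idea from the quote above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how Kafka is implemented: PartitionLog and its methods are hypothetical names, and the log lives in memory instead of on disk.

```python
import time

class PartitionLog:
    """Toy append-only log: every message gets the next offset."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.entries = []       # list of (offset, timestamp, message)
        self.next_offset = 0

    def append(self, message, now=None):
        now = time.time() if now is None else now
        offset = self.next_offset
        self.entries.append((offset, now, message))
        self.next_offset += 1
        return offset

    def read_from(self, offset):
        """Return (offset, message) pairs at or after the given offset."""
        return [(o, m) for o, _, m in self.entries if o >= offset]

    def expire(self, now=None):
        """Drop messages older than the retention period, read or not."""
        now = time.time() if now is None else now
        self.entries = [e for e in self.entries if now - e[1] < self.retention_seconds]

log = PartitionLog(retention_seconds=60)
for second, text in enumerate([b"a", b"b", b"c"]):
    log.append(text, now=second)

# Each consumer tracks its own position; the log keeps no per-consumer state.
positions = {"fast": 0, "slow": 0}
for offset, _ in log.read_from(positions["fast"]):
    positions["fast"] = offset + 1
first_offset, _ = log.read_from(positions["slow"])[0]
positions["slow"] = first_offset + 1
print(positions)
```

Note that expire() removes messages purely by age, so a slow consumer can miss messages if it falls behind by more than the retention period.
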
Now you may be wondering: "How do I set up Apache Kafka in a Windows environment?" Don't worry, that is what I describe next. You just need to follow a few simple steps and you are done.


Download and Change Required Properties
1.   Download Kafka from here (kafka_2.9.1-0.8.2.2) and unzip it at your desired location.
2.   Open the <kafka_dir>\config\server.properties file and change the log directory property 'log.dirs' as per your environment. Backslashes are escape characters in Java properties files, so write the path with forward slashes (or doubled backslashes).
log.dirs=<kafka_dir>/kafka-logs
3.   Open the <kafka_dir>\config\zookeeper.properties file and change the data directory property 'dataDir' as per your environment, using the same path style.
dataDir=<kafka_dir>/zookeeper-data
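One thing to watch for on Windows: both files are read as Java properties files, where a backslash is an escape character, so a path pasted with single backslashes can lose its separators. Below is a small Python sketch that prints the two lines with forward slashes instead. The install location and the helper name properties_line are my own assumptions for illustration.

```python
from pathlib import PureWindowsPath

def properties_line(key, windows_path):
    """Return 'key=value' with the path written using forward slashes."""
    # as_posix() turns C:\kafka\kafka-logs into C:/kafka/kafka-logs
    return f"{key}={PureWindowsPath(windows_path).as_posix()}"

# Hypothetical install location; replace it with your own <kafka_dir>.
kafka_dir = r"C:\kafka_2.9.1-0.8.2.2"

print(properties_line("log.dirs", kafka_dir + r"\kafka-logs"))
print(properties_line("dataDir", kafka_dir + r"\zookeeper-data"))
```

You can paste the printed lines into server.properties and zookeeper.properties respectively.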

Start Zookeeper and Kafka Servers
Kafka uses Zookeeper internally. If you would like more detail on Zookeeper, you can refer to this link.

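First you need to start the Zookeeper server. Open a command prompt in <kafka_dir>\bin\windows (the relative config path in the command is resolved against the current directory) and execute the command below: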

<kafka_dir>\bin\windows\zookeeper-server-start.bat ..\..\config\zookeeper.properties

Now open another command prompt in <kafka_dir>\bin\windows and start the Kafka server:

<kafka_dir>\bin\windows\kafka-server-start.bat ..\..\config\server.properties
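
If you want to confirm that both servers are actually listening before moving on, a quick port check helps. The following Python sketch is one way to do it, assuming the ports used later in this post (2181 for Zookeeper, 9092 for Kafka) and that both run on localhost. The helper name is_port_open is mine.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

# Ports taken from the commands in this post; adjust them if you changed the configs.
for name, port in [("Zookeeper", 2181), ("Kafka", 9092)]:
    state = "listening" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on localhost:{port}: {state}")
```

An open port only shows that something is listening there; it does not prove the server is healthy, so still check the console output of both command prompts.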

Create Topic
Now you need to create a topic to publish and subscribe to messages. To create a topic, execute the command below. It creates the topic 'mytopic' with a single partition and a replication factor of 1.

<kafka_dir>\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic

After executing the above command you should see the following message in the command prompt:
Created topic "mytopic".
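
If you create topics often, you may prefer to build the command from a script. Here is a minimal Python sketch that assembles the same command as above, using only the flags shown in this post. The function name create_topic_command and the C:\kafka location are assumptions for illustration.

```python
from pathlib import PureWindowsPath

def create_topic_command(kafka_dir, topic, partitions=1, replication_factor=1,
                         zookeeper="localhost:2181"):
    """Build the kafka-topics.bat argument list for creating a topic."""
    script = PureWindowsPath(kafka_dir) / "bin" / "windows" / "kafka-topics.bat"
    return [
        str(script),
        "--create",
        "--zookeeper", zookeeper,                          # Zookeeper host:port
        "--replication-factor", str(replication_factor),   # copies of each partition
        "--partitions", str(partitions),                   # number of partitions
        "--topic", topic,                                  # topic name
    ]

# Hypothetical install location; replace it with your own <kafka_dir>.
cmd = create_topic_command(r"C:\kafka", "mytopic")
print(" ".join(cmd))
```

On a machine where Kafka is installed, the list can be passed to subprocess.run(cmd) to execute it.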

Produce and Consume Messages
Open a command prompt and execute the command below. This command prompt will act as the producer.

<kafka_dir>\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic mytopic

Now open another command prompt and execute the command below. This command prompt will act as the consumer.

<kafka_dir>\bin\windows\kafka-console-consumer.bat --zookeeper localhost:2181 --topic mytopic

If you type a message in the producer command prompt and press Enter, it will be consumed by the consumer, and you should see the same message in the consumer command prompt.


If you have reached this stage, you have successfully set up Kafka in your Windows environment. If you need more detail about anything in this post, you can post a comment. Thanks for reading.
