How to Install and Run Apache Kafka on Windows & Ubuntu with Python Integration

Apache Kafka is a distributed event streaming platform used to build real-time data pipelines and streaming applications. This tutorial walks step by step through installing Kafka on Windows and Ubuntu, configuring ZooKeeper and the Kafka broker, testing the setup with the console producer and consumer, and sending messages from Python with the Kafka-Python library.

Step-by-Step Guide to Installing and Running Apache Kafka on Windows

1. Prerequisites

Before we begin, ensure that you have the following:

  • Java Development Kit (JDK) installed, with java available on your PATH
  • Python 3 installed, along with pip
  • Apache Kafka binaries (downloaded in the next step)

2. Download and Set Up Apache Kafka

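  1. Download Kafka: Download a binary release (not the source archive) from the official Apache Kafka downloads page and extract it. These steps run Kafka in ZooKeeper mode, so choose a 3.x release; Kafka 4.0 removed ZooKeeper support. The commands below assume Kafka 3.2.0 built for Scala 2.12, extracted to C:\kafka_2.12-3.2.0. Adjust the paths if your version or location differs.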

  2. Create Directories: On your C drive, create two folders:

    • C:/kafka_logs/zookeeper
    • C:/kafka_logs/server_logs
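  3. Update ZooKeeper Configuration: Open config/zookeeper.properties and update it as follows. Use forward slashes, because a backslash is an escape character in .properties files:

    dataDir=C:/kafka_logs/zookeeper
    maxClientCnxns=1

    maxClientCnxns limits how many concurrent connections a single client, identified by its IP address, may open to one ZooKeeper server. A value of 1 leaves room for just one local connection, which is enough for a single Kafka broker; the file that ships with Kafka uses 0, which disables the limit.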

  4. Update Server Configuration: Modify these existing entries in config/server.properties:

    log.dirs=C:/kafka_logs/server_logs
    zookeeper.connect=localhost:2181
    zookeeper.connection.timeout.ms=60000
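If you prefer to script these edits instead of changing the files by hand, the following Python sketch sets key=value pairs in a .properties file. It is a minimal illustration that assumes simple key=value lines like those in Kafka's bundled config files (no ":" separators or multi-line values); update_properties is a hypothetical helper, not part of Kafka.

```python
from pathlib import Path


def update_properties(path, updates):
    """Set key=value pairs in a simple Java .properties file (hypothetical helper).

    Existing, uncommented keys are rewritten in place; keys that are not
    present yet are appended at the end. Comments and other lines are kept.
    """
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines()
    result = []
    seen = set()
    for line in lines:
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        # Only touch real key=value lines, never comments starting with # or !
        if "=" in stripped and not stripped.startswith(("#", "!")) and key in updates:
            result.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            result.append(line)
    # Append any keys that were not already in the file
    result.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    file.write_text("\n".join(result) + "\n", encoding="utf-8")
```

For example, update_properties("C:/kafka_2.12-3.2.0/config/server.properties", {"log.dirs": "C:/kafka_logs/server_logs"}) rewrites the existing log.dirs line and leaves the comments and other settings untouched.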
    

3. Start ZooKeeper and the Kafka Server

  1. Start ZooKeeper: Open a Command Prompt and run:

    C:\kafka_2.12-3.2.0\bin\windows\zookeeper-server-start.bat C:\kafka_2.12-3.2.0\config\zookeeper.properties

  2. Start the Kafka Server: Leave ZooKeeper running, open a second Command Prompt, and run:

    C:\kafka_2.12-3.2.0\bin\windows\kafka-server-start.bat C:\kafka_2.12-3.2.0\config\server.properties
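Both servers take a few seconds to start. A quick way to confirm they are listening is to attempt a TCP connection to ZooKeeper's port 2181 and Kafka's port 9092. The Python sketch below polls a port until it accepts a connection or a timeout expires; wait_for_port is a hypothetical helper, and an open port only shows that some process is listening, not that the broker has finished its startup.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll host:port until it accepts a TCP connection (hypothetical helper).

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have passed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP handshake means some process is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 2181) and then wait_for_port("localhost", 9092, timeout=60) return True once ZooKeeper and the broker listener are up, which is a reasonable check before creating topics or running the Python producer.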
    

4. Create Kafka Topics

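With both servers running, open another Command Prompt and create a topic named hello_world with one partition and a replication factor of 1 (the highest replication factor a single broker supports):

C:\kafka_2.12-3.2.0\bin\windows\kafka-topics.bat --create --topic hello_world --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1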
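Kafka restricts topic names: they may contain only ASCII letters, digits, '.', '_' and '-', must be at most 249 characters long, and cannot be "." or "..". The Python sketch below checks a name against these rules before you pass it to kafka-topics; validate_topic_name is a hypothetical helper, and the broker remains the final authority on what it accepts.

```python
import re

# Topic-name rules enforced by Kafka: legal characters and maximum length
_LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]+")
MAX_TOPIC_LENGTH = 249


def validate_topic_name(name):
    """Return a list of problems with a topic name; an empty list means it looks valid."""
    problems = []
    if not name:
        problems.append("name is empty")
    elif name in (".", ".."):
        problems.append('"." and ".." are not allowed')
    if len(name) > MAX_TOPIC_LENGTH:
        problems.append(f"name is longer than {MAX_TOPIC_LENGTH} characters")
    if name and not _LEGAL_TOPIC.fullmatch(name):
        problems.append("only ASCII letters, digits, '.', '_' and '-' are allowed")
    return problems
```

For example, validate_topic_name("hello_world") returns an empty list, while a name containing a space returns the illegal-character message.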

5. Start Kafka Producer and Consumer

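  1. Start Producer: In one Command Prompt, run:

    C:\kafka_2.12-3.2.0\bin\windows\kafka-console-producer.bat --topic hello_world --bootstrap-server localhost:9092

  2. Start Consumer: In another Command Prompt, run:

    C:\kafka_2.12-3.2.0\bin\windows\kafka-console-consumer.bat --topic hello_world --from-beginning --bootstrap-server localhost:9092

    Each line you type in the producer window is sent as a message and should appear in the consumer window. The --from-beginning flag makes the consumer also print messages that were sent before it started.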
    


Setting Up Apache Kafka on Ubuntu

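On Ubuntu, the same steps can be automated with a shell script. The script assumes Java is already installed (for example with sudo apt install default-jdk) and should be run from the directory where you want Kafka installed: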

1. Create a Shell Script

Create a file named setup_kafka.sh and insert the following content:

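#!/bin/bash
set -e  # stop at the first failing command

# Step 1: Download Apache Kafka. Older releases move to archive.apache.org;
# stay on a 3.x release, because Kafka 4.x no longer supports ZooKeeper
wget https://archive.apache.org/dist/kafka/3.6.0/kafka_2.13-3.6.0.tgz

# Step 2: Extract the downloaded archive
tar -xzf kafka_2.13-3.6.0.tgz

# Step 3: Rename to 'kafka'
mv kafka_2.13-3.6.0 kafka

# Step 4: Create data directories for Kafka
mkdir -p kafka/kafka_logs/zookeeper kafka/kafka_logs/server_logs

# Step 5: Configure ZooKeeper (absolute dataDir, so it does not depend on the working directory)
cat <<EOF > kafka/config/zookeeper.properties
dataDir=$PWD/kafka/kafka_logs/zookeeper
clientPort=2181
maxClientCnxns=1
admin.enableServer=false
EOF

# Step 6: Point Kafka's log directory at the folder created in Step 4
sed -i "s|^log.dirs=.*|log.dirs=$PWD/kafka/kafka_logs/server_logs|" kafka/config/server.properties

# Step 7: Start ZooKeeper in the background and give it a few seconds to come up
kafka/bin/zookeeper-server-start.sh kafka/config/zookeeper.properties &
sleep 5

# Step 8: Start Kafka in the foreground (Ctrl+C stops it; stop ZooKeeper
# afterwards with kafka/bin/zookeeper-server-stop.sh)
kafka/bin/kafka-server-start.sh kafka/config/server.properties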
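Kafka binary releases follow the naming pattern kafka_<Scala version>-<Kafka version>.tgz. downloads.apache.org only hosts the currently supported releases, while archive.apache.org keeps every release. The Python sketch below builds the download URL for a given version so you can swap it into Step 1; kafka_download_url is a hypothetical helper, the Scala 2.13 default is an assumption, and you should confirm the version exists on the Kafka downloads page (and stays on 3.x for these ZooKeeper-based steps).

```python
def kafka_download_url(kafka_version, scala_version="2.13", archive=True):
    """Build the Apache download URL for a Kafka binary release (hypothetical helper).

    archive=True uses archive.apache.org, which keeps every release;
    archive=False uses downloads.apache.org, which only hosts current ones.
    """
    if archive:
        base = "https://archive.apache.org/dist/kafka"
    else:
        base = "https://downloads.apache.org/kafka"
    # Binary archives are named kafka_<scala version>-<kafka version>.tgz
    return f"{base}/{kafka_version}/kafka_{scala_version}-{kafka_version}.tgz"
```

For example, kafka_download_url("3.6.0") produces the archive URL for the kafka_2.13-3.6.0.tgz release used in the script.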

2. Run the Shell Script

  1. Make the script executable:

    chmod +x setup_kafka.sh
    
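  2. Run the script:

    ./setup_kafka.sh

    The script keeps running in the foreground while Kafka is up, so use a second terminal for the Python steps that follow.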
    

Integrating Apache Kafka with Python

With Kafka running, you can produce and consume messages from Python using the Kafka-Python library.

1. Install Kafka-Python

Install the Kafka-Python library using pip:

pip install kafka-python

2. Example Python Code

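The following example creates a Kafka producer that sends a JSON message containing an increasing number to the hello_world topic every five seconds. You can watch the messages arrive with the console consumer from the Windows section, or on Ubuntu with kafka/bin/kafka-console-consumer.sh --topic hello_world --from-beginning --bootstrap-server localhost:9092: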

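from time import sleep
from json import dumps
from kafka import KafkaProducer

topic_name = 'hello_world'

# Connect to the local broker; each value is serialized to UTF-8 encoded JSON
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda x: dumps(x).encode('utf-8')
)

try:
    for e in range(1000):
        data = {'number': e}
        print(data)
        # send() is asynchronous: it buffers the record and returns immediately
        producer.send(topic_name, value=data)
        sleep(5)
finally:
    # Deliver any records still in the buffer before the script exits
    producer.flush()
    producer.close()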
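To read the producer's messages back in Python, the consumer needs the inverse of the value_serializer. The sketch below assumes the JSON format used above: deserialize turns the raw bytes back into a dictionary, collect_numbers works on any iterable of records with a .value attribute, and consume wires both into Kafka-Python's KafkaConsumer. The three helper names are hypothetical; auto_offset_reset, value_deserializer and consumer_timeout_ms are KafkaConsumer parameters.

```python
from json import loads


def deserialize(raw):
    """Inverse of the producer's value_serializer: UTF-8 JSON bytes -> Python object."""
    return loads(raw.decode("utf-8"))


def collect_numbers(records):
    """Return the 'number' field of each record, in arrival order.

    `records` is any iterable of objects with a `.value` attribute, such as
    the ConsumerRecord objects yielded by Kafka-Python's KafkaConsumer.
    """
    return [record.value["number"] for record in records]


def consume(topic_name="hello_world", bootstrap_servers=("localhost:9092",)):
    """Read messages from the topic until none arrives for 10 seconds."""
    # Imported here so the helpers above also work without Kafka-Python installed
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic_name,
        bootstrap_servers=list(bootstrap_servers),
        # With no committed offset, start at the oldest message (similar to --from-beginning)
        auto_offset_reset="earliest",
        value_deserializer=deserialize,
        # End the iteration when no message arrives for 10 seconds
        consumer_timeout_ms=10_000,
    )
    try:
        return collect_numbers(consumer)
    finally:
        consumer.close()
```

While the producer above is still sending a message every five seconds, consume() keeps reading; once you stop the producer, it returns the collected numbers about ten seconds after the last message.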

Conclusion

Setting up Apache Kafka on Windows and Ubuntu comes down to a few steps: point ZooKeeper and the Kafka broker at their data directories, start both services, create a topic, and verify the setup with the console producer and consumer. With the Kafka-Python library, you can then produce and consume messages directly from Python, which gives you a foundation for building real-time data pipelines and streaming applications.