A quick guide on how to use Apache Kafka as a sophisticated platform for exchanging files between applications and systems. The KafkaJS library is used on the producer and consumer side to implement a minimal working example.
Even though Apache Kafka’s main purpose is not transferring files, there are quite a few scenarios where it is worth considering, since it offers real advantages over “traditional” file exchange mechanisms like shared filesystems, (S)FTP servers etc.
So first, let’s have a look at the pros & cons of using Apache Kafka as a file transfer solution.
Pros & Cons of using Kafka for transferring files
Pros – arguments for using Apache Kafka as a file transfer platform:

- Producers and consumers are fully decoupled – neither side needs direct access to the other’s infrastructure or credentials.
- Files are persisted and replicated by the brokers, can be read by multiple independent consumers, and can be re-read (replayed) as long as the retention period allows.
- Delivery is ordered per partition, and consumers track their own progress via offsets – no fragile “move the file after processing” logic.
- An existing Kafka installation with its security, monitoring and tooling can be reused instead of operating a dedicated (S)FTP server or shared filesystem.

Cons – arguments against using Apache Kafka or which should at least be considered:

- Kafka is optimized for small messages; the default maximum message size is around 1 MB, so larger limits must be configured explicitly.
- Very large files (hundreds of megabytes and beyond) strain broker memory, network and disk, and would have to be split into chunks and reassembled by the applications.
- There are no native file semantics – metadata such as the filename has to be carried in message headers, as we will do below.
- Messages are subject to the topic’s retention policy, so a file “disappears” from the topic once retention expires.
Putting it all together, there are many scenarios where the pros clearly outweigh the cons, e.g. distributing large numbers of smaller XML or JSON files, as is typical in EDI communications.
Getting Kafka up & running
First, we’ll install Apache Kafka using the binary download. After extracting the archive, I recommend creating a small bash script kafka-startup.sh in the directory where you extracted it. This script encapsulates the steps needed to start Kafka.
#!/bin/bash
kafka_base_dir=kafka_2.13-4.0.0

# Generate a random cluster ID for this KRaft instance
export KAFKA_CLUSTER_ID="$($kafka_base_dir/bin/kafka-storage.sh random-uuid)"

# Format the storage directories (standalone combined broker/controller mode)
./$kafka_base_dir/bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c $kafka_base_dir/config/server.properties

# Start the Kafka server
./$kafka_base_dir/bin/kafka-server-start.sh $kafka_base_dir/config/server.properties
In this example Apache Kafka 4.0.0 is used. If you are running an older version, the --standalone option in the kafka-storage.sh call can be omitted; a 3.x variant is sketched below.
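For comparison, here is a minimal sketch of the same startup steps for a Kafka 3.x release (version 3.7.0 assumed here), where the KRaft properties file ships under config/kraft/ and no --standalone flag is used:

#!/bin/bash
kafka_base_dir=kafka_2.13-3.7.0

# Kafka 3.x: KRaft properties live under config/kraft/, no --standalone flag
export KAFKA_CLUSTER_ID="$($kafka_base_dir/bin/kafka-storage.sh random-uuid)"
./$kafka_base_dir/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c $kafka_base_dir/config/kraft/server.properties
./$kafka_base_dir/bin/kafka-server-start.sh $kafka_base_dir/config/kraft/server.properties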
Executing the startup script, you should see output like the following, indicating that Kafka has started properly.
[2025-03-28 21:55:16,127] INFO [BrokerServer id=1] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2025-03-28 21:55:16,128] INFO Kafka version: 4.0.0 (org.apache.kafka.common.utils.AppInfoParser)
[2025-03-28 21:55:16,128] INFO Kafka commitId: 985bc99521dd22bb (org.apache.kafka.common.utils.AppInfoParser)
[2025-03-28 21:55:16,129] INFO Kafka startTimeMs: 1743195316127 (org.apache.kafka.common.utils.AppInfoParser)
[2025-03-28 21:55:16,130] INFO [KafkaRaftServer nodeId=1] Kafka Server started (kafka.server.KafkaRaftServer)
If you see an exception during startup like Invalid cluster.id in: /tmp/kraft-combined-logs/meta.properties, it is very likely that you started a Kafka instance earlier and an old storage configuration is still present. To solve this, simply execute rm -rf /tmp/kraft-combined-logs/ and run the startup script again.
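Before moving on, you can double-check from a second terminal that the broker is reachable by listing the existing topics – on a fresh instance this should simply return an empty list without errors:

$ ./kafka_2.13-4.0.0/bin/kafka-topics.sh --list --bootstrap-server localhost:9092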
Publishing files as messages
Creating a topic
First, let’s create a topic called filetransfer in our Apache Kafka instance and set its maximum message size to 10 MB. To do so, navigate to the bin folder of the extracted Kafka installation and use the provided command-line scripts.
$ ./kafka-topics.sh --create --topic filetransfer --bootstrap-server localhost:9092
Created topic filetransfer.
$ ./kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics \
--entity-name filetransfer --add-config max.message.bytes=10485880
Completed updating config for topic filetransfer.
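To verify that the override took effect, you can describe the topic configuration – the output should list the max.message.bytes setting from above:

$ ./kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics \
--entity-name filetransfer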
Implementing a file producer
To publish files as messages, we will implement a simple client application with Node.js using the KafkaJS package. The client takes a filename as an input argument and reads its entire content into a byte buffer. It is worth noting that a Kafka message payload is by definition a plain byte array, so any type of file – whether it’s JSON, XML, PDF, an image or anything else – can be published. The name of the file is set as a header value so that consumers have that information available.
import { Kafka, Partitioners } from 'kafkajs';
import fs from 'fs/promises';
import path from 'path';

const CLIENT_ID = 'kafkajs-file-producer';
const KAFKA_SERVER = 'localhost:9092';
const TOPIC = 'filetransfer';

const kafka = new Kafka({
  clientId: CLIENT_ID,
  brokers: [KAFKA_SERVER]
});

const producer = kafka.producer({ createPartitioner: Partitioners.DefaultPartitioner });

async function publish(fileName) {
  // Read the whole file into a byte buffer
  const fileBytes = await fs.readFile(fileName);

  // Publish the raw bytes; the original filename travels along in a header
  await producer.send({
    topic: TOPIC,
    messages: [
      {
        value: fileBytes,
        headers: {
          filename: path.basename(fileName)
        }
      }
    ]
  });

  console.log(`File ${fileName} sent. (${fileBytes.length} Bytes)`);
}

await producer.connect();
// The file to publish is expected as the first command-line argument
await publish(process.argv[2]);
await producer.disconnect();
The complete kafkajs-file-producer project is available on GitHub and ships with some example files. After cloning it and running npm install, you can produce a test message.
$ node app.js ./samples/example.jpg
File ./samples/example.jpg sent. (2690215 Bytes)
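Even without a consumer running, you can confirm that the message landed on the broker by checking the topic’s end offsets with the kafka-get-offsets.sh script from the bin folder (output format topic:partition:offset); after a single send it should report offset 1 for partition 0:

$ ./kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic filetransfer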
Awesome! Now that we are able to publish a file as a message, let’s move on to the consumer side.
Consuming files from messages
For the consumer side, we implement a simple client listening to the filetransfer topic. Every message received is written to a file in the output subfolder, using the filename provided in the message header. The content written to the file is exactly the byte array retrieved from the message payload, untouched.
import { Kafka } from 'kafkajs';
import fs from 'fs/promises';
import path from 'path';

const CLIENT_ID = 'kafkajs-file-consumer';
const KAFKA_SERVER = 'localhost:9092';
const CONSUMER_GROUP = 'kafkajs-group';
const TOPIC = 'filetransfer';

const kafka = new Kafka({
  clientId: CLIENT_ID,
  brokers: [KAFKA_SERVER]
});

const consumer = kafka.consumer({ groupId: CONSUMER_GROUP });

// Make sure the output folder exists before any file is written
await fs.mkdir('./output', { recursive: true });

await consumer.connect();
await consumer.subscribe({ topics: [TOPIC] });
await consumer.run({
  eachMessage: async ({ message }) => {
    // The payload is the raw file content; the header carries the filename
    const fileBytes = Buffer.from(message.value);
    const fileName = message.headers['filename'].toString();
    await fs.writeFile(path.join('./output', fileName), fileBytes);
    console.log(`File ${fileName} written. (${fileBytes.length} Bytes)`);
  }
});
The kafkajs-file-consumer project is also available on GitHub. After cloning it and running npm install, you can start the file consumer.
$ node app.js
{"level":"INFO","timestamp":"2025-04-16T20:24:47.647Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"kafkajs-group"}
{"level":"INFO","timestamp":"2025-04-16T20:25:07.791Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"kafkajs-group","memberId":"kafkajs-file-consumer-fe505a61-5d2a-47ca-bbbc-07ec4df8c3d0","leaderId":"kafkajs-file-consumer-fe505a61-5d2a-47ca-bbbc-07ec4df8c3d0","isLeader":true,"memberAssignment":{"filetransfer":[0]},"groupProtocol":"RoundRobinAssigner","duration":20140}
File example.jpg written. (2690215 Bytes)
Now check out the example.jpg file written to the output subfolder – it should be an exact copy of the original image.

That’s it 🙂