A quick guide on how to use Apache Kafka as a sophisticated platform for exchanging files between applications and systems. The KafkaJS library is used on the producer and consumer side to implement a minimal working example.
Even though Apache Kafka’s main purpose is not transferring files, there are quite a few scenarios where it is worth considering, since it offers real advantages over “traditional” file exchange mechanisms like shared filesystems, (S)FTP servers etc.
So first, let’s have a look at the pros & cons of using Apache Kafka as a file transfer solution.
Pros & Cons of using Kafka for transferring files
Pros – arguments for using Apache Kafka as a file transfer platform:

- Producers and consumers are fully decoupled – neither side needs direct access to the other’s infrastructure or credentials.
- Files are persisted and replicated by the brokers, can be read by multiple independent consumers, and can be re-read (replayed) as long as the retention period allows.
- Delivery is ordered per partition, and consumers track their own progress via offsets – no fragile “move the file after processing” logic.
- An existing Kafka installation with its security, monitoring and tooling can be reused instead of operating a dedicated (S)FTP server or shared filesystem.

Cons – arguments against using Apache Kafka or which should at least be considered:

- Kafka is optimized for small messages; the default maximum message size is around 1 MB, so larger limits must be configured explicitly.
- Very large files (hundreds of megabytes and beyond) strain broker memory, network and disk, and would have to be split into chunks and reassembled by the applications.
- There are no native file semantics – metadata such as the filename has to be carried in message headers, as we will do below.
- Messages are subject to the topic’s retention policy, so a file “disappears” from the topic once retention expires.
Putting it all together, there are many scenarios where the pros clearly outweigh the cons, e.g. distributing large numbers of smaller XML or JSON files, as is typical in EDI communications.
Getting Kafka up & running
First, we’ll install Apache Kafka using the binary download. After extracting the archive, I recommend creating a small bash script kafka-startup.sh in the directory where you extracted it. This script encapsulates the steps needed to start Kafka.
#!/bin/bash
kafka_base_dir=kafka_2.13-4.0.0

# Generate a random cluster ID for this KRaft instance
export KAFKA_CLUSTER_ID="$($kafka_base_dir/bin/kafka-storage.sh random-uuid)"

# Format the storage directories (standalone combined broker/controller mode)
./$kafka_base_dir/bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c $kafka_base_dir/config/server.properties

# Start the Kafka server
./$kafka_base_dir/bin/kafka-server-start.sh $kafka_base_dir/config/server.properties
In this example Apache Kafka 4.0.0 is used. If you are running an older version, the --standalone option in the kafka-storage.sh call can be omitted; a 3.x variant is sketched below.
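For comparison, here is a minimal sketch of the same startup steps for a Kafka 3.x release (version 3.7.0 assumed here), where the KRaft properties file ships under config/kraft/ and no --standalone flag is used:

#!/bin/bash
kafka_base_dir=kafka_2.13-3.7.0

# Kafka 3.x: KRaft properties live under config/kraft/, no --standalone flag
export KAFKA_CLUSTER_ID="$($kafka_base_dir/bin/kafka-storage.sh random-uuid)"
./$kafka_base_dir/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c $kafka_base_dir/config/kraft/server.properties
./$kafka_base_dir/bin/kafka-server-start.sh $kafka_base_dir/config/kraft/server.properties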
Executing the startup script, you should see output like the following, indicating that Kafka has started properly.
[2025-03-28 21:55:16,127] INFO [BrokerServer id=1] Transition from STARTING to STARTED (kafka.server.BrokerServer)
[2025-03-28 21:55:16,128] INFO Kafka version: 4.0.0 (org.apache.kafka.common.utils.AppInfoParser)
[2025-03-28 21:55:16,128] INFO Kafka commitId: 985bc99521dd22bb (org.apache.kafka.common.utils.AppInfoParser)
[2025-03-28 21:55:16,129] INFO Kafka startTimeMs: 1743195316127 (org.apache.kafka.common.utils.AppInfoParser)
[2025-03-28 21:55:16,130] INFO [KafkaRaftServer nodeId=1] Kafka Server started (kafka.server.KafkaRaftServer)
If you see an exception during startup like Invalid cluster.id in: /tmp/kraft-combined-logs/meta.properties, it is very likely that you started a Kafka instance earlier and an old storage configuration is still present. To solve this, simply execute rm -rf /tmp/kraft-combined-logs/ and run the startup script again.
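Before moving on, you can double-check from a second terminal that the broker is reachable by listing the existing topics – on a fresh instance this should simply return an empty list without errors:

$ ./kafka_2.13-4.0.0/bin/kafka-topics.sh --list --bootstrap-server localhost:9092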
Publishing files as messages
Creating a topic
First, let’s create a topic called filetransfer in our Apache Kafka instance and set its maximum message size to 10 MB. To do so, navigate to the bin folder of the extracted Kafka installation and use the provided command-line scripts.
$ ./kafka-topics.sh --create --topic filetransfer --bootstrap-server localhost:9092
Created topic filetransfer.
$ ./kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics \
--entity-name filetransfer --add-config max.message.bytes=10485880
Completed updating config for topic filetransfer.
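To verify that the override took effect, you can describe the topic configuration – the output should list the max.message.bytes setting from above:

$ ./kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics \
--entity-name filetransfer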
Implementing a file producer
To publish files as messages, we will implement a simple client application with Node.js using the KafkaJS package. The client takes a filename as an input argument and reads its entire content into a byte buffer. It is worth noting that a Kafka message payload is by definition a plain byte array, so any type of file – whether it’s JSON, XML, PDF, an image or anything else – can be published. The name of the file is set as a header value so that consumers have that information available.
import { Kafka, Partitioners } from 'kafkajs';
import fs from 'fs/promises';
import path from 'path';

const CLIENT_ID = 'kafkajs-file-producer';
const KAFKA_SERVER = 'localhost:9092';
const TOPIC = 'filetransfer';

const kafka = new Kafka({
  clientId: CLIENT_ID,
  brokers: [KAFKA_SERVER]
});

const producer = kafka.producer({ createPartitioner: Partitioners.DefaultPartitioner });

async function publish(fileName) {
  // Read the whole file into a byte buffer
  const fileBytes = await fs.readFile(fileName);

  // Publish the raw bytes; the original filename travels along in a header
  await producer.send({
    topic: TOPIC,
    messages: [
      {
        value: fileBytes,
        headers: {
          filename: path.basename(fileName)
        }
      }
    ]
  });

  console.log(`File ${fileName} sent. (${fileBytes.length} Bytes)`);
}

await producer.connect();
// The file to publish is expected as the first command-line argument
await publish(process.argv[2]);
await producer.disconnect();
The complete kafkajs-file-producer project is available on GitHub and ships with some example files. After cloning it and running npm install, you can produce a test message.
$ node app.js ./samples/example.jpg
File ./samples/example.jpg sent. (2690215 Bytes)
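Even without a consumer running, you can confirm that the message landed on the broker by checking the topic’s end offsets with the kafka-get-offsets.sh script from the bin folder (output format topic:partition:offset); after a single send it should report offset 1 for partition 0:

$ ./kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic filetransfer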
Awesome! Now that we are able to publish a file as a message, let’s move on to the consumer side.
Consuming files from messages
For the consumer side, we implement a simple client listening to the filetransfer topic. Every message received is written to a file in the output subfolder, using the filename provided in the message header. The content written to the file is exactly the byte array retrieved from the message payload, untouched.
import { Kafka } from 'kafkajs';
import fs from 'fs/promises';
import path from 'path';

const CLIENT_ID = 'kafkajs-file-consumer';
const KAFKA_SERVER = 'localhost:9092';
const CONSUMER_GROUP = 'kafkajs-group';
const TOPIC = 'filetransfer';

const kafka = new Kafka({
  clientId: CLIENT_ID,
  brokers: [KAFKA_SERVER]
});

const consumer = kafka.consumer({ groupId: CONSUMER_GROUP });

// Make sure the output folder exists before any file is written
await fs.mkdir('./output', { recursive: true });

await consumer.connect();
await consumer.subscribe({ topics: [TOPIC] });
await consumer.run({
  eachMessage: async ({ message }) => {
    // The payload is the raw file content; the header carries the filename
    const fileBytes = Buffer.from(message.value);
    const fileName = message.headers['filename'].toString();
    await fs.writeFile(path.join('./output', fileName), fileBytes);
    console.log(`File ${fileName} written. (${fileBytes.length} Bytes)`);
  }
});
The kafkajs-file-consumer project is also available on GitHub. After cloning it and running npm install, you can start the file consumer.
$ node app.js
{"level":"INFO","timestamp":"2025-04-16T20:24:47.647Z","logger":"kafkajs","message":"[Consumer] Starting","groupId":"kafkajs-group"}
{"level":"INFO","timestamp":"2025-04-16T20:25:07.791Z","logger":"kafkajs","message":"[ConsumerGroup] Consumer has joined the group","groupId":"kafkajs-group","memberId":"kafkajs-file-consumer-fe505a61-5d2a-47ca-bbbc-07ec4df8c3d0","leaderId":"kafkajs-file-consumer-fe505a61-5d2a-47ca-bbbc-07ec4df8c3d0","isLeader":true,"memberAssignment":{"filetransfer":[0]},"groupProtocol":"RoundRobinAssigner","duration":20140}
File example.jpg written. (2690215 Bytes)
Now check out the example.jpg file written to the output subfolder – it should be an exact copy of the original image.

That’s it 🙂