Apache Kafka is an open-source distributed event streaming platform, written in Java and Scala, for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. As I have explained, one downside of Kafka is that setting up large Kafka clusters can be tricky. Another downside is that Kafka runs on the Java virtual machine (JVM), which introduces latency spikes because of garbage collection pauses. Adding even more complexity, Kafka has until recently required Apache ZooKeeper for distributed coordination, and it requires a separate schema registry process.
Redpanda (previously called Vectorized) is a drop-in replacement for Kafka written primarily in C++. It uses the Seastar asynchronous framework, and the Raft consensus algorithm for its distributed log. Redpanda does not require ZooKeeper or the JVM, and its source is available on GitHub under the Business Source License (BSL). It’s not technically open source as defined by the Open Source Initiative, but that doesn’t matter to me because I have no plans to offer Redpanda as a service.
Redpanda vs. Kafka
As you might expect from the reimplementation in C++, Redpanda has significantly lower latency and higher performance than Kafka. It’s also much easier to install and tune.
Figure 1 shows latency charts for Redpanda and Kafka. The left-hand chart shows average latency versus time, and the right-hand chart shows latency versus percentile. Redpanda’s caption isn’t exactly false, but it does exaggerate. I’d rephrase it and say that Kafka’s average latency is 6 to 10 times higher than Redpanda’s, and that Kafka’s tail latency is up to 40 times higher than Redpanda’s.
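To make the difference between an average-latency ratio and a tail-latency ratio concrete, here is a minimal Python sketch. The two sample sets are invented for illustration and are not taken from Redpanda’s charts or from any benchmark; the names system_a, system_b, percentile, and summarize are my own, and the nearest-rank percentile method is an assumption about how such a chart might be computed.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean plus a few points from the latency-versus-percentile curve."""
    return {
        "mean": statistics.mean(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


# Hypothetical latencies in milliseconds, made up for this example only.
system_a = [2.0] * 98 + [5.0, 10.0]
system_b = [12.0] * 98 + [150.0, 400.0]

a = summarize(system_a)
b = summarize(system_b)

# Ratio of system_b to system_a at each summary point.
ratios = {key: b[key] / a[key] for key in a}

for key, value in ratios.items():
    print(f"{key}: system_b is {value:.1f}x system_a")
```

With these made-up numbers, the ratio at the median is much smaller than the ratio at the maximum, which is why a single "N times faster" figure depends heavily on which point of the distribution it describes.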