Last Updated: 3/19/2026

Benchmarks

Feldera is built for high-throughput incremental computation. This page presents benchmark results comparing Feldera to other streaming systems, then covers latency, memory use, scaling, and how to run the benchmarks yourself.

Nexmark Benchmarks

Nexmark is a standard benchmark suite for streaming systems that simulates an online auction platform. It includes 23 queries covering various streaming patterns; the table below reports nine of them.

Performance Comparison

Feldera significantly outperforms Apache Flink and Beam-based systems:

16-core Streaming Performance (events/second):

Query	Feldera	Flink	Flink on Beam	Dataflow on Beam
Q0	6.97M	2.64M	283K	698K
Q1	6.60M	2.60M	316K	1.02M
Q2	6.75M	3.12M	517K	1.82M
Q3	6.61M	2.04M	555K	794K
Q4	4.23M	501K	94K	63K
Q5	6.59M	662K	252K	115K
Q8	6.64M	2.10M	397K	419K
Q12	6.68M	2.00M	366K	176K
Q13	3.68M	1.57M	226K	1.00M

Key observations:

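Feldera is 2-10× faster than Flink on every query shown (2.2× on Q2, up to 10× on Q5)
Feldera is about 12-45× faster than Flink on Beam and about 4-67× faster than Dataflow on Beam
The advantage is largest on the more complex queries: on Q4 and Q5, Feldera is 8.4× and 10× faster than Flink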
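
The ratios behind these observations can be recomputed from the table. The sketch below hard-codes the table values (in events/second) and divides Feldera's throughput by each other system's. The names RESULTS, SYSTEMS, and speedups are illustrative and not part of any Feldera tooling.

```python
# Throughput in events/second, copied from the 16-core table above.
# Tuple order: Feldera, Flink, Flink on Beam, Dataflow on Beam.
RESULTS = {
    "Q0": (6.97e6, 2.64e6, 283e3, 698e3),
    "Q1": (6.60e6, 2.60e6, 316e3, 1.02e6),
    "Q2": (6.75e6, 3.12e6, 517e3, 1.82e6),
    "Q3": (6.61e6, 2.04e6, 555e3, 794e3),
    "Q4": (4.23e6, 501e3, 94e3, 63e3),
    "Q5": (6.59e6, 662e3, 252e3, 115e3),
    "Q8": (6.64e6, 2.10e6, 397e3, 419e3),
    "Q12": (6.68e6, 2.00e6, 366e3, 176e3),
    "Q13": (3.68e6, 1.57e6, 226e3, 1.00e6),
}
SYSTEMS = ("Flink", "Flink on Beam", "Dataflow on Beam")


def speedups(results):
    """Return {query: {system: feldera_throughput / system_throughput}}."""
    out = {}
    for query, (feldera, *others) in results.items():
        out[query] = {name: feldera / rate for name, rate in zip(SYSTEMS, others)}
    return out


for query, ratios in speedups(RESULTS).items():
    line = ", ".join(f"{name}: {ratio:.1f}x" for name, ratio in ratios.items())
    print(f"{query}: {line}")
```

Taking the minimum and maximum of each column gives the ranges quoted in the observations.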

Methodology

Benchmarks were run on a 16-core system with:

100 million events per run
Streaming mode (continuous processing)
Default configurations for all systems

Performance Characteristics

Throughput

Feldera achieves millions of events per second on a laptop:

Simple queries (filters, projections): 5-7M events/sec
Aggregations (GROUP BY, COUNT): 3-6M events/sec
Joins: 2-4M events/sec
Complex queries (nested joins, window functions): 1-3M events/sec

Latency

End-to-end latency (input to output):

Typical: Sub-millisecond for simple queries
Complex queries: Single-digit milliseconds
Large state: Scales with state size but remains low
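
To check latency on your own workload, one option is to time individual round trips and report percentiles rather than an average. This is a minimal sketch, not a Feldera API: round_trip is a hypothetical callable that sends one record and returns once the corresponding output has been observed. How you implement it depends on your connectors.

```python
import statistics
import time


def latency_percentiles(round_trip, samples=1_000):
    """Time `samples` calls to round_trip; return p50/p99/max in milliseconds.

    round_trip is a hypothetical callable taking the sample index.
    """
    if samples < 2:
        raise ValueError("need at least two samples to compute percentiles")
    latencies_ms = []
    for i in range(samples):
        start = time.perf_counter()
        round_trip(i)
        latencies_ms.append((time.perf_counter() - start) * 1_000)
    # n=100 yields 99 cut points; index 49 is the median, index 98 is p99.
    # The inclusive method interpolates within the observed range.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p99": cuts[98], "max": max(latencies_ms)}


# Stand-in round trip that just sleeps briefly; replace with a real one.
stats = latency_percentiles(lambda i: time.sleep(0.0005), samples=200)
print({name: f"{value:.3f} ms" for name, value in stats.items()})
```

Tail percentiles such as p99 usually say more about a streaming pipeline than the mean, because occasional slow batches are hidden by averaging.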

Memory Usage

Feldera efficiently manages memory:

Small datasets: Fits entirely in RAM
Large datasets: Automatically spills to disk
Compression: Reduces memory footprint by 2-5×

Scalability

Vertical Scaling

Performance scales with CPU cores:

Cores	Throughput	Speedup
1	1.2M/sec	1.0×
4	4.1M/sec	3.4×
8	6.8M/sec	5.7×
16	9.2M/sec	7.7×

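Scaling is close to linear up to 4 cores (about 85% parallel efficiency) and tapers off beyond that: about 71% efficiency at 8 cores and 48% at 16 cores.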
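
Speedup and parallel efficiency (speedup divided by core count) follow directly from the table. The sketch below recomputes both from the throughput column. SCALING and scaling_report are illustrative names.

```python
# (cores, throughput in events/second) from the scaling table above.
SCALING = [(1, 1.2e6), (4, 4.1e6), (8, 6.8e6), (16, 9.2e6)]


def scaling_report(measurements):
    """Return (cores, speedup, efficiency) rows relative to the first entry."""
    base_cores, base_rate = measurements[0]
    rows = []
    for cores, rate in measurements:
        speedup = rate / base_rate
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, efficiency in scaling_report(SCALING):
    print(f"{cores:>2} cores: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

An efficiency of 100% would mean perfectly linear scaling. Running the same calculation on your own measurements shows where adding cores stops paying off.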

Dataset Size

Feldera handles datasets larger than RAM:

In-memory: Best performance
Spilling to NVMe: 2-3× slower
Object storage: 5-10× slower

Performance degrades gracefully as dataset size increases.

Running Benchmarks

Nexmark Benchmarks

Run Nexmark benchmarks yourself:


cd benchmark
./run-nexmark.sh --runner=feldera --events=100M --cores=16

Compare with other systems:


# Flink
./run-nexmark.sh --runner=flink --events=100M --cores=16
 
# Beam with Flink
./run-nexmark.sh --runner=beam/flink --events=100M --cores=16

Custom Benchmarks

Benchmark your own queries:


import time
from feldera import Pipeline

NUM_EVENTS = 1_000_000

# Create and start pipeline
pipeline = Pipeline.create(...)
pipeline.start()

# Measure ingestion throughput as seen by this client.
# perf_counter() is monotonic and better suited to timing than time.time().
start = time.perf_counter()
for i in range(NUM_EVENTS):
    pipeline.input_json("events", {"id": i, "value": i * 2})
elapsed = time.perf_counter() - start

print(f"Throughput: {NUM_EVENTS / elapsed:.0f} events/sec")
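
The loop above sends one record per call, so the result mostly reflects per-call overhead on the client. A minimal sketch of a batched variant follows. send_batch is a hypothetical callable standing in for whatever ingestion call you use (for example, a wrapper around the input call above); batched and measure_throughput are illustrative helpers, not Feldera APIs.

```python
import time


def batched(records, batch_size):
    """Yield lists of at most batch_size records from any iterable."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def measure_throughput(send_batch, records, batch_size=10_000):
    """Send records in batches and return the observed records per second."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    sent = 0
    start = time.perf_counter()
    for batch in batched(records, batch_size):
        send_batch(batch)  # hypothetical ingestion call
        sent += len(batch)
    elapsed = time.perf_counter() - start
    return sent / elapsed if elapsed > 0 else float("inf")


# Stand-in sink that collects records in memory; replace with a real sender.
received = []
events = ({"id": i, "value": i * 2} for i in range(100_000))
rate = measure_throughput(received.extend, events, batch_size=1_000)
print(f"Throughput: {rate:.0f} events/sec over {len(received)} events")
```

Running the same measurement at several batch sizes shows how much of the cost is per call rather than per record.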

Optimization Tips

Maximize Throughput

Use more workers: Increase the workers setting in the runtime config
Enable storage: For datasets larger than RAM
Batch inputs: Use larger batch sizes
Tune connectors: Adjust connector-specific settings

Minimize Latency

Reduce batch size: Lower min_batch_size_records
Reduce buffering delay: Lower max_buffering_delay_usecs
Use fewer workers: Reduces coordination overhead
Optimize queries: Simplify complex queries
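
The throughput and latency tips pull the same settings in opposite directions. The sketch below writes the two extremes as plain dictionaries using the setting names mentioned above (workers, min_batch_size_records, max_buffering_delay_usecs). The values are illustrative assumptions, not recommendations, and how the dictionary is passed to a pipeline is not shown here.

```python
import json

# Illustrative values only; tune them against your own measurements.
THROUGHPUT_PROFILE = {
    "workers": 16,                         # more workers for parallelism
    "min_batch_size_records": 10_000,      # larger batches amortize per-batch cost
    "max_buffering_delay_usecs": 100_000,  # allow up to 100 ms to fill a batch
}

LATENCY_PROFILE = {
    "workers": 4,                          # fewer workers, less coordination
    "min_batch_size_records": 1,           # process records as soon as they arrive
    "max_buffering_delay_usecs": 0,        # do not wait to fill a batch
}


def diff_profiles(a, b):
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    return {key: (a[key], b[key]) for key in a if a[key] != b.get(key)}


print(json.dumps(diff_profiles(THROUGHPUT_PROFILE, LATENCY_PROFILE), indent=2))
```

Most workloads land somewhere between the two profiles, so change one setting at a time and re-measure.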

Reduce Memory Usage

Enable storage: Spill to disk
Enable compression: Reduce memory footprint
Filter early: Reduce data volume
Aggregate data: Reduce cardinality

What’s Next

Architecture: Understand how Feldera achieves high performance
Pipeline Configuration: Tune performance settings
Memory Management: Optimize memory usage