Last Updated: 3/19/2026
Benchmarks
Feldera delivers exceptional performance for incremental computation. This page presents benchmark results comparing Feldera to other streaming systems.
Nexmark Benchmarks
Nexmark is a standard benchmark suite for streaming systems, simulating an online auction platform. It includes 23 queries covering various streaming patterns.
Performance Comparison
Feldera significantly outperforms Apache Flink and Beam-based systems:
16-core Streaming Performance (events/second):
| Query | Feldera | Flink | Flink on Beam | Dataflow on Beam |
|---|---|---|---|---|
| Q0 | 6.97M | 2.64M | 283K | 698K |
| Q1 | 6.60M | 2.60M | 316K | 1.02M |
| Q2 | 6.75M | 3.12M | 517K | 1.82M |
| Q3 | 6.61M | 2.04M | 555K | 794K |
| Q4 | 4.23M | 501K | 94K | 63K |
| Q5 | 6.59M | 662K | 252K | 115K |
| Q8 | 6.64M | 2.10M | 397K | 419K |
| Q12 | 6.68M | 2.00M | 366K | 176K |
| Q13 | 3.68M | 1.57M | 226K | 1.00M |
Key observations:
- Feldera is 2-10× faster than Flink on every query shown
- Feldera is roughly 12-45× faster than Flink on Beam and 4-67× faster than Dataflow on Beam
- The advantage grows on the more complex queries: the gap over Flink widens to 8-10× on Q4 and Q5
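The speedup ratios can be recomputed directly from the table. The following is a minimal sketch that copies the published 16-core numbers into a dictionary; the helper name `speedup_range` is illustrative and not part of any Feldera tooling.

```python
# Throughput in events/second, copied from the 16-core table above.
SYSTEMS = ("Feldera", "Flink", "Flink on Beam", "Dataflow on Beam")
THROUGHPUT = {
    "Q0": (6.97e6, 2.64e6, 283e3, 698e3),
    "Q1": (6.60e6, 2.60e6, 316e3, 1.02e6),
    "Q2": (6.75e6, 3.12e6, 517e3, 1.82e6),
    "Q3": (6.61e6, 2.04e6, 555e3, 794e3),
    "Q4": (4.23e6, 501e3, 94e3, 63e3),
    "Q5": (6.59e6, 662e3, 252e3, 115e3),
    "Q8": (6.64e6, 2.10e6, 397e3, 419e3),
    "Q12": (6.68e6, 2.00e6, 366e3, 176e3),
    "Q13": (3.68e6, 1.57e6, 226e3, 1.00e6),
}


def speedup_range(system, baseline="Feldera"):
    """Return (min, max) of baseline/system throughput across all queries."""
    b = SYSTEMS.index(baseline)
    s = SYSTEMS.index(system)
    ratios = [row[b] / row[s] for row in THROUGHPUT.values()]
    return min(ratios), max(ratios)


for system in SYSTEMS[1:]:
    low, high = speedup_range(system)
    print(f"Feldera vs {system}: {low:.1f}x to {high:.1f}x")
```

Working from the raw rows rather than rounded summaries makes it easy to re-check the ranges if the table is updated.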
Methodology
Benchmarks were run on a 16-core system with:
- 100 million events per run
- Streaming mode (continuous processing)
- Default configurations for all systems
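To get a feel for what a 100-million-event run means in wall-clock time, the throughput figures can be converted into run durations. This sketch assumes the reported rates are sustained averages over the whole run, which the page does not state explicitly; `run_duration_seconds` is a hypothetical helper.

```python
EVENTS_PER_RUN = 100_000_000  # from the methodology above


def run_duration_seconds(events_per_second, total_events=EVENTS_PER_RUN):
    """Time needed to process total_events at a constant rate."""
    return total_events / events_per_second


# Rates taken from the 16-core table (events/second).
fastest = run_duration_seconds(6.97e6)  # Feldera, Q0
slowest = run_duration_seconds(63e3)    # Dataflow on Beam, Q4

print(f"Feldera Q0: {fastest:.1f} s")
print(f"Dataflow on Beam Q4: {slowest / 60:.1f} min")
```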
Performance Characteristics
Throughput
Feldera achieves millions of events per second on a laptop:
- Simple queries (filters, projections): 5-7M events/sec
- Aggregations (GROUP BY, COUNT): 3-6M events/sec
- Joins: 2-4M events/sec
- Complex queries (nested joins, window functions): 1-3M events/sec
Latency
End-to-end latency (input to output):
- Typical: Sub-millisecond for simple queries
- Complex queries: Single-digit milliseconds
- Large state: Scales with state size but remains low
Memory Usage
Feldera efficiently manages memory:
- Small datasets: Fits entirely in RAM
- Large datasets: Automatically spills to disk
- Compression: Reduces memory footprint by 2-5×
Scalability
Vertical Scaling
Performance scales with CPU cores:
| Cores | Throughput | Speedup |
|---|---|---|
| 1 | 1.2M/sec | 1.0× |
| 4 | 4.1M/sec | 3.4× |
| 8 | 6.8M/sec | 5.7× |
| 16 | 9.2M/sec | 7.7× |
Scaling is near-linear up to 4 cores (3.4×) and tapers beyond that: 8 cores reach 5.7× and 16 cores reach 7.7×.
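Speedup alone hides how much each added core contributes. A minimal sketch, using only the numbers in the table above, derives parallel efficiency (speedup divided by core count); `scaling_report` is an illustrative name.

```python
# Throughput in events/second by core count, copied from the table above.
SCALING = {1: 1.2e6, 4: 4.1e6, 8: 6.8e6, 16: 9.2e6}


def scaling_report(throughput_by_cores):
    """Return {cores: (speedup, efficiency)} relative to the smallest core count."""
    base_cores = min(throughput_by_cores)
    base_rate = throughput_by_cores[base_cores]
    report = {}
    for cores in sorted(throughput_by_cores):
        speedup = throughput_by_cores[cores] / base_rate
        efficiency = speedup / (cores / base_cores)
        report[cores] = (speedup, efficiency)
    return report


for cores, (speedup, efficiency) in scaling_report(SCALING).items():
    print(f"{cores:>2} cores: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

An efficiency of 100% would mean perfectly linear scaling, so the drop between rows shows where adding cores starts to pay off less.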
Dataset Size
Feldera handles datasets larger than RAM:
- In-memory: Best performance
- Spilling to NVMe: 2-3× slower
- Object storage: 5-10× slower
Performance degrades gracefully as dataset size increases.
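The slowdown factors above can be used for rough capacity planning. This sketch assumes the factors apply uniformly to throughput, which is a simplification the page does not confirm; the tier names and `estimated_range` helper are illustrative.

```python
# (best-case, worst-case) slowdown relative to in-memory, from the list above.
SLOWDOWN = {
    "in-memory": (1.0, 1.0),
    "nvme": (2.0, 3.0),
    "object-storage": (5.0, 10.0),
}


def estimated_range(in_memory_rate, tier):
    """Return (low, high) estimated events/second for a storage tier."""
    best, worst = SLOWDOWN[tier]
    return in_memory_rate / worst, in_memory_rate / best


in_memory_rate = 6e6  # hypothetical in-memory throughput, events/second
for tier in SLOWDOWN:
    low, high = estimated_range(in_memory_rate, tier)
    print(f"{tier}: {low / 1e6:.1f}M to {high / 1e6:.1f}M events/sec")
```

Treat the result as a starting estimate and confirm it with a measurement on your own data.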
Running Benchmarks
Nexmark Benchmarks
Run Nexmark benchmarks yourself:

    cd benchmark
    ./run-nexmark.sh --runner=feldera --events=100M --cores=16

Compare with other systems:

    # Flink
    ./run-nexmark.sh --runner=flink --events=100M --cores=16
    # Beam with Flink
    ./run-nexmark.sh --runner=beam/flink --events=100M --cores=16

Custom Benchmarks
Benchmark your own queries:

    import time

    from feldera import Pipeline

    NUM_EVENTS = 1_000_000

    # Create and start pipeline
    pipeline = Pipeline.create(...)
    pipeline.start()

    # Measure client-side ingestion throughput (one input_json call per event)
    start = time.perf_counter()
    for i in range(NUM_EVENTS):
        pipeline.input_json("events", {"id": i, "value": i * 2})
    elapsed = time.perf_counter() - start
    print(f"Throughput: {NUM_EVENTS / elapsed:.0f} events/sec")

Optimization Tips
Maximize Throughput
- Use more workers: Increase `workers` in runtime config
- Enable storage: For datasets larger than RAM
- Batch inputs: Use larger batch sizes
- Tune connectors: Adjust connector-specific settings
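The "batch inputs" tip can be sketched without tying it to a specific client API. The helper below groups events into fixed-size batches and times how fast they are handed to a `send_batch` callable. Both `chunked` and `measure_throughput` are hypothetical names, and the in-memory list stands in for whatever ingestion call your pipeline uses.

```python
import time
from itertools import islice


def chunked(iterable, batch_size):
    """Yield lists of up to batch_size items from iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def measure_throughput(events, send_batch, batch_size=10_000):
    """Send events in batches; return (events_sent, events_per_second)."""
    sent = 0
    start = time.perf_counter()
    for batch in chunked(events, batch_size):
        send_batch(batch)
        sent += len(batch)
    elapsed = time.perf_counter() - start
    rate = sent / elapsed if elapsed > 0 else float("inf")
    return sent, rate


# Stand-in sink: collects batches in memory instead of calling a pipeline.
received = []
events = ({"id": i, "value": i * 2} for i in range(100_000))
sent, rate = measure_throughput(events, received.append)
print(f"Sent {sent} events in {len(received)} batches ({rate:.0f} events/sec)")
```

Replacing `received.append` with a function that forwards each batch to your pipeline lets you compare batch sizes under the same timing code.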
Minimize Latency
- Reduce batch size: Lower `min_batch_size_records`
- Reduce buffering delay: Lower `max_buffering_delay_usecs`
- Use fewer workers: Reduces coordination overhead
- Optimize queries: Simplify complex queries
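The throughput and latency tips pull the same settings in opposite directions. The sketch below puts the two profiles side by side using the setting names from the lists above. All values are illustrative, and placing the three settings in one runtime config object is an assumption to check against the Pipeline Configuration page.

```python
import json

# Illustrative values only; tune them against your own workload.
THROUGHPUT_PROFILE = {
    "workers": 16,                         # more workers for parallelism
    "min_batch_size_records": 10_000,      # larger batches amortize overhead
    "max_buffering_delay_usecs": 100_000,  # allow time for batches to fill
}

LATENCY_PROFILE = {
    "workers": 4,                        # fewer workers, less coordination
    "min_batch_size_records": 1,         # process records as they arrive
    "max_buffering_delay_usecs": 1_000,  # keep buffering delay short
}


def apply_profile(base_config, profile):
    """Return a copy of base_config with the profile's settings applied."""
    return {**base_config, **profile}


base = {"workers": 8}  # hypothetical starting configuration
print(json.dumps(apply_profile(base, LATENCY_PROFILE), indent=2))
```

Keeping each profile as a named dictionary makes it straightforward to benchmark both against the same queries and compare the results.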
Reduce Memory Usage
- Enable storage: Spill to disk
- Enable compression: Reduce memory footprint
- Filter early: Reduce data volume
- Aggregate data: Reduce cardinality
What's Next
- Architecture: Understand how Feldera achieves high performance
- Pipeline Configuration: Tune performance settings
- Memory Management: Optimize memory usage