Last Updated: 3/19/2026
Memory Management
Feldera is designed to handle datasets larger than available RAM by efficiently spilling to disk. Understanding memory management is crucial for optimizing pipeline performance and resource usage.
Storage Configuration
Enable storage to work with datasets larger than RAM:
{
"storage": {
"backend": {"name": "default"},
"min_storage_bytes": 10485760,
"compression": "snappy",
"cache_mib": 4096
},
"storage_config": {
"path": "/data/pipeline-state",
"cache": "page_cache"
}
}
When storage is enabled, Feldera automatically spills data to disk when memory pressure increases.
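If you generate pipeline configuration programmatically, you can build the same settings as a Python dictionary and serialize them to JSON. This is a minimal sketch. The `mib` helper and the `build_storage_config` function are hypothetical names introduced here. How you submit the resulting JSON to Feldera (REST API, Python SDK, or otherwise) is not shown.

```python
import json


def mib(n: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return n * 1024 * 1024


def build_storage_config(path: str, cache_mib: int = 4096) -> dict:
    """Hypothetical helper that mirrors the JSON example above."""
    return {
        "storage": {
            "backend": {"name": "default"},
            "min_storage_bytes": mib(10),  # 10485760 bytes = 10 MiB
            "compression": "snappy",
            "cache_mib": cache_mib,
        },
        "storage_config": {
            "path": path,
            "cache": "page_cache",
        },
    }


config = build_storage_config("/data/pipeline-state")
print(json.dumps(config, indent=2))
```

Writing thresholds as `mib(10)` rather than `10485760` makes the intended size obvious when reviewing a config.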
Storage Thresholds
Control when data is written to storage:
min_storage_bytes — Minimum bytes before writing a batch to storage (default: 10 MiB):
{
"storage": {
"min_storage_bytes": 10485760
}
}
min_step_storage_bytes — Minimum bytes for step batches (default: effectively disabled):
{
"storage": {
"min_step_storage_bytes": 52428800
}
}
Lower thresholds increase disk I/O but reduce memory usage. Higher thresholds reduce disk I/O but increase memory usage.
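The sketch below illustrates this trade-off with a deliberately simplified model. In the model, a batch is written to storage only if its size reaches the threshold; otherwise it stays in memory. This model is an assumption made for illustration, not Feldera's actual spilling logic. The function names and batch sizes are hypothetical.

```python
from typing import Iterable, Optional

MiB = 1024 * 1024


def spills_to_storage(batch_bytes: int, min_storage_bytes: Optional[int]) -> bool:
    """Simplified model: a batch goes to storage once it reaches the threshold.

    A threshold of None models a disabled setting (the batch never spills).
    """
    if min_storage_bytes is None:
        return False
    return batch_bytes >= min_storage_bytes


def bytes_kept_in_memory(batches: Iterable[int], min_storage_bytes: Optional[int]) -> int:
    """Total size of the batches that the model keeps in RAM."""
    return sum(b for b in batches if not spills_to_storage(b, min_storage_bytes))


# Hypothetical batch sizes, for illustration only.
batches = [1 * MiB, 4 * MiB, 8 * MiB, 12 * MiB, 40 * MiB]

for threshold in (5 * MiB, 10 * MiB, 50 * MiB):
    in_memory = bytes_kept_in_memory(batches, threshold)
    print(f"threshold={threshold // MiB} MiB -> {in_memory // MiB} MiB kept in memory")
```

In this model, lowering the threshold from 10 MiB to 5 MiB moves the 8 MiB batch to storage. RAM use drops, and one more batch has to be written to disk.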
Cache Configuration
Choose between page cache and Feldera’s internal cache:
Page cache (the default, and currently the better-performing option):
{
"storage_config": {
"cache": "page_cache"
}
}
Feldera cache (under development):
{
"storage_config": {
"cache": "feldera_cache"
},
"storage": {
"cache_mib": 4096
}
}
The cache_mib setting controls the maximum in-memory cache size. If it is unset, each thread gets 256 MiB.
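The sketch below gives a rough estimate of the cache's memory footprint from the rule above. The source says each thread gets 256 MiB when cache_mib is unset, but it does not say how many threads a pipeline runs. The `threads` argument is therefore an assumption you supply, for example your worker count. The function name is hypothetical.

```python
from typing import Optional

DEFAULT_CACHE_MIB_PER_THREAD = 256  # per-thread cache size when cache_mib is unset


def estimated_cache_mib(threads: int, cache_mib: Optional[int] = None) -> int:
    """Estimate the maximum in-memory cache size in MiB.

    If cache_mib is set, it is the cap. Otherwise assume each of `threads`
    threads gets the 256 MiB default.
    """
    if cache_mib is not None:
        return cache_mib
    return DEFAULT_CACHE_MIB_PER_THREAD * threads


print(estimated_cache_mib(threads=8))                  # cache_mib unset
print(estimated_cache_mib(threads=8, cache_mib=4096))  # explicit cap
```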
Compression
Enable compression to reduce storage space and I/O:
{
"storage": {
"compression": "snappy"
}
}
Compression options:
- default: Use Feldera’s default algorithm
- none: No compression
- snappy: Fast compression with a moderate ratio
Compression trades CPU for reduced I/O and storage space.
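You can observe the CPU-for-I/O trade-off with any compressor. Snappy is not in the Python standard library, so this sketch uses zlib as a stand-in. Ratios and timings will differ from Feldera's snappy setting and from real pipeline data. The sample rows are synthetic.

```python
import time
import zlib


def measure(data: bytes, level: int) -> tuple[int, float]:
    """Compress `data` at the given zlib level; return (compressed size, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: compression must be lossless.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Synthetic, highly repetitive rows; real data compresses less predictably.
rows = b"".join(f"{i % 100},click,2026-03-19\n".encode() for i in range(50_000))

for level in (0, 1, 9):  # 0 = store only, 1 = fastest, 9 = smallest output
    size, secs = measure(rows, level)
    print(f"level={level}: {len(rows)} -> {size} bytes in {secs * 1000:.1f} ms")
```

Level 0 does no compression work, so its output is slightly larger than the input. Higher levels spend more CPU time to shrink the bytes that must be written and read.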
Memory Limits
Set memory limits in the resources configuration:
{
"resources": {
"memory_mb_min": 2048,
"memory_mb_max": 16384
}
}
These limits are enforced only in Feldera Cloud. In self-hosted deployments, they serve as documentation.
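Because these limits are not enforced outside Feldera Cloud, you may want to compare resident memory against memory_mb_max yourself. This sketch works on the parsed JSON of the pipeline statistics. It makes two assumptions: resident memory is reported as `global_metrics.rss_bytes` (verify this against your Feldera version), and memory_mb_* values are mebibytes. The function name and the sample payload are hypothetical.

```python
from typing import Any, Mapping

BYTES_PER_MB = 1024 * 1024  # assumption: memory_mb_* values are mebibytes


def memory_headroom_mb(stats: Mapping[str, Any], memory_mb_max: int) -> float:
    """Return how many MB remain below memory_mb_max (negative if over).

    Assumes resident memory is reported under stats["global_metrics"]["rss_bytes"].
    """
    rss_bytes = stats["global_metrics"]["rss_bytes"]
    return memory_mb_max - rss_bytes / BYTES_PER_MB


# Hypothetical payload, trimmed to the one field this sketch reads.
sample_stats = {"global_metrics": {"rss_bytes": 12 * 1024 * 1024 * 1024}}

headroom = memory_headroom_mb(sample_stats, memory_mb_max=16384)
if headroom < 0:
    print(f"over memory_mb_max by {-headroom:.0f} MB")
else:
    print(f"{headroom:.0f} MB below memory_mb_max")
```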
Worker Threads and Memory
Each worker thread increases memory consumption:
{
"workers": 8
}
More workers mean higher memory usage but better parallelism. The typical sweet spot is 4–16 workers.
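One way to pick a worker count under a memory budget is to divide available RAM by an estimated per-worker footprint and cap the result at the top of the typical range. You would measure the per-worker footprint for your own pipeline. The function and the numbers below are hypothetical.

```python
def suggest_workers(available_mib: int, per_worker_mib: int, max_workers: int = 16) -> int:
    """Largest worker count whose estimated footprint fits in available_mib.

    Capped at max_workers (16, the top of the typical range) and never below 1.
    """
    if per_worker_mib <= 0:
        raise ValueError("per_worker_mib must be positive")
    return max(1, min(max_workers, available_mib // per_worker_mib))


print(suggest_workers(available_mib=16384, per_worker_mib=2048))  # fits 8 workers
print(suggest_workers(available_mib=65536, per_worker_mib=2048))  # capped at 16
print(suggest_workers(available_mib=2048, per_worker_mib=4096))   # floor of 1
```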
Monitoring Memory Usage
Check memory usage via metrics:
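from feldera import FelderaClient, Pipeline
client = FelderaClient("http://localhost:8080")  # adjust to your Feldera instance URL
pipeline = Pipeline.get("my-pipeline", client)  # hypothetical pipeline name
stats = pipeline.stats()
# Memory metrics are available in stats.global_metrics
Generate a support bundle with a heap profile:
bundle = pipeline.support_bundle(heap_profile=True)
Optimizing Memory Usage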
1. Enable Storage
Always enable storage for large datasets:
{
"storage": {
"backend": {"name": "default"}
}
}
2. Use Fast Storage
Use NVMe SSDs for the storage path to minimize spilling overhead.
3. Filter Early
Reduce data volume early in the pipeline:
CREATE VIEW filtered_events AS
SELECT * FROM events
WHERE event_time > NOW() - INTERVAL 7 DAY;
4. Aggregate Data
Reduce cardinality with aggregations:
CREATE MATERIALIZED VIEW hourly_stats AS
SELECT
DATE_TRUNC('hour', event_time) AS event_hour,
COUNT(*) AS event_count
FROM events
GROUP BY DATE_TRUNC('hour', event_time);
5. Limit Materialized Views
Only materialize views that you need to query:
-- Don't materialize intermediate views
CREATE VIEW intermediate AS ...;
-- Only materialize final results
CREATE MATERIALIZED VIEW final_results AS
SELECT * FROM intermediate;
6. Adjust Worker Count
Reduce workers if memory is constrained:
{
"workers": 4
}
7. Tune Storage Thresholds
Lower thresholds to spill more aggressively:
{
"storage": {
"min_storage_bytes": 5242880
}
}
Storage Backends
Local File System
Default backend using local disk:
{
"storage": {
"backend": {"name": "default"}
}
}
Object Storage
Use S3, GCS, or Azure Blob Storage:
{
"storage": {
"backend": {
"name": "object",
"config": {
"url": "s3://my-bucket/pipeline-state",
"AWS_ACCESS_KEY_ID": "...",
"AWS_SECRET_ACCESS_KEY": "..."
}
}
}
}
Object storage is slower than local disk but provides durability and scalability.
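Rather than hard-coding credentials in the configuration, you can read them from the environment when you generate it. This is a minimal sketch. The helper name is hypothetical, and it reproduces only the keys shown in the example above.

```python
import os


def object_storage_backend(url: str) -> dict:
    """Build the object-storage backend block, reading AWS credentials from the environment.

    Raises KeyError if either credential variable is missing.
    """
    return {
        "storage": {
            "backend": {
                "name": "object",
                "config": {
                    "url": url,
                    "AWS_ACCESS_KEY_ID": os.environ["AWS_ACCESS_KEY_ID"],
                    "AWS_SECRET_ACCESS_KEY": os.environ["AWS_SECRET_ACCESS_KEY"],
                },
            }
        }
    }
```

For example, call `object_storage_backend("s3://my-bucket/pipeline-state")` in a process where both variables are exported. Keeping secrets out of version-controlled config files is the main benefit.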
Troubleshooting
Out of Memory Errors
If the pipeline runs out of memory:
- Enable storage if not already enabled
- Reduce min_storage_bytes to spill more aggressively
- Reduce the number of workers
- Increase available RAM
- Filter data earlier in the pipeline
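You can automate the first three checks in this list against a pipeline's runtime configuration. This is a hypothetical helper, not part of any Feldera tool, and it inspects only keys used elsewhere on this page. It makes three assumptions:
- a missing `storage` key means storage is disabled;
- a threshold at or above the 10 MiB default is a candidate for lowering;
- reducing workers is suggested only above 4, the low end of the typical range.

```python
from typing import Any, Mapping

DEFAULT_MIN_STORAGE_BYTES = 10 * 1024 * 1024  # documented default: 10 MiB
WORKER_REDUCTION_FLOOR = 4  # assumption: don't suggest going below the low end of 4-16


def oom_suggestions(config: Mapping[str, Any]) -> list[str]:
    """Return out-of-memory remedies that apply to the given runtime config."""
    suggestions: list[str] = []
    storage = config.get("storage")
    if storage is None:
        suggestions.append("enable storage")
    elif storage.get("min_storage_bytes", DEFAULT_MIN_STORAGE_BYTES) >= DEFAULT_MIN_STORAGE_BYTES:
        suggestions.append("reduce min_storage_bytes to spill more aggressively")
    if config.get("workers", 0) > WORKER_REDUCTION_FLOOR:
        suggestions.append("reduce the number of workers")
    return suggestions


print(oom_suggestions({"workers": 8}))
```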
Slow Performance
If spilling to disk causes slow performance:
- Use faster storage (NVMe SSDs)
- Increase min_storage_bytes to reduce spilling
- Enable compression to reduce I/O
- Increase the cache size with cache_mib
- Add more RAM to reduce spilling
What’s Next
- Pipeline Configuration: Learn about all configuration options for Feldera pipelines.
- Metrics Monitoring: Monitor memory usage and other pipeline metrics.
- Fault Tolerance: Understand how checkpoint storage interacts with memory management.