
Last Updated: 3/19/2026


Memory Management

Feldera handles datasets larger than available RAM by spilling state to disk. This page covers the settings that control when data is spilled, how it is cached and compressed, and how to diagnose memory problems in a pipeline.

Storage Configuration

Enable storage to work with datasets larger than RAM:

{
  "storage": {
    "backend": {"name": "default"},
    "min_storage_bytes": 10485760,
    "compression": "snappy",
    "cache_mib": 4096
  },
  "storage_config": {
    "path": "/data/pipeline-state",
    "cache": "page_cache"
  }
}

With storage enabled, Feldera automatically spills data to disk as memory pressure increases.

Storage Thresholds

Control when data is written to storage:

min_storage_bytes — Minimum bytes before writing a batch to storage (default: 10 MiB):

{ "storage": { "min_storage_bytes": 10485760 } }

min_step_storage_bytes — Minimum bytes for step batches (default: effectively disabled):

{ "storage": { "min_step_storage_bytes": 52428800 } }

Lower thresholds increase disk I/O but reduce memory usage. Higher thresholds reduce disk I/O but increase memory usage.
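The raw byte values above are easier to audit when they are derived from sizes in MiB. The following is a minimal sketch of that arithmetic; the helper names `mib` and `storage_thresholds` are hypothetical and not part of any Feldera API, and only the two configuration keys come from this page.

```python
import json
from typing import Optional

MIB = 1024 * 1024  # bytes in one mebibyte


def mib(n: int) -> int:
    """Convert a size in MiB to bytes."""
    return n * MIB


def storage_thresholds(min_storage_mib: int,
                       min_step_storage_mib: Optional[int] = None) -> dict:
    """Build the "storage" threshold fragment from sizes given in MiB.

    Hypothetical helper: it only assembles a dict with the keys named above.
    """
    storage = {"min_storage_bytes": mib(min_storage_mib)}
    if min_step_storage_mib is not None:
        storage["min_step_storage_bytes"] = mib(min_step_storage_mib)
    return {"storage": storage}


if __name__ == "__main__":
    # 10 MiB and 50 MiB reproduce the two byte values used in this section.
    print(json.dumps(storage_thresholds(10, 50), indent=2))
```

Passing 10 and 50 yields 10485760 and 52428800, the same values as the fragments above.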

Cache Configuration

Choose between page cache and Feldera’s internal cache:

Page cache (default, currently better performance):

{ "storage_config": { "cache": "page_cache" } }

Feldera cache (under development):

{ "storage_config": { "cache": "feldera_cache" }, "storage": { "cache_mib": 4096 } }

The cache_mib setting controls the maximum in-memory cache size. If unset, each thread gets 256 MiB.
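To get a feel for what the unset default implies, the per-thread figure can be multiplied by a thread count. This is a rough sketch only: the thread count passed in is an assumption, since this page does not say which threads Feldera counts, and `default_cache_total_mib` is a hypothetical helper.

```python
DEFAULT_CACHE_MIB_PER_THREAD = 256  # per-thread default stated above


def default_cache_total_mib(threads: int) -> int:
    """Upper bound on total cache, in MiB, when cache_mib is unset.

    `threads` is an assumed count supplied by the caller; it is not read
    from a running pipeline.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * DEFAULT_CACHE_MIB_PER_THREAD


if __name__ == "__main__":
    for threads in (4, 8, 16):
        total = default_cache_total_mib(threads)
        print(f"{threads} threads -> up to {total} MiB of cache")
```

Under this assumption, 16 threads at the default reach the same 4096 MiB used as the explicit cache_mib value in the example above.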

Compression

Enable compression to reduce storage space and I/O:

{ "storage": { "compression": "snappy" } }

Compression options:

  • default — Use Feldera’s default algorithm
  • none — No compression
  • snappy — Fast compression with moderate ratio

Compression trades CPU for reduced I/O and storage space.
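The trade-off can be illustrated outside Feldera. The sketch below uses Python's standard-library `zlib` as a stand-in, because Snappy is not in the standard library; it does not reproduce Feldera's storage format or Snappy's actual ratios, and `compress_report` is a hypothetical helper.

```python
import time
import zlib


def compress_report(payload: bytes, level: int = 1) -> dict:
    """Compress `payload` with zlib (a stand-in for Snappy) and report the cost.

    Returns the raw size, compressed size, size ratio, and time spent compressing.
    """
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: compression must be lossless.
    assert zlib.decompress(compressed) == payload
    return {
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(payload) / len(compressed),
        "cpu_seconds": elapsed,
    }


if __name__ == "__main__":
    # Repetitive, row-like data; real pipeline state may compress very differently.
    rows = b"".join(b"user_%05d,click,2026-03-19\n" % (i % 100) for i in range(50_000))
    print(compress_report(rows))
```

Fewer bytes reach the disk, at the price of the time reported in `cpu_seconds`.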

Memory Limits

Set memory limits in the resources configuration:

{ "resources": { "memory_mb_min": 2048, "memory_mb_max": 16384 } }

These limits are enforced only in Feldera Cloud. In self-hosted deployments they are informational only and are not enforced.
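Because the limits may not be enforced, a mistake such as a minimum above the maximum can go unnoticed. A small sketch of a pre-flight check follows; `resources_config` is a hypothetical helper, and rejecting an inverted range is this sketch's choice, not documented Feldera behavior.

```python
def resources_config(memory_mb_min: int, memory_mb_max: int) -> dict:
    """Build the "resources" fragment after basic sanity checks (hypothetical helper)."""
    if memory_mb_min <= 0 or memory_mb_max <= 0:
        raise ValueError("memory limits must be positive")
    if memory_mb_min > memory_mb_max:
        raise ValueError("memory_mb_min must not exceed memory_mb_max")
    return {
        "resources": {
            "memory_mb_min": memory_mb_min,
            "memory_mb_max": memory_mb_max,
        }
    }


if __name__ == "__main__":
    print(resources_config(2048, 16384))
```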

Worker Threads and Memory

Each worker thread increases memory consumption:

{ "workers": 8 }

More workers mean higher memory usage but better parallelism. The typical sweet spot is 4–16 workers.

Monitoring Memory Usage

Check memory usage via metrics:

stats = pipeline.stats() # Memory metrics available in global_metrics

Generate a support bundle with heap profile:

bundle = pipeline.support_bundle(heap_profile=True)
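Once the statistics are available as plain nested data, memory-related entries can be picked out generically. The sketch below assumes the stats have already been converted to dicts and lists; the sample payload and all of its field names are invented for illustration and do not describe the real output of `pipeline.stats()`.

```python
from typing import Any, Iterator, Tuple


def find_memory_metrics(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for leaf entries whose key mentions memory or RSS."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if isinstance(value, (dict, list)):
                yield from find_memory_metrics(value, child)
            elif any(tag in str(key).lower() for tag in ("memory", "rss")):
                yield child, value
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_memory_metrics(item, f"{path}[{index}]")


if __name__ == "__main__":
    # Hypothetical payload: the structure and field names are assumptions.
    sample = {
        "global_metrics": {"rss_bytes": 123, "total_processed_records": 10},
        "inputs": [{"metrics": {"buffered_memory_bytes": 5}}],
    }
    for name, value in find_memory_metrics(sample):
        print(name, value)
```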

Optimizing Memory Usage

1. Enable Storage

Always enable storage for large datasets:

{ "storage": { "backend": {"name": "default"} } }

2. Use Fast Storage

Use NVMe SSDs for the storage path to minimize spilling overhead.

3. Filter Early

Reduce data volume early in the pipeline:

CREATE VIEW filtered_events AS SELECT * FROM events WHERE event_time > NOW() - INTERVAL 7 DAY;

4. Aggregate Data

Reduce cardinality with aggregations:

CREATE MATERIALIZED VIEW hourly_stats AS
SELECT
    DATE_TRUNC('hour', event_time) AS hour,
    COUNT(*) AS count
FROM events
GROUP BY hour;
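The effect on cardinality can be seen with a small stand-alone illustration of hourly bucketing. This is plain Python, not Feldera code; `hourly_counts` is a hypothetical helper and the timestamps are made up.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def hourly_counts(event_times: Iterable[datetime]) -> Counter:
    """Count events per hour by truncating each timestamp to the start of its hour."""
    return Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in event_times
    )


if __name__ == "__main__":
    events = [
        datetime(2026, 3, 19, 10, 5),
        datetime(2026, 3, 19, 10, 47),
        datetime(2026, 3, 19, 11, 2),
    ]
    # Three input rows collapse into two hourly buckets.
    for hour, count in sorted(hourly_counts(events).items()):
        print(hour.isoformat(), count)
```

However many events arrive within an hour, the aggregate keeps one row for that hour, which is why the aggregated state stays small.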

5. Limit Materialized Views

Only materialize views that need to be queried or output:

-- Don't materialize intermediate views
CREATE VIEW intermediate AS ...;

-- Only materialize final results
CREATE MATERIALIZED VIEW final_results AS
SELECT * FROM intermediate;

6. Adjust Worker Count

Reduce workers if memory is constrained:

{ "workers": 4 }

7. Tune Storage Thresholds

Lower thresholds to spill more aggressively:

{ "storage": { "min_storage_bytes": 5242880 } }

Storage Backends

Local File System

Default backend using local disk:

{ "storage": { "backend": {"name": "default"} } }

Object Storage

Use S3, GCS, or Azure Blob Storage:

{
  "storage": {
    "backend": {
      "name": "object",
      "config": {
        "url": "s3://my-bucket/pipeline-state",
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "..."
      }
    }
  }
}

Object storage is slower than local disk but provides durability and scalability.
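Credentials are best kept out of files that get committed. One possible approach, sketched below, builds the backend fragment from environment variables; `object_storage_backend` is a hypothetical helper, and the only configuration keys it uses are the ones shown above.

```python
import os
from typing import Mapping


def object_storage_backend(url: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build the object-storage backend fragment, reading credentials from `env`.

    Hypothetical helper: it only assembles a dict and contacts no service.
    """
    config = {"url": url}
    for key in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        value = env.get(key)
        if not value:
            raise KeyError(f"missing environment variable {key}")
        config[key] = value
    return {"storage": {"backend": {"name": "object", "config": config}}}


if __name__ == "__main__":
    # Placeholder values for demonstration only.
    demo_env = {"AWS_ACCESS_KEY_ID": "example-id", "AWS_SECRET_ACCESS_KEY": "example-secret"}
    print(object_storage_backend("s3://my-bucket/pipeline-state", demo_env))
```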

Troubleshooting

Out of Memory Errors

If the pipeline runs out of memory:

  1. Enable storage if not already enabled
  2. Reduce min_storage_bytes to spill more aggressively
  3. Reduce the number of workers
  4. Increase available RAM
  5. Filter data earlier in the pipeline
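The configuration-level steps above (1 to 3) can be sketched as a single function that returns an adjusted copy of a configuration. The halving policy and the name `relieve_memory_pressure` are assumptions made for illustration; appropriate values depend on the workload.

```python
import copy

DEFAULT_MIN_STORAGE_BYTES = 10 * 1024 * 1024  # 10 MiB default stated on this page


def relieve_memory_pressure(config: dict, min_workers: int = 1) -> dict:
    """Return a copy of `config` with steps 1 to 3 applied (hypothetical helper)."""
    tuned = copy.deepcopy(config)

    # Step 1: enable storage if no backend is configured yet.
    storage = tuned.setdefault("storage", {})
    storage.setdefault("backend", {"name": "default"})

    # Step 2: halve the spill threshold so batches reach disk sooner.
    current = storage.get("min_storage_bytes", DEFAULT_MIN_STORAGE_BYTES)
    storage["min_storage_bytes"] = max(current // 2, 1)

    # Step 3: halve the worker count, but never go below `min_workers`.
    if "workers" in tuned:
        tuned["workers"] = max(tuned["workers"] // 2, min_workers)

    return tuned


if __name__ == "__main__":
    print(relieve_memory_pressure({"workers": 8}))
```

The input is left untouched, so the original and tuned configurations can be compared before anything is deployed.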

Slow Performance

If spilling to disk causes slow performance:

  1. Use faster storage (NVMe SSDs)
  2. Increase min_storage_bytes to reduce spilling
  3. Enable compression to reduce I/O
  4. Increase cache size with cache_mib
  5. Add more RAM to reduce spilling
