Last Updated: 3/19/2026

Memory Management

Feldera is designed to handle datasets larger than available RAM by efficiently spilling to disk. Understanding memory management is crucial for optimizing pipeline performance and resource usage.

Storage Configuration

Enable storage to work with datasets larger than RAM:


{
  "storage": {
    "backend": {"name": "default"},
    "min_storage_bytes": 10485760,
    "compression": "snappy",
    "cache_mib": 4096
  },
  "storage_config": {
    "path": "/data/pipeline-state",
    "cache": "page_cache"
  }
}

With storage enabled, Feldera spills data to disk automatically instead of holding all of it in memory. The thresholds in the next section control when a batch is written out.

Storage Thresholds

Control when data is written to storage:

min_storage_bytes — Minimum bytes before writing a batch to storage (default: 10 MiB):


{
  "storage": {
    "min_storage_bytes": 10485760
  }
}

min_step_storage_bytes — Minimum bytes for step batches (default: effectively disabled):


{
  "storage": {
    "min_step_storage_bytes": 52428800
  }
}

Lower thresholds increase disk I/O but reduce memory usage. Higher thresholds reduce disk I/O but increase memory usage.
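The thresholds above are raw byte counts, which are easy to mistype. The following is a minimal Python sketch for deriving them from MiB values. The helper names `mib_to_bytes` and `storage_thresholds` are illustrative only and are not part of any Feldera API.

```python
import json
from typing import Optional

MIB = 1024 * 1024  # bytes in one mebibyte


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to the byte count used in the config."""
    if mib < 0:
        raise ValueError("size must be non-negative")
    return mib * MIB


def storage_thresholds(min_storage_mib: int,
                       min_step_storage_mib: Optional[int] = None) -> dict:
    """Build the "storage" fragment shown above (hypothetical helper)."""
    storage = {"min_storage_bytes": mib_to_bytes(min_storage_mib)}
    # Only emit the step threshold when the caller asks for one, so the
    # "effectively disabled" default stays in force otherwise.
    if min_step_storage_mib is not None:
        storage["min_step_storage_bytes"] = mib_to_bytes(min_step_storage_mib)
    return {"storage": storage}


if __name__ == "__main__":
    print(json.dumps(storage_thresholds(10, 50), indent=2))
```

Calling `storage_thresholds(10, 50)` reproduces the two values used in the examples above (10485760 and 52428800).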

Cache Configuration

Choose between the operating system’s page cache and Feldera’s internal cache:

Page cache (default; currently performs better):


{
  "storage_config": {
    "cache": "page_cache"
  }
}

Feldera cache (under development):


{
  "storage_config": {
    "cache": "feldera_cache"
  },
  "storage": {
    "cache_mib": 4096
  }
}

The cache_mib setting controls the maximum in-memory cache size. If unset, each thread gets 256 MiB.
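To see what the unset default implies for memory, the sketch below multiplies the 256 MiB per-thread default by a thread count. How many threads a given pipeline runs is not stated here, so `threads` is an input you supply, and `default_cache_total_mib` is a hypothetical helper name.

```python
DEFAULT_CACHE_PER_THREAD_MIB = 256  # per-thread default stated above


def default_cache_total_mib(threads: int) -> int:
    """Upper bound on total cache size, in MiB, when cache_mib is unset."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * DEFAULT_CACHE_PER_THREAD_MIB


if __name__ == "__main__":
    for threads in (4, 8, 16):
        print(threads, "threads ->", default_cache_total_mib(threads), "MiB")
```

With 16 threads the default caches can grow to 4096 MiB in total, which is worth comparing against the memory you actually have before leaving cache_mib unset.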

Compression

Enable compression to reduce storage space and I/O:


{
  "storage": {
    "compression": "snappy"
  }
}

Compression options:

default — Use Feldera’s default algorithm
none — No compression
snappy — Fast compression with moderate ratio

Compression trades CPU for reduced I/O and storage space.
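The trade-off can be illustrated with a small, self-contained measurement. This sketch uses `zlib` from the Python standard library as a stand-in, because snappy is not in the standard library. The numbers it prints say nothing about Feldera's own compression and only show the general shape of the CPU-versus-size trade. `compression_tradeoff` is a hypothetical helper name.

```python
import time
import zlib


def compression_tradeoff(payload: bytes, level: int = 1) -> dict:
    """Compress payload once and report the size ratio and CPU time spent."""
    if not payload:
        raise ValueError("payload must not be empty")
    start = time.process_time()
    compressed = zlib.compress(payload, level)
    cpu_seconds = time.process_time() - start
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),  # smaller means better
        "cpu_seconds": cpu_seconds,
    }


if __name__ == "__main__":
    # Repetitive, row-like data compresses well; random data would not.
    sample = b"user_id=42,event=click,region=eu-west\n" * 50_000
    print(compression_tradeoff(sample))
```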

Memory Limits

Set memory limits in the resources configuration:


{
  "resources": {
    "memory_mb_min": 2048,
    "memory_mb_max": 16384
  }
}

These limits are enforced only in Feldera Cloud. In self-hosted deployments, they serve as documentation.
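Because nothing enforces these values in a self-hosted deployment, a client-side sanity check can catch swapped or malformed limits before the configuration is submitted. The following is a minimal sketch; `validate_memory_limits` is a hypothetical helper, not a Feldera function.

```python
def validate_memory_limits(resources: dict) -> None:
    """Raise ValueError if the memory limits are malformed or inverted."""
    low = resources.get("memory_mb_min")
    high = resources.get("memory_mb_max")
    for name, value in (("memory_mb_min", low), ("memory_mb_max", high)):
        # Either limit may be absent; when present it must be a positive int.
        if value is not None and (not isinstance(value, int) or value <= 0):
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    if low is not None and high is not None and low > high:
        raise ValueError(f"memory_mb_min ({low}) exceeds memory_mb_max ({high})")


if __name__ == "__main__":
    validate_memory_limits({"memory_mb_min": 2048, "memory_mb_max": 16384})
    print("limits look consistent")
```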

Worker Threads and Memory

Each worker thread increases memory consumption:


{
  "workers": 8
}

Adding workers improves parallelism at the cost of higher memory usage. A typical sweet spot is 4 to 16 workers.

Monitoring Memory Usage

Check memory usage via metrics:


stats = pipeline.stats()
# Memory metrics available in global_metrics

Generate a support bundle that includes a heap profile:


bundle = pipeline.support_bundle(heap_profile=True)

Optimizing Memory Usage

1. Enable Storage

Always enable storage for large datasets:


{
  "storage": {
    "backend": {"name": "default"}
  }
}

2. Use Fast Storage

Use NVMe SSDs for the storage path to minimize spilling overhead.

3. Filter Early

Reduce data volume early in the pipeline:


CREATE VIEW filtered_events AS
SELECT * FROM events
WHERE event_time > NOW() - INTERVAL 7 DAY;

4. Aggregate Data

Reduce cardinality with aggregations:


CREATE MATERIALIZED VIEW hourly_stats AS
SELECT
    TIMESTAMP_TRUNC(event_time, HOUR) AS event_hour,
    COUNT(*) AS event_count
FROM events
GROUP BY TIMESTAMP_TRUNC(event_time, HOUR);

5. Limit Materialized Views

Only materialize views that need to be queried or output:


-- Don't materialize intermediate views
CREATE VIEW intermediate AS ...;
 
-- Only materialize final results
CREATE MATERIALIZED VIEW final_results AS
SELECT * FROM intermediate;

6. Adjust Worker Count

Reduce workers if memory is constrained:


{
  "workers": 4
}

7. Tune Storage Thresholds

Lower thresholds to spill more aggressively:


{
  "storage": {
    "min_storage_bytes": 5242880
  }
}

Storage Backends

Local File System

Default backend using local disk:


{
  "storage": {
    "backend": {"name": "default"}
  }
}

Object Storage

Use S3, GCS, or Azure Blob Storage:


{
  "storage": {
    "backend": {
      "name": "object",
      "config": {
        "url": "s3://my-bucket/pipeline-state",
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "..."
      }
    }
  }
}

Object storage is slower than local disk but provides durability and scalability.
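Writing credentials directly into a pipeline configuration, as in the object storage example above, makes them easy to leak through version control or logs. One option is to assemble the fragment at deploy time from environment variables. The sketch below assumes the same three config keys shown above; `object_backend_from_env` is a hypothetical helper name.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def object_backend_from_env(url: str, env: Mapping[str, str] = os.environ) -> dict:
    """Build the object-storage "storage" fragment from environment variables."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    config = {"url": url}
    # Copy each credential under the same key name the config example uses.
    for key in REQUIRED_KEYS:
        config[key] = env[key]
    return {"storage": {"backend": {"name": "object", "config": config}}}
```

Passing a plain dict as `env` makes the helper easy to test without touching the real environment.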

Troubleshooting

Out of Memory Errors

If the pipeline runs out of memory:

Enable storage if not already enabled
Reduce min_storage_bytes to spill more aggressively
Reduce the number of workers
Increase available RAM
Filter data earlier in the pipeline
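The configuration-level items in this checklist can be applied mechanically. The sketch below enables storage if it is missing, halves min_storage_bytes (starting from the 10 MiB default mentioned earlier when the key is unset), and halves the worker count. Halving is an arbitrary step size chosen for illustration, and `tighten_for_memory` is a hypothetical helper, not a Feldera API.

```python
import copy

DEFAULT_MIN_STORAGE_BYTES = 10 * 1024 * 1024  # 10 MiB default stated earlier


def tighten_for_memory(config: dict, min_workers: int = 1) -> dict:
    """Return a copy of config adjusted to use less memory."""
    tuned = copy.deepcopy(config)  # leave the caller's config untouched

    # 1. Enable storage if it is absent or explicitly null.
    storage = tuned.get("storage") or {}
    storage.setdefault("backend", {"name": "default"})

    # 2. Spill more aggressively by halving the batch threshold.
    current = storage.get("min_storage_bytes", DEFAULT_MIN_STORAGE_BYTES)
    storage["min_storage_bytes"] = max(current // 2, 1)
    tuned["storage"] = storage

    # 3. Reduce workers, but only if the config sets them explicitly.
    if "workers" in tuned:
        tuned["workers"] = max(tuned["workers"] // 2, min_workers)
    return tuned


if __name__ == "__main__":
    print(tighten_for_memory({"workers": 8}))
```

Adding RAM and filtering earlier in the SQL are not covered, since neither is a configuration change.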

Slow Performance

If spilling to disk causes slow performance:

Use faster storage (NVMe SSDs)
Increase min_storage_bytes to reduce spilling
Enable compression to reduce I/O
Increase cache size with cache_mib
Add more RAM to reduce spilling
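Before moving the storage path to a different disk, it can help to get a rough idea of how fast a candidate directory accepts sequential writes. The following sketch is a crude probe rather than a proper benchmark: it writes random data to a temporary file, forces it to disk with `fsync`, and reports MiB per second. The function name and default sizes are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory: str, total_mib: int = 64,
                           chunk_mib: int = 4) -> float:
    """Estimate sequential write throughput of directory in MiB/s."""
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # incompressible payload
    written = 0
    # The temporary file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        start = time.perf_counter()
        while written < total_mib:
            handle.write(chunk)
            written += chunk_mib
        handle.flush()
        os.fsync(handle.fileno())  # include the time to reach the device
        elapsed = time.perf_counter() - start
    return written / max(elapsed, 1e-9)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(f"{write_throughput_mib_s(scratch):.0f} MiB/s")
```

Run it against the directory you intend to use for pipeline state. Results vary with caching and concurrent load, so treat them as a relative comparison between disks.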

What’s Next

Pipeline Configuration: Learn about all configuration options for Feldera pipelines.
Metrics Monitoring: Monitor memory usage and other pipeline metrics.
Fault Tolerance: Understand how checkpoint storage interacts with memory management.