Last Updated: 3/19/2026

Pipeline Configuration

Feldera pipelines expose runtime configuration options that control how a pipeline executes, manages resources, and handles failures. These settings go in the runtime_config section when you create or modify a pipeline.

Core Configuration Options

Worker Threads

The workers parameter controls the number of DBSP worker threads allocated to the pipeline. Each worker thread is paired with a background thread for LSM merging, effectively doubling the total thread count.


{
  "workers": 8
}

The typical sweet spot is between 4 and 16 workers. Each worker increases memory consumption for data structures used during pipeline steps. The default is 8 workers.
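As a rough sizing aid, the sketch below applies the two rules above: each worker is paired with a background merge thread, and 4 to 16 workers is the usual range. The helper name plan_threads is hypothetical and not part of any Feldera API.

```python
def plan_threads(workers: int = 8) -> dict:
    """Hypothetical helper: derive the total thread count from `workers`."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return {
        "workers": workers,
        # Each DBSP worker thread is paired with a background LSM-merge thread.
        "total_threads": 2 * workers,
        # Outside 4..16 is not an error, only outside the usual sweet spot.
        "in_typical_range": 4 <= workers <= 16,
    }

print(plan_threads())    # default of 8 workers -> 16 threads
print(plan_threads(32))  # works, but outside the typical range
```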

Storage Configuration

Storage determines whether pipeline state is kept in memory or persisted to disk. When storage is enabled, the pipeline can work with datasets larger than available RAM and can take checkpoints for fault tolerance.


{
  "storage": {
    "backend": {
      "name": "default"
    },
    "min_storage_bytes": 10485760,
    "compression": "default",
    "cache_mib": 1024
  }
}

Storage backend options:

default — Uses the local file system (current default)
file — Explicitly uses local file system with optional async I/O configuration
object — Uses object storage (S3, GCS, Azure Blob Storage)

Storage parameters:

min_storage_bytes — Minimum estimated bytes before writing a batch to storage (default: 10 MiB)
min_step_storage_bytes — Minimum bytes for step batches (default: effectively disabled)
compression — Compression algorithm: default, none, or snappy
cache_mib — Maximum in-memory cache size in MiB (if unset, each thread gets 256 MiB)

When storage is disabled (set to null), all state is kept in memory, which is faster but limits dataset size to available RAM.
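The storage parameters can be reasoned about with simple arithmetic. The sketch below assumes that the 256 MiB per-thread default applies to every foreground and background thread (2 × workers threads, per the Worker Threads section), and it models min_storage_bytes as a plain size threshold. Both helpers are illustrative, not Feldera code.

```python
MIB = 1024 * 1024

def default_cache_budget_mib(workers: int) -> int:
    """Assumed upper bound on total cache when cache_mib is unset:
    256 MiB for each of the 2 * workers foreground and background threads."""
    return 2 * workers * 256

def goes_to_storage(estimated_bytes: int, min_storage_bytes: int = 10 * MIB) -> bool:
    """Simplified model of min_storage_bytes: a batch is written to storage
    once its estimated size reaches the threshold (with this model, a
    threshold of 0 would write even empty batches)."""
    return estimated_bytes >= min_storage_bytes

print(default_cache_budget_mib(8))  # 4096 (MiB) with the default 8 workers
print(goes_to_storage(4 * MIB))     # False: below the 10 MiB default
print(goes_to_storage(12 * MIB))    # True
```

Under this model, setting cache_mib to 1024, as in the example above, caps the cache well below the 4096 MiB budget that 8 workers would otherwise get.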

Storage Cache Configuration

The cache field within storage_config controls how storage access is cached:


{
  "storage_config": {
    "path": "/data/pipeline-state",
    "cache": "page_cache"
  }
}

page_cache — Uses the operating system’s page cache (default, currently better performance)
feldera_cache — Uses Feldera’s internal cache implementation (under development)
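A small validation sketch for the cache field, treating a missing value as page_cache per the default above. The function name resolve_storage_config is hypothetical.

```python
CACHE_MODES = ("page_cache", "feldera_cache")

def resolve_storage_config(storage_config: dict) -> dict:
    """Hypothetical check: fill in the default cache mode, reject unknown ones."""
    resolved = dict(storage_config)             # do not mutate the caller's dict
    resolved.setdefault("cache", "page_cache")  # documented default
    if resolved["cache"] not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {resolved['cache']!r}")
    return resolved

print(resolve_storage_config({"path": "/data/pipeline-state"}))
# {'path': '/data/pipeline-state', 'cache': 'page_cache'}
```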

Fault Tolerance

Fault tolerance enables pipelines to recover from crashes without data loss. Configure it using the fault_tolerance field:


{
  "fault_tolerance": {
    "model": "exactly_once",
    "checkpoint_interval_secs": 60
  }
}

Fault tolerance models:

exactly_once — Each record is output exactly once (default when fault tolerance is enabled)
at_least_once — Each record is output at least once; crashes may duplicate output
none — Disables fault tolerance (set model to "none" or omit the fault_tolerance field entirely)

Checkpoint interval:

The checkpoint_interval_secs parameter controls how often automatic checkpoints are created (default: 60 seconds, range: 1–3600). Set to null to disable periodic checkpointing.

Fault tolerance requires storage to be enabled and uses the configured storage backend to persist checkpoints and logs.
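The rules in this section can be collected into one normalization step. The sketch below is hypothetical: it treats an omitted fault_tolerance field or a model of "none" as disabled, defaults the model to exactly_once and the interval to 60 seconds, keeps null as "no periodic checkpoints", and assumes out-of-range intervals are clamped into 1 to 3600 rather than rejected.

```python
from typing import Optional

def normalize_fault_tolerance(config: dict) -> Optional[dict]:
    """Hypothetical normalization of the fault_tolerance field.

    Returns None when fault tolerance is disabled, else the effective settings.
    """
    ft = config.get("fault_tolerance")
    if ft is None or ft.get("model") == "none":
        return None  # omitted field or model "none": disabled
    if config.get("storage") is None:
        raise ValueError("fault tolerance requires storage to be enabled")
    model = ft.get("model", "exactly_once")
    if model not in ("exactly_once", "at_least_once"):
        raise ValueError(f"unknown fault tolerance model: {model!r}")
    interval = ft.get("checkpoint_interval_secs", 60)  # explicit null stays None
    if interval is not None:
        # Assumption: out-of-range values are clamped into 1..3600.
        interval = min(max(interval, 1), 3600)
    return {"model": model, "checkpoint_interval_secs": interval}

print(normalize_fault_tolerance({
    "storage": {"backend": {"name": "default"}},
    "fault_tolerance": {"model": "at_least_once", "checkpoint_interval_secs": None},
}))  # {'model': 'at_least_once', 'checkpoint_interval_secs': None}
```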

Memory and Resource Limits

Memory Configuration

Reserve and cap CPU, memory, and storage with the resources field:


{
  "resources": {
    "cpu_cores_min": 2.0,
    "cpu_cores_max": 8.0,
    "memory_mb_min": 2048,
    "memory_mb_max": 16384,
    "storage_mb_max": 102400,
    "storage_class": "fast-nvme"
  }
}

These limits are enforced only in Feldera Cloud deployments. In self-hosted environments, they serve as documentation of expected resource usage.
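Because self-hosted deployments may not enforce these values, a quick pre-flight check can still catch obvious mistakes. The sketch below assumes only that a minimum should not exceed its matching maximum; the function name check_resources is hypothetical.

```python
PAIRS = (("cpu_cores_min", "cpu_cores_max"), ("memory_mb_min", "memory_mb_max"))

def check_resources(resources: dict) -> list:
    """Hypothetical pre-flight check: return a list of problems (empty if none)."""
    problems = []
    for low_key, high_key in PAIRS:
        low, high = resources.get(low_key), resources.get(high_key)
        # Only compare when both bounds are present.
        if low is not None and high is not None and low > high:
            problems.append(f"{low_key} ({low}) exceeds {high_key} ({high})")
    return problems

example = {"cpu_cores_min": 2.0, "cpu_cores_max": 8.0,
           "memory_mb_min": 2048, "memory_mb_max": 16384}
print(check_resources(example))  # []
print(check_resources({"memory_mb_min": 4096, "memory_mb_max": 1024}))
```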

Batch Size and Buffering

Control how input data is batched and buffered:


{
  "min_batch_size_records": 1000,
  "max_buffering_delay_usecs": 100000
}

min_batch_size_records — Minimum records to buffer before processing (default: 0)
max_buffering_delay_usecs — Maximum delay in microseconds to wait for the minimum batch size (default: 0)

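The controller delays pushing input records to the circuit until at least min_batch_size_records records have been buffered (counted across all input connectors) or max_buffering_delay_usecs microseconds have elapsed since the first record was buffered, whichever comes first. With both left at 0, records are pushed as soon as they arrive.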
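The buffering rule can be written as a single predicate. This is a simplified model, not the controller's code, and it assumes the delay clock starts when the first record is buffered; should_push is a hypothetical name.

```python
def should_push(buffered_records: int, usecs_since_first_record: int,
                min_batch_size_records: int = 0,
                max_buffering_delay_usecs: int = 0) -> bool:
    """Simplified model: push once the batch is big enough or old enough."""
    if buffered_records == 0:
        return False  # nothing buffered yet
    return (buffered_records >= min_batch_size_records
            or usecs_since_first_record >= max_buffering_delay_usecs)

# Settings from the example above: 1000 records or 100 ms (100,000 us).
print(should_push(500, 20_000, 1000, 100_000))   # False: keep buffering
print(should_push(1000, 20_000, 1000, 100_000))  # True: batch size reached
print(should_push(500, 100_000, 1000, 100_000))  # True: delay elapsed
print(should_push(1, 0))                         # True: defaults push immediately
```

Raising either value trades latency for larger, more efficient batches.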

Advanced Configuration

CPU Pinning

Pin worker threads to specific CPU cores for better performance and consistency:


{
  "workers": 4,
  "pin_cpus": [0, 1, 2, 3, 4, 5, 6, 7]
}

Specify at least twice as many CPU numbers as workers, because each worker thread is paired with a background merge thread. CPU pinning works best when different pipelines on the same machine are pinned to different CPUs.
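The two pinning guidelines above (two CPUs per worker, no sharing between pipelines) can be checked and planned with a few lines. Both helpers are hypothetical planning aids; they assume CPUs are numbered from 0 and do not query the actual machine.

```python
def pinning_is_sufficient(pin_cpus: list, workers: int = 8) -> bool:
    """Each worker plus its background merge thread needs two CPUs."""
    return len(set(pin_cpus)) >= 2 * workers

def assign_disjoint_cpus(cpu_ids: list, workers_per_pipeline: list) -> list:
    """Give each pipeline its own block of 2 * workers CPUs, with no overlap."""
    assignments, start = [], 0
    for workers in workers_per_pipeline:
        needed = 2 * workers
        chunk = cpu_ids[start:start + needed]
        if len(chunk) < needed:
            raise ValueError("not enough CPUs for disjoint pinning")
        assignments.append(chunk)
        start += needed
    return assignments

print(pinning_is_sufficient(list(range(8)), workers=4))  # True
print(pinning_is_sufficient(list(range(8))))             # False: 8 workers need 16
print(assign_disjoint_cpus(list(range(16)), [4, 4]))
# [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]]
```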

Clock Resolution

For queries using the NOW() function, control how often the clock is updated:


{
  "clock_resolution_usecs": 1000000
}

The pipeline updates the clock value and triggers recomputation at most every clock_resolution_usecs microseconds (default: 1 second). This setting is ignored if the query doesn’t use NOW().
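Since updates happen at most once per clock_resolution_usecs, the resolution puts an upper bound on how often NOW() can trigger recomputation. A finer resolution gives fresher NOW() values at the cost of more work. The helper below is a hypothetical back-of-the-envelope calculation.

```python
def max_clock_updates(window_secs: int, clock_resolution_usecs: int = 1_000_000) -> int:
    """Upper bound on NOW()-driven recomputations within a time window."""
    return (window_secs * 1_000_000) // clock_resolution_usecs

print(max_clock_updates(60))           # 60 per minute with the 1-second default
print(max_clock_updates(60, 100_000))  # 600 per minute at 100 ms resolution
```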

Connector Initialization

Control how many connectors are initialized in parallel during startup:


{
  "max_parallel_connector_init": 10
}

At startup, the pipeline initializes all input and output connectors. This setting controls the maximum number of connectors initialized concurrently (default: 10).
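Bounded parallel initialization is the classic semaphore pattern. The asyncio sketch below models that behavior; it is not Feldera's implementation, and the connector names and the sleep standing in for real setup work are placeholders.

```python
import asyncio

async def init_connectors(names, max_parallel=10):
    """Initialize connectors with at most `max_parallel` running at once."""
    semaphore = asyncio.Semaphore(max_parallel)
    active = 0
    peak = 0

    async def init_one(name):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)    # track the highest observed concurrency
            await asyncio.sleep(0.01)   # placeholder for real connector setup
            active -= 1
        return name

    initialized = await asyncio.gather(*(init_one(n) for n in names))
    return initialized, peak

names = [f"connector_{i}" for i in range(25)]
initialized, peak = asyncio.run(init_connectors(names, max_parallel=10))
print(len(initialized), peak)
```

Raising the limit shortens startup when many connectors spend their time waiting on remote systems; lowering it reduces the burst of simultaneous connections at startup.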

Thread Pools

Configure the number of threads for HTTP and I/O operations:


{
  "http_workers": 8,
  "io_workers": 8
}

http_workers — Runtime threads for the HTTP server (default: same as workers)
io_workers — Runtime threads for async I/O tasks (default: same as workers)

These settings rarely need adjustment but can help if ingress, egress, or ad-hoc queries become bottlenecks.
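The defaulting rule is easy to state in code: an unset pool size follows workers, which itself defaults to 8. The helper name resolve_thread_pools is hypothetical.

```python
def resolve_thread_pools(config: dict) -> dict:
    """Apply the documented defaults: unset pools follow `workers` (default 8)."""
    workers = config.get("workers", 8)
    http = config.get("http_workers")
    io = config.get("io_workers")
    return {
        "http_workers": workers if http is None else http,
        "io_workers": workers if io is None else io,
    }

print(resolve_thread_pools({"workers": 16}))   # both pools follow workers: 16
print(resolve_thread_pools({"io_workers": 4})) # http_workers falls back to 8
```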

Profiling and Tracing

Enable profiling and distributed tracing:


{
  "cpu_profiler": true,
  "tracing": false,
  "tracing_endpoint_jaeger": "127.0.0.1:6831"
}

cpu_profiler — Enable CPU profiler (default: true)
tracing — Enable pipeline tracing (default: false)
tracing_endpoint_jaeger — Jaeger endpoint for trace data

Logging

Control log filtering with the logging field:


{
  "logging": "info,feldera=debug"
}

This accepts a tracing-subscriber filter directive string, such as the one above. If the field is unset or the filter is invalid, messages at “info” severity and higher are logged.
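To make the filter string concrete, the sketch below splits a directive list into a default level and per-target overrides, falling back to info when parsing fails. It handles only the simple level and target=level forms and treats anything else as invalid; the real tracing-subscriber syntax is richer (spans, fields, bare targets), so this is an approximation, not its parser.

```python
LEVELS = {"trace", "debug", "info", "warn", "error", "off"}

def parse_log_filter(spec):
    """Approximate parse of 'level,target=level,...' directives."""
    default, targets = "info", {}
    if not spec:
        return default, targets  # unset: info and above
    try:
        for directive in spec.split(","):
            directive = directive.strip()
            if not directive:
                continue
            if "=" in directive:
                target, level = directive.split("=", 1)
                if level.lower() not in LEVELS:
                    raise ValueError(directive)
                targets[target] = level.lower()
            elif directive.lower() in LEVELS:
                default = directive.lower()
            else:
                raise ValueError(directive)
    except ValueError:
        return "info", {}  # invalid filter: fall back to info and above
    return default, targets

print(parse_log_filter("info,feldera=debug"))  # ('info', {'feldera': 'debug'})
```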

Environment Variables

Inject custom environment variables into the pipeline process:


{
  "env": {
    "MY_CUSTOM_VAR": "value"
  }
}

Variables whose names begin with FELDERA_, KUBERNETES_, or TOKIO_, as well as RUST_LOG itself, are reserved and cannot be overridden.
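A client-side check can flag reserved names before a configuration is submitted. The sketch treats FELDERA_, KUBERNETES_, and TOKIO_ as prefixes and RUST_LOG as an exact name; the helper reserved_env_vars is hypothetical.

```python
RESERVED_PREFIXES = ("FELDERA_", "KUBERNETES_", "TOKIO_")
RESERVED_NAMES = {"RUST_LOG"}

def reserved_env_vars(env: dict) -> list:
    """Return the names in `env` that fall in a reserved namespace, sorted."""
    return sorted(name for name in env
                  if name in RESERVED_NAMES or name.startswith(RESERVED_PREFIXES))

print(reserved_env_vars({"MY_CUSTOM_VAR": "value", "RUST_LOG": "debug"}))
# ['RUST_LOG']
```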

Configuration Example

Here’s a complete example of a production-ready pipeline configuration:


{
  "workers": 16,
  "storage": {
    "backend": {
      "name": "default"
    },
    "compression": "snappy",
    "cache_mib": 4096
  },
  "storage_config": {
    "path": "/data/pipeline-state",
    "cache": "page_cache"
  },
  "fault_tolerance": {
    "model": "exactly_once",
    "checkpoint_interval_secs": 120
  },
  "resources": {
    "cpu_cores_min": 8.0,
    "cpu_cores_max": 16.0,
    "memory_mb_min": 8192,
    "memory_mb_max": 32768,
    "storage_mb_max": 204800
  },
  "min_batch_size_records": 10000,
  "max_buffering_delay_usecs": 500000,
  "cpu_profiler": true,
  "logging": "info,feldera=debug"
}

This configuration allocates 16 workers, enables storage with Snappy compression and a 4096 MiB cache, configures exactly-once fault tolerance with a checkpoint every 2 minutes, buffers input until 10,000 records arrive or 500 ms pass, and sets resource limits appropriate for a medium-scale deployment.
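Before submitting a configuration like this one, it can help to derive the numbers it implies. The sketch below loads a trimmed copy of the example and computes a few of them; summarize is a hypothetical helper, and it checks only the storage requirement for fault tolerance stated earlier.

```python
import json

CONFIG = json.loads("""
{
  "workers": 16,
  "storage": {"backend": {"name": "default"}, "compression": "snappy", "cache_mib": 4096},
  "fault_tolerance": {"model": "exactly_once", "checkpoint_interval_secs": 120},
  "min_batch_size_records": 10000,
  "max_buffering_delay_usecs": 500000
}
""")

def summarize(config: dict) -> dict:
    """Derive a few human-readable facts from a runtime config."""
    workers = config.get("workers", 8)
    ft = config.get("fault_tolerance")
    ft_enabled = ft is not None and ft.get("model") != "none"
    if ft_enabled and config.get("storage") is None:
        raise ValueError("fault tolerance requires storage")
    interval = ft.get("checkpoint_interval_secs", 60) if ft_enabled else None
    return {
        "total_threads": 2 * workers,  # worker + background merge thread each
        "checkpoint_minutes": None if interval is None else interval / 60,
        "max_buffering_ms": config.get("max_buffering_delay_usecs", 0) / 1000,
    }

print(summarize(CONFIG))
# {'total_threads': 32, 'checkpoint_minutes': 2.0, 'max_buffering_ms': 500.0}
```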

What’s Next

Pipeline Lifecycle: Learn about pipeline states and transitions
Fault Tolerance: Deep dive into checkpoint and recovery mechanisms
Memory Management: Understand how Feldera manages memory and spills to disk