Internal site. Jolli authentication required to view.
Skip to Content

Last Updated: 3/19/2026


Ad-hoc SQL Queries

Feldera allows you to run ad-hoc SQL SELECT queries against materialized views and tables in a running or paused pipeline. These queries are evaluated using Apache DataFusion in batch mode, providing a consistent view of the pipeline state.

Running Queries

Query as JSON

Execute a query and retrieve results as Python dictionaries:

for row in pipeline.query("SELECT * FROM user_stats ORDER BY count DESC LIMIT 10"): print(row)

The query returns a generator that yields rows as dictionaries. All floating-point numbers are deserialized as Decimal objects to avoid precision loss.

Query as Parquet

Save query results to a Parquet file:

pipeline.query_parquet( "SELECT * FROM user_stats WHERE count > 100", path="results.parquet" )

If the path doesnโ€™t have a .parquet extension, it will be automatically appended.

Query as Tabular Text

Get query results as a formatted table:

for line in pipeline.query_tabular("SELECT * FROM user_stats LIMIT 5"): print(line)

This returns a generator yielding lines of a human-readable table.

Query Hash

Get a hash of the result set for quick comparison:

hash_value = pipeline.query_hash("SELECT * FROM user_stats ORDER BY id")

For a stable hash, the query must be deterministic (sorted).

Executing Non-SELECT Queries

Use the execute() method for INSERT, DELETE, or other non-SELECT statements:

pipeline.execute("INSERT INTO users VALUES (1, 'Alice', 30)") pipeline.execute("DELETE FROM users WHERE id = 1")

This method processes the query eagerly and discards results. If the pipeline is paused, INSERT/DELETE operations will block until the pipeline resumes.

Requirements

Materialized Views Only

You can only query materialized tables and views. Non-materialized views cannot be queried:

-- Can be queried CREATE TABLE users (id INT, name VARCHAR); CREATE MATERIALIZED VIEW user_stats AS SELECT COUNT(*) FROM users; -- Cannot be queried CREATE VIEW temp_view AS SELECT * FROM users WHERE id > 10;

Pipeline State

The pipeline must be in RUNNING or PAUSED state to execute ad-hoc queries. Queries against stopped pipelines will fail.

Query Consistency

Ad-hoc queries provide a consistent snapshot of the pipeline state at the time the query executes. The results are consistent with what the incremental engine would produce for the same query, aside from minor dialect and rounding differences.

Performance Considerations

Ad-hoc queries are evaluated in batch mode using DataFusion, not incrementally. For large datasets:

  • Use WHERE clauses to filter data
  • Add ORDER BY and LIMIT to reduce result size
  • Consider creating additional materialized views for frequently-queried patterns

Examples

Debugging Pipeline State

Check the current state of a table:

for row in pipeline.query("SELECT COUNT(*) as total FROM events"): print(f"Total events: {row['total']}")

Inspecting Aggregations

Verify aggregation results:

query = """ SELECT user_id, COUNT(*) as event_count, MAX(event_time) as last_event FROM events GROUP BY user_id ORDER BY event_count DESC LIMIT 10 """ for row in pipeline.query(query): print(f"User {row['user_id']}: {row['event_count']} events")

Comparing Results

Use query hashes to detect changes:

hash1 = pipeline.query_hash("SELECT * FROM results ORDER BY id") # ... make changes ... hash2 = pipeline.query_hash("SELECT * FROM results ORDER BY id") if hash1 != hash2: print("Results changed")

Exporting Data

Export a subset of data to Parquet:

pipeline.query_parquet( "SELECT * FROM user_stats WHERE created_at >= '2024-01-01'", path="stats_2024.parquet" )

Whatโ€™s Next