Last Updated: 3/19/2026
Ad-hoc SQL Queries
Feldera allows you to run ad-hoc SQL SELECT queries against materialized views and tables in a running or paused pipeline. These queries are evaluated using Apache DataFusion in batch mode, providing a consistent view of the pipeline state.
Running Queries
Query as JSON
Execute a query and retrieve results as Python dictionaries:
for row in pipeline.query("SELECT * FROM user_stats ORDER BY count DESC LIMIT 10"):
print(row)The query returns a generator that yields rows as dictionaries. All floating-point numbers are deserialized as Decimal objects to avoid precision loss.
Query as Parquet
Save query results to a Parquet file:
pipeline.query_parquet(
"SELECT * FROM user_stats WHERE count > 100",
path="results.parquet"
)If the path doesnโt have a .parquet extension, it will be automatically appended.
Query as Tabular Text
Get query results as a formatted table:
for line in pipeline.query_tabular("SELECT * FROM user_stats LIMIT 5"):
print(line)This returns a generator yielding lines of a human-readable table.
Query Hash
Get a hash of the result set for quick comparison:
hash_value = pipeline.query_hash("SELECT * FROM user_stats ORDER BY id")For a stable hash, the query must be deterministic (sorted).
Executing Non-SELECT Queries
Use the execute() method for INSERT, DELETE, or other non-SELECT statements:
pipeline.execute("INSERT INTO users VALUES (1, 'Alice', 30)")
pipeline.execute("DELETE FROM users WHERE id = 1")This method processes the query eagerly and discards results. If the pipeline is paused, INSERT/DELETE operations will block until the pipeline resumes.
Requirements
Materialized Views Only
You can only query materialized tables and views. Non-materialized views cannot be queried:
-- Can be queried
CREATE TABLE users (id INT, name VARCHAR);
CREATE MATERIALIZED VIEW user_stats AS SELECT COUNT(*) FROM users;
-- Cannot be queried
CREATE VIEW temp_view AS SELECT * FROM users WHERE id > 10;Pipeline State
The pipeline must be in RUNNING or PAUSED state to execute ad-hoc queries. Queries against stopped pipelines will fail.
Query Consistency
Ad-hoc queries provide a consistent snapshot of the pipeline state at the time the query executes. The results are consistent with what the incremental engine would produce for the same query, aside from minor dialect and rounding differences.
Performance Considerations
Ad-hoc queries are evaluated in batch mode using DataFusion, not incrementally. For large datasets:
- Use WHERE clauses to filter data
- Add ORDER BY and LIMIT to reduce result size
- Consider creating additional materialized views for frequently-queried patterns
Examples
Debugging Pipeline State
Check the current state of a table:
for row in pipeline.query("SELECT COUNT(*) as total FROM events"):
print(f"Total events: {row['total']}")Inspecting Aggregations
Verify aggregation results:
query = """
SELECT
user_id,
COUNT(*) as event_count,
MAX(event_time) as last_event
FROM events
GROUP BY user_id
ORDER BY event_count DESC
LIMIT 10
"""
for row in pipeline.query(query):
print(f"User {row['user_id']}: {row['event_count']} events")Comparing Results
Use query hashes to detect changes:
hash1 = pipeline.query_hash("SELECT * FROM results ORDER BY id")
# ... make changes ...
hash2 = pipeline.query_hash("SELECT * FROM results ORDER BY id")
if hash1 != hash2:
print("Results changed")Exporting Data
Export a subset of data to Parquet:
pipeline.query_parquet(
"SELECT * FROM user_stats WHERE created_at >= '2024-01-01'",
path="stats_2024.parquet"
)Whatโs Next
- Materialized Views: Learn how to declare materialized views
- Sql Data Types: Understand supported data types
- Rest Api Overview: Use the REST API for ad-hoc queries