Incremental Update 20 at Feldera

We've just shipped Feldera v0.41 (and v0.40 last week), so this update covers both versions. In the future, we plan to release more often and decouple the bi-weekly incremental update from individual releases.

Performance Improvements

Several performance improvements made it into main this week; we list the most important ones here:

We parallelized connector initialization. This should speed up pipelines with many (tens to hundreds of) connectors.
We implemented an optimization for recursive programs that was previously only applied to non-recursive programs. Recursive programs should run faster as a result.
We added a common-subexpression elimination pass to the compiler. This can speed up complex SQL code significantly.
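To illustrate the idea behind common-subexpression elimination, here is a minimal sketch in Python. It is not Feldera's compiler pass; the `Col` and `BinOp` classes and the function name are hypothetical and exist only for this example. The sketch walks a list of expression trees and makes sure each structurally identical subexpression is computed only once.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    """A column reference (hypothetical type for this sketch)."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """A binary operation over two sub-expressions (hypothetical type)."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Col, BinOp]


def eliminate_common_subexpressions(exprs):
    """Return (bindings, roots).

    bindings: list of (temp_name, op, left_operand, right_operand), where
              each structurally distinct BinOp appears exactly once.
    roots:    the operand name that holds the value of each input expression.
    """
    names = {}     # maps an already-seen BinOp to its temporary name
    bindings = []  # operations in evaluation order

    def visit(e):
        if isinstance(e, Col):
            return e.name
        if e in names:  # frozen dataclasses compare and hash structurally
            return names[e]
        left = visit(e.left)
        right = visit(e.right)
        tmp = f"t{len(names)}"
        names[e] = tmp
        bindings.append((tmp, e.op, left, right))
        return tmp

    roots = [visit(e) for e in exprs]
    return bindings, roots


# Two expressions that share the subexpression (a + b):
#   (a + b) * c   and   (a + b) - d
shared = BinOp("+", Col("a"), Col("b"))
exprs = [BinOp("*", shared, Col("c")), BinOp("-", shared, Col("d"))]

bindings, roots = eliminate_common_subexpressions(exprs)
for tmp, op, left, right in bindings:
    print(f"{tmp} = {left} {op} {right}")
```

Evaluated naively, the two expressions need four operations. After the pass, `a + b` is bound once and reused, leaving three. In a SQL program the same effect applies when several output columns or views repeat an expensive expression.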

Ad-hoc Queries and Connectors

We now enforce resource constraints in our ad-hoc query engine. When the pipeline has resource limits configured, ad-hoc queries may spill to disk once the memory limit is reached, and they will generally respect the requested CPU limits.
We've added a new filter attribute to the Delta Lake connector. It filters rows at the source, so rows that do not match are never ingested from the Delta table.
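As a rough sketch of how the new filter attribute might be used, the Python snippet below builds a connector definition as a dictionary and prints it as JSON. Only the existence of a `filter` attribute comes from this update. The transport name, the `uri` and `mode` keys, the bucket path, the column name, and the assumption that the filter is written as a SQL-style predicate are all illustrative, so check the connector documentation for the exact schema.

```python
import json

# Hypothetical connector definition for illustration only.
# Every key and value except the presence of `filter` is an assumption.
delta_input_connector = {
    "transport": {
        "name": "delta_table_input",               # assumed transport name
        "config": {
            "uri": "s3://example-bucket/orders",   # placeholder table location
            "mode": "snapshot",                    # assumed ingestion mode
            # Assumed to be a SQL-style predicate: only matching rows
            # would be ingested from the Delta table.
            "filter": "order_date >= '2025-01-01'",
        },
    }
}

connectors_json = json.dumps([delta_input_connector], indent=2)
print(connectors_json)
```

Filtering at the connector means rows that fail the predicate never enter the pipeline, which avoids ingesting data only to discard it later in a SQL view.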

Documentation

We added part 3 to our guide on Accelerating Batch Analytics with Feldera. It discusses how to backfill pipelines from historical data sources and then incorporate real-time data streams.
We've also updated the design for docs.feldera.com and cleaned up the documentation in general.
