Announcing S3-backed Pipelines


Ben Pfaff, Chief Engineer / Co-Founder | February 4, 2025

Incremental computing is fundamentally a trade-off between time and state. An incremental pipeline maintains state that records previous computations, so that when new changes arrive it can update every view incrementally instead of recomputing it from scratch. A pipeline's state can sometimes be even larger than the original dataset! Feldera has long been able to keep state on storage rather than only in memory. This lets users leverage fast local NVMe disks and even remote storage such as EBS.
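To make the time-versus-state trade-off concrete, here is a minimal Python sketch of an incrementally maintained `GROUP BY` count. It is not Feldera's implementation. The class name `IncrementalCount` and the (key, weight) change format are assumptions made for illustration. In that format, +1 marks an insert and -1 marks a delete, in the spirit of Z-sets. The per-key counts are the state the pipeline keeps, and each batch of changes touches only the keys it mentions.

```python
from collections import defaultdict


class IncrementalCount:
    """Hypothetical sketch: maintains `SELECT key, COUNT(*) ... GROUP BY key`.

    Changes arrive as (key, weight) pairs: +1 for an insert, -1 for a delete.
    """

    def __init__(self):
        # The retained state: the accumulated count for every live key.
        self.counts = defaultdict(int)

    def apply(self, changes):
        """Apply one batch of changes.

        Returns {key: new_count} for every key whose count changed;
        a new count of 0 means that group disappeared from the view.
        """
        # First net out the batch, so an insert and delete of the same key cancel.
        delta = defaultdict(int)
        for key, weight in changes:
            delta[key] += weight

        updated = {}
        for key, d in delta.items():
            if d == 0:
                continue  # No net change for this key.
            self.counts[key] += d
            updated[key] = self.counts[key]
            if self.counts[key] == 0:
                # Drop empty groups so the state only holds live keys.
                del self.counts[key]
        return updated


if __name__ == "__main__":
    view = IncrementalCount()
    print(view.apply([("a", 1), ("b", 1), ("a", 1)]))  # {'a': 2, 'b': 1}
    print(view.apply([("a", -1)]))                      # {'a': 1}
    print(view.apply([("b", -1)]))                      # {'b': 0}
```

Each call costs time proportional to the size of the change batch rather than the whole dataset. The price is that `counts` grows with the number of distinct keys. At large scale, that retained state is exactly what needs somewhere to live beyond RAM.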

Today, I'm pleased to announce Feldera's upcoming support for storage on S3, Google Cloud Storage (GCS), Azure Blob Storage, and other object stores. This means Feldera can compute on datasets that are not only larger than RAM, but also larger than, and independent of, local storage. Pipelines can draw on the virtually unlimited capacity of cloud object stores. Users can therefore compute integrals extending into the 100+ TB range, a scale that many of our enterprise users operate at.

It's not just scalability to large datasets that motivates this work, but also operational convenience. With this feature, users no longer need to size pipeline storage in advance. The pipeline and cluster can also survive failures and be rescheduled across availability zones, picking up where they left off.
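The recovery property can be illustrated with a hypothetical Python sketch. Suppose a pipeline periodically writes its state as immutable objects to a bucket. A replacement worker with no local disk can then resume from the latest checkpoint. `InMemoryObjectStore` is a stand-in for a real object store client and does not mirror S3's API. The key layout and JSON encoding are also assumptions, since this post does not describe Feldera's actual storage format.

```python
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store bucket (not a real S3 client)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data: bytes):
        self._objects[key] = data

    def get(self, key) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix):
        # Sorted so the newest zero-padded checkpoint key comes last.
        return sorted(k for k in self._objects if k.startswith(prefix))


PREFIX = "checkpoints/"  # Hypothetical key layout.


def checkpoint(store, step, state):
    """Write the pipeline state for `step` as a new immutable object."""
    # Zero-pad the step so lexicographic key order matches numeric order.
    key = f"{PREFIX}{step:010d}.json"
    store.put(key, json.dumps(state, sort_keys=True).encode())
    return key


def restore_latest(store):
    """Return (step, state) from the newest checkpoint, or (None, {}) if none exist."""
    keys = store.list_keys(PREFIX)
    if not keys:
        return None, {}
    latest = keys[-1]
    step = int(latest[len(PREFIX):-len(".json")])
    return step, json.loads(store.get(latest))


if __name__ == "__main__":
    bucket = InMemoryObjectStore()
    # Worker A processes two steps, checkpointing after each, then fails.
    checkpoint(bucket, 1, {"a": 2, "b": 1})
    checkpoint(bucket, 2, {"a": 1})
    # Worker B, possibly in another availability zone, resumes from the bucket.
    print(restore_latest(bucket))  # (2, {'a': 1})
```

In this sketch, the only state shared between the failed worker and its replacement is the bucket. Swapping the stand-in for an S3, GCS, or Azure client would therefore leave the recovery logic unchanged.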

With this change, Feldera users will have unparalleled flexibility across the whole spectrum of use cases, from real-time, low-latency scenarios all the way to large-scale batch computing. All of this comes with the convenience of SQL and the power of our state-of-the-art incremental compute engine.

S3-backed pipelines will be available to our Enterprise users in February as a preview feature. Until then, here's a sneak peek.

Users can configure the S3 storage backend like so:

Once the pipeline is running, it creates S3 objects that correspond to its internal state. No local disks or EBS volumes required!

If you're interested in giving this a spin, do reach out.
