Storage has shipped!

30 July, 2024

Ben Pfaff

Co-Founder

With our recent release, we shipped the "storage" feature for Feldera, which enables the query engine to easily handle datasets larger than memory.

Let's look at an example to see what that means for Feldera users. Consider the Nexmark benchmark, which is commonly used to measure the performance of streaming systems. It simulates an online auction system with tables that represent auctions, bidders, and bids. Let's run Nexmark query q19, which selects the top 10 bids on each auction. It uses only the bid table, and given the bid table definition, it can be written in Feldera SQL like this:

CREATE VIEW q19 AS
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY auction ORDER BY price DESC) AS rank_number FROM bid)
WHERE rank_number <= 10;
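
To make the semantics of q19 concrete, here is a minimal Python sketch of the same "top 10 bids per auction" computation over an in-memory list. The `Bid` type, its three fields, and the `q19` function are illustrative assumptions (the real Nexmark bid table has more columns), and the sketch recomputes everything in one batch; it says nothing about how Feldera evaluates the query internally.

```python
from collections import defaultdict
from typing import NamedTuple


class Bid(NamedTuple):
    # Hypothetical, trimmed-down bid record; only the columns q19 needs.
    auction: int
    bidder: int
    price: int


def q19(bids, limit=10):
    """Return (bid, rank_number) pairs for the top `limit` bids per auction."""
    # PARTITION BY auction: group the bids by auction id.
    by_auction = defaultdict(list)
    for bid in bids:
        by_auction[bid.auction].append(bid)

    result = []
    for auction_bids in by_auction.values():
        # ORDER BY price DESC: highest price first. Ties keep input order here;
        # SQL's ROW_NUMBER() makes no promise about tie order.
        ranked = sorted(auction_bids, key=lambda b: b.price, reverse=True)
        # ROW_NUMBER() starts at 1; WHERE rank_number <= limit keeps the top rows.
        for rank_number, bid in enumerate(ranked[:limit], start=1):
            result.append((bid, rank_number))
    return result


if __name__ == "__main__":
    sample = [Bid(auction=1, bidder=i, price=i * 10) for i in range(1, 13)]
    sample.append(Bid(auction=2, bidder=99, price=5))
    for bid, rank_number in q19(sample):
        print(rank_number, bid)
```

The sketch also hints at why the query is memory-hungry: the ranking for an auction can change with every new bid, so an engine has to keep per-auction state around as input streams in.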

Suppose we run this query in Feldera against 100,000,000 events of input data. On my test machine, a 64-core Threadripper 3990X with 256 GB RAM, running with 16 Feldera worker threads, it runs in about 61 seconds and uses about 51 GB of RAM at peak. If I double the input to 200,000,000 events, it takes about 144 seconds and peaks at 111 GB of RAM. Whether 51 GB or 111 GB, that's a lot of memory to allocate:

input events    runtime    peak memory
100,000,000     61 s       51 GB
200,000,000     144 s      111 GB

If we rerun the above test with storage enabled, peak memory usage drops by more than half:

input events    runtime    peak memory
100,000,000     57 s       23 GB
200,000,000     166 s      30 GB
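
The measurements above can be turned into throughput and memory-saving figures with a little arithmetic. The following Python sketch does just that; the numbers are copied from the two tables, while the `runs` layout and the helper names `throughput` and `memory_saving` are made up for illustration.

```python
# Measurements copied from the tables above:
# (input events, runtime in seconds, peak memory in GB).
runs = {
    "without storage": [(100_000_000, 61, 51), (200_000_000, 144, 111)],
    "with storage": [(100_000_000, 57, 23), (200_000_000, 166, 30)],
}


def throughput(events, seconds):
    """Average events processed per second over the whole run."""
    return events / seconds


def memory_saving(before_gb, after_gb):
    """Fraction of peak memory saved by enabling storage."""
    return 1 - after_gb / before_gb


if __name__ == "__main__":
    for label, rows in runs.items():
        for events, seconds, mem_gb in rows:
            rate = throughput(events, seconds) / 1e6
            print(f"{label}: {events:,} events, {rate:.2f} M events/s, {mem_gb} GB peak")

    # Compare peak memory for the same input size, with and without storage.
    for (events, _, before), (_, _, after) in zip(
        runs["without storage"], runs["with storage"]
    ):
        print(f"{events:,} events: {memory_saving(before, after):.0%} less peak memory")
```

All four runs average more than a million events per second. With storage enabled, peak memory falls by roughly 55% for the smaller input and 73% for the larger one, while runtime goes from 61 s to 57 s and from 144 s to 166 s respectively.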

What this means for you is that a single node running Feldera can take you very, very far: beyond a million events per second at low cost! Stay tuned for more blog posts that go into greater detail.

Feldera UI pipeline view during q19 run with storage