Incremental Update 6 at Feldera

Gerd Zellweger, Head of Engineering / Co-Founder | September 17, 2024

We’re excited to announce the release of v0.26, which represents a significant step forward for Feldera. This release includes over 200 commits, adding 25,000 lines of new code and documentation. Let's dive into the highlights!

Introducing the VARIANT Data Type

One highlight of this release is the new VARIANT type, which can store any kind of data, including entire data structures, in a single column. This simplifies working with complex JSON data: you can keep a whole document in one column and extract the fields you need in SQL.

Here is an example that demonstrates how you can use VARIANT:

CREATE TABLE employee_data (
    employee_id INT,
    name STRING,
    age INT,
    details VARIANT
);

CREATE VIEW employee_roles_departments AS
SELECT
    employee_id,
    name,
    CAST(details['role'] AS STRING) as role,
    CAST(details['department'] AS STRING) as department
FROM employee_data;

CREATE VIEW employee_with_hobbies AS
SELECT
    employee_id,
    name,
    CAST(details['hobbies'] AS STRING ARRAY) as hobby
FROM employee_data
WHERE details['hobbies'] IS NOT NULL;

The gist of it is that you can store arbitrary JSON data in details, which can be accessed and manipulated using CAST (and various other functions).
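To make the semantics a bit more concrete, here is a rough plain-Python model of the two views. It is only a sketch and not how Feldera evaluates SQL: details is modeled as a Python dict, a missing key is assumed to behave like SQL NULL (as the IS NOT NULL filter above suggests), and the sample rows and function bodies are written purely for illustration.

```python
# Toy model of the two views above; plain Python, not Feldera code.
# Assumption: indexing a VARIANT with a missing key yields NULL (None here).

rows = [
    {"employee_id": 1, "name": "Alice", "age": 30,
     "details": {"role": "Engineer", "department": "Development",
                 "hobbies": ["reading", "golf"]}},
    {"employee_id": 2, "name": "Bob", "age": 25,
     "details": {"role": "Analyst", "department": "Finance",
                 "previous_roles": ["Intern", "Junior Analyst"]}},
]


def employee_roles_departments(table):
    """Mimics selecting details['role'] and details['department']."""
    return [
        (r["employee_id"], r["name"],
         r["details"].get("role"), r["details"].get("department"))
        for r in table
    ]


def employee_with_hobbies(table):
    """Mimics the WHERE details['hobbies'] IS NOT NULL filter."""
    return [
        (r["employee_id"], r["name"], r["details"]["hobbies"])
        for r in table
        if r["details"].get("hobbies") is not None
    ]


print(employee_roles_departments(rows))
print(employee_with_hobbies(rows))
```

In this model Bob has no hobbies key, so he shows up in the first view but not in the second.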

Want to try it out yourself? Hover over the SQL code and press the RUN button in the top-right. Once you have compiled the program in our sandbox, start it and toggle the views employee_roles_departments and employee_with_hobbies in the Change Stream.

Next, click anywhere inside the Change Stream panel, then copy and paste the following JSON to insert some rows into the employee_data table:

[
    {"relationName": "employee_data", "insert": {"employee_id": 1, "name": "Alice", "age": 30, "details": {"role": "Engineer", "department": "Development", "hobbies": ["reading", "golf"]}} },
    {"relationName": "employee_data", "insert": {"employee_id": 2, "name": "Bob", "age": 25, "details": {"role": "Analyst", "department": "Finance", "previous_roles": ["Intern", "Junior Analyst"]}} },
    {"relationName": "employee_data", "insert": {"employee_id": 3, "name": "Charlie", "age": 40, "details": {"role": "Manager", "department": "HR", "hobbies": ["traveling"]}} }
]
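If you would rather generate this payload than type it by hand, a small script along these lines should do. It uses only Python's standard json module and reproduces the relationName / insert shape shown above; make_insert is a hypothetical helper name, not part of any Feldera SDK.

```python
import json


def make_insert(relation, row):
    """Wrap one row in the insert shape used in the JSON above."""
    return {"relationName": relation, "insert": row}


employees = [
    {"employee_id": 1, "name": "Alice", "age": 30,
     "details": {"role": "Engineer", "department": "Development",
                 "hobbies": ["reading", "golf"]}},
    {"employee_id": 2, "name": "Bob", "age": 25,
     "details": {"role": "Analyst", "department": "Finance",
                 "previous_roles": ["Intern", "Junior Analyst"]}},
    {"employee_id": 3, "name": "Charlie", "age": 40,
     "details": {"role": "Manager", "department": "HR",
                 "hobbies": ["traveling"]}},
]

# One insert record per employee, all targeting the employee_data table.
payload = [make_insert("employee_data", row) for row in employees]

# Print the JSON array so it can be copied into the Change Stream panel.
print(json.dumps(payload, indent=4))
```

Because details is a VARIANT column, each row is free to carry a different set of keys, as Bob's previous_roles shows.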

VARIANT is a very flexible type, and we don't yet support every possible operation on it. We plan to add more functionality over the next few releases. For a detailed description of what's possible right now, check out the VARIANT documentation.

DBSP Engine Performance

Our performance expert Ben has been optimizing the DBSP engine by refining how much data a DBSP circuit processes in a single step. He found that feeding DBSP smaller input batches can drastically reduce memory usage. If you're interested in the details, you can read the full discussion. We're happy to report that these optimizations reduced peak memory from 33 GiB to 4 GiB (!) for one of our users' benchmarks. The changes are part of the v0.26 release.
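The intuition can be illustrated with a toy model. It is not the DBSP engine and makes no claim about its internals: it simply maintains a running sum over a stream and varies how many records are buffered per step. The function name and the numbers are invented for illustration.

```python
from itertools import islice


def run_in_steps(records, batch_size):
    """Fold records into a running total, one batch per step.

    Returns (total, peak_buffered): the final result and the largest
    number of records held in memory during any single step.
    """
    it = iter(records)
    total = 0
    peak_buffered = 0
    while True:
        batch = list(islice(it, batch_size))  # only this batch is materialized
        if not batch:
            break
        peak_buffered = max(peak_buffered, len(batch))
        total += sum(batch)  # incremental update of the maintained result
    return total, peak_buffered


# Same input, two step sizes: everything at once vs. small batches.
big = run_in_steps(range(100_000), batch_size=100_000)
small = run_in_steps(range(100_000), batch_size=1_000)
print(big, small)
```

Both runs compute the same total, but the second never buffers more than 1,000 records at a time. Real pipelines hold far more state than a single counter, so treat this only as a picture of why step size matters.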

Input/Output Connectors

We've also added support for a new data format, Avro. Avro is a compact binary format that is well suited to high-throughput data processing. You can now use Avro as an input or output format for your pipelines. We support various "flavors" for interoperability with Debezium connectors and Confluent JDBC, and you can retrieve the latest schema definitions from your schema registry.

Check the new page on Avro in the documentation for more information.

Ad-Hoc Queries

Finally, we are excited to announce that we have begun working on ad-hoc queries!

This feature is still experimental but is available for testing in our CLI. Simply run fda shell pipeline-name to enter the shell for a pipeline and execute SQL queries to inspect tables and views or insert data.

Ad-hoc queries are a powerful tool for debugging and exploring your data, and we're looking forward to seeing how you use them. Here is a video that demonstrates ad-hoc queries with the VARIANT program we looked at earlier:

We also plan to add ad-hoc query support to the web interface and Python SDK shortly, so stay tuned for more updates!
