
Is Reyden the future of Analytics?

The feature within a feature

There were many exciting announcements at Databricks Data+AI Summit 2026. From Omnigent to Genie Ontology to ZeroOps, the keynotes were filled to the brim with new products and features. So much so that the one I think will have the biggest impact on our industry has barely received the attention it deserves!

Reyden, or Reynold’s Dream Engine, was announced as a new compute engine that will power Lakehouse//RT. In the keynote, though, it was easy to miss that Reyden is not just the Lakehouse//RT engine: Lakehouse//RT is merely the first implementation of Reyden.

Having now had time to reflect on the week of announcements, I feel Reyden might be the one we eventually look back on as the game changer that really enabled Databricks to compete in the Analytics space.

But what actually is Reyden?

Reyden is an entirely new engine, built from the ground up for speed. 

Spark is written in Scala and runs on the Java Virtual Machine (JVM). That brings memory overhead and some costly interoperation that slows the engine down. That's why, a few years ago, Databricks launched Photon. Photon, a native C++ engine, massively accelerated Spark by eliminating garbage collection pauses, JIT warm-up delays and memory overhead.

Databricks hasn't published details of how Reyden was built, but in one of the deep-dive talks at Summit, it was mentioned that the new Reyden engine "has no JVM overhead and integrates with Photon, with little difference between them".

To me, this implies that Reyden is a rebuild of Spark in a native, non-JVM systems language (such as C++ or Rust). Either way, running natively puts Reyden closer to the hardware, with less runtime and memory overhead. In other words, it should be much faster than anything we've seen before!

Reyden also uses cutting-edge, self-training AI loops to automatically optimise execution plans and tune query performance. The analogy the engine's creators used in a deep-dive session was that different engines perform better for different query types, so rather than building one engine, they made Reyden a factory of engines, with a continuously trained ML model that near-instantly decides which one to use at runtime.
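To make the "factory of engines" idea a little more concrete, here is a minimal sketch of runtime engine selection. It swaps in a much simpler learner than whatever Reyden actually uses: an epsilon-greedy bandit that keeps a running-average latency per (query type, engine) pair and routes each query to the engine that currently looks fastest. The class, engine names and latencies are all hypothetical.

```python
import random
from typing import Callable, Dict, Tuple


class EngineRouter:
    """Hypothetical per-query engine picker: learns running-average latencies
    online and routes each query type to the engine that looks fastest.
    This is a simple epsilon-greedy bandit, not Reyden's actual model."""

    def __init__(self, engines: Dict[str, Callable[[str], float]],
                 epsilon: float = 0.1, alpha: float = 0.2, seed: int = 0):
        self.engines = engines    # engine name -> callable returning observed latency (ms)
        self.epsilon = epsilon    # probability of exploring a random engine
        self.alpha = alpha        # weight given to the newest latency observation
        self.rng = random.Random(seed)
        # (query_type, engine name) -> estimated latency in ms
        self.estimates: Dict[Tuple[str, str], float] = {}

    def choose(self, query_type: str) -> str:
        names = list(self.engines)
        untried = [n for n in names if (query_type, n) not in self.estimates]
        if untried:
            return untried[0]                  # try every engine at least once
        if self.rng.random() < self.epsilon:
            return self.rng.choice(names)      # occasionally explore
        return min(names, key=lambda n: self.estimates[(query_type, n)])

    def run(self, query_type: str) -> Tuple[str, float]:
        name = self.choose(query_type)
        latency = self.engines[name](query_type)
        key = (query_type, name)
        previous = self.estimates.get(key)
        # Exponentially weighted moving average: the estimates keep adapting
        # as new latency observations arrive.
        self.estimates[key] = (latency if previous is None
                               else (1 - self.alpha) * previous + self.alpha * latency)
        return name, latency


# Two toy "engines" with opposite strengths (latencies in ms are made up).
def lookup_engine(query_type: str) -> float:
    return 2.0 if query_type == "lookup" else 50.0


def scan_engine(query_type: str) -> float:
    return 40.0 if query_type == "lookup" else 5.0


router = EngineRouter({"lookup_engine": lookup_engine, "scan_engine": scan_engine},
                      epsilon=0.0)  # no exploration, so the example is deterministic
for _ in range(5):
    router.run("lookup")
    router.run("scan")
print(router.choose("lookup"), router.choose("scan"))  # lookup_engine scan_engine
```

After a couple of rounds the router has learned to send lookups to one engine and scans to the other. A real system would presumably use far richer query features than a single type label, but the loop is the same: predict, execute, observe, update.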

However, Reyden isn’t quite there yet. It doesn’t support the full range of ANSI SQL, though one has to assume that this will be coming shortly! The developers also shared that they are working on adding full-text search, Geospatial SQL and Spark Connect support to Reyden.

Keynote demo – Reyden in action

The keynote included a demo of Reyden’s capabilities which completely blew my mind.

We started with a SQL script over the New York City taxi dataset. Running that query via a standard SQL warehouse took 1.169s. The same query run through a Reyden engine… 7ms or 0.007s. That’s a 167x performance improvement.

Reymund then went on to show a simulation of the queries needed to serve 1,000 dashboards simultaneously: roughly 8,000 queries. They ran at a rate of 5,972 queries per second, with a P95 tail latency of 37ms (meaning that 95% of the queries completed within 37ms).
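P95 is easy to state but worth pinning down, so here is a minimal sketch of one common definition, the nearest-rank method. The keynote didn't say which percentile method its figure used, so treat this as an illustration of the concept rather than how the 37ms number was measured; the function name and sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(latencies_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest observed latency such that at
    least p% of the samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank into the sorted samples
    return ordered[rank - 1]


# 100 made-up query latencies: 1 ms, 2 ms, ..., 100 ms.
samples = [float(ms) for ms in range(1, 101)]
print(percentile_nearest_rank(samples, 95))   # 95.0
```

The point of quoting a tail percentile rather than an average is that the slowest few percent of queries are the ones users notice, and an average can hide them.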

They went on to show how latency compared to throughput for Reyden, and how that measures up against some of its competitors. It was a striking visual showing Reyden running at a speed well beyond anything we’ve seen before.

And even more mind-blowingly, in a deep-dive session, the engineers behind the project claimed that Reyden is now up to over 16,000 QPS with a tail latency under a second. That is already more than 30% above the 12,000 QPS shown in the keynote, and it shows the rapid pace of development that Databricks are applying to this area!
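For reference, here is the arithmetic behind the demo figures quoted above. The second line assumes the 5,972 QPS rate held steadily for the whole simulated run, which the keynote did not state.

```latex
\[
\begin{aligned}
\text{speedup} &= \frac{1.169\,\text{s}}{0.007\,\text{s}} = 167 \\
t_{8{,}000\ \text{queries}} &\approx \frac{8{,}000\ \text{queries}}{5{,}972\ \text{queries/s}} \approx 1.34\,\text{s} \\
\frac{16{,}000\ \text{QPS}}{12{,}000\ \text{QPS}} &\approx 1.33 \quad (\text{about } 33\%\ \text{higher})
\end{aligned}
\]
```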

Making the leap to Power BI

The first thing I noticed about the design of that keynote demo was that it clearly replicated the query pattern Power BI usually sends to its compute layer: roughly 8,000 queries for 1,000 dashboards works out to about eight queries per dashboard. AI/BI dashboards don’t necessarily send multiple queries per dashboard unless you specifically design them to. Power BI, however, is notorious for it.

Power BI has 3 options for getting data into the model:

  1. Import mode, where data is loaded during a refresh from the source into Power BI’s VertiPaq engine;
  2. Direct Query mode, where queries are sent directly to the source as and when people interact with a model/report; or
  3. Direct Lake, which is a combo of the two that only works when the data is stored in the Fabric OneLake.

Generally, up to now, Power BI’s VertiPaq engine has outperformed any compute engine you could connect to via Direct Query, so most people use either Import mode or Direct Lake. Outside a few exceptional use cases, I have only found Direct Query consistently viable when operating beyond the scale an Import model can realistically handle, and even then only because there is no alternative if your data isn’t stored in OneLake.

This is something Databricks hinted at when they included the VertiPaq engine in this slide on the design of your serving layer. It clearly shows the current "accepted wisdom" that to get fast performance, you need a copy of your data in a serving layer (i.e. Import mode Power BI using the VertiPaq engine):

The main reason for this design is that directly querying your data in a warehouse is slow. With a standard SQL warehouse engine, each query carries an overhead that stops it from meeting interactive speed requirements.
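To see why a fixed per-query overhead hurts so much with Power BI-style query patterns, consider a deliberately simplified model. It is an assumption for illustration, not a description of how Power BI or any warehouse actually schedules queries: a report page has $k$ visuals, each sends one query, at most $c$ queries are in flight at once, and every query pays a fixed overhead $t_{\text{overhead}}$ on top of its execution time $t_{\text{exec}}$.

```latex
\[
T_{\text{page}} \approx \left\lceil \frac{k}{c} \right\rceil \left( t_{\text{overhead}} + t_{\text{exec}} \right)
\]
```

Because $t_{\text{overhead}}$ is paid on every query, it is multiplied across every visual and every interaction. For small, selective queries where $t_{\text{exec}}$ is tiny, the overhead term dominates, and that is the cost an Import model sidesteps by answering from VertiPaq's in-memory store.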

Reyden, though, doesn’t have that problem, as the keynote demo made evident. With Reyden behind it, Power BI might finally have met a compute engine that can deliver on the promise of a performant Direct Query at any scale. With Reyden promising such lightning-quick responses, we may be reaching a point where bypassing VertiPaq and using Direct Query, even for small datasets, is actually the more performant option. This is yet to be proven, but it is exciting nonetheless.

Why does this matter?

So, Reyden might be more performant than the VertiPaq engine. So what?

Databricks have been making a clear play over the last few years towards owning the semantic layer. A semantic layer is the layer over the top of your Gold / Curated data, where business metrics, aggregations and relationships are usually defined (for more info see this blog from Databricks). Traditionally, semantic layers sit within your BI tooling, for example in a Power BI Semantic Model.

Recently, though, Databricks launched Metric Views and has since expanded them into the wider Unity Catalog Business Semantics. These are Databricks’ semantic layer implementation: you define your metrics, aggregations, relationships, etc. there and then consume them downstream in analytical products.
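To make the "define once, consume anywhere" idea concrete, here is a minimal sketch of what a semantic layer does under the hood: named dimensions and measures are stored as SQL expressions and compiled into a query on request. It is a hypothetical illustration, not the Metric Views syntax or any Unity Catalog API; the class, table and column names are all made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricDefinition:
    """Hypothetical, minimal semantic-layer entry: measures and dimensions are
    defined once as SQL expressions and reused by every consumer."""
    source: str                                                # table the metrics read from
    dimensions: Dict[str, str] = field(default_factory=dict)   # name -> SQL expression
    measures: Dict[str, str] = field(default_factory=dict)     # name -> SQL aggregate

    def to_sql(self, dims: List[str], measures: List[str]) -> str:
        """Compile a request for named dimensions/measures into one SQL query."""
        select = [f"{self.dimensions[d]} AS {d}" for d in dims]
        select += [f"{self.measures[m]} AS {m}" for m in measures]
        sql = f"SELECT {', '.join(select)} FROM {self.source}"
        if dims:
            sql += " GROUP BY " + ", ".join(self.dimensions[d] for d in dims)
        return sql


trips = MetricDefinition(
    source="gold.taxi_trips",  # hypothetical table name
    dimensions={"pickup_month": "DATE_TRUNC('MONTH', pickup_time)"},
    measures={"total_fares": "SUM(fare_amount)", "trip_count": "COUNT(*)"},
)
print(trips.to_sql(["pickup_month"], ["total_fares", "trip_count"]))
```

Because every consumer (a dashboard, Genie, or a Direct Query report) would get the same SQL for `total_fares`, the definition lives in one place rather than being re-implemented in each tool.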

Also at Summit, Databricks announced Genie Ontology, which will consume your business semantics to help Genie work at its best. As such, Databricks needs as much of your semantics in Unity Catalog as possible. They need to own the semantic layer!

The easiest way to own the semantic layer? Own the reporting & analytical use cases!

The biggest argument for the VertiPaq engine, and for writing your measures in DAX rather than as SQL-based definitions, has been that DAX on VertiPaq is more performant. But if executing SQL via Reyden becomes faster than DAX on VertiPaq, maybe it starts to make more sense to define your metrics in SQL? And if you are doing that, then Metric Views are likely a better home for your semantic layer definitions than semantic models. And with Reyden enabling lightning-fast reporting, the argument for switching to AI/BI also becomes stronger.

So, what is Reyden then? In my opinion, it is the engine that may enable Databricks to truly compete with Power BI in the analytics space, even with a less mature visualisation tool.

Is Reyden the future of Analytics?

Hopefully I’ve managed to share my excitement at where this new engine could take us.

Obviously, all of this projection around Reyden depends on how expensive it ends up being. If, as I suspect, Reyden compute is priced low enough to justify using it for analytical use cases that don’t need real-time latency, then this could be a real game changer for analytics.

In that scenario, maybe Reyden won’t just be Reynold’s Dream Engine, but will instead become Everyone’s Dream Engine, and lead Analytics to the Garden of Eden?

Ashley Warren


Ashley is a Senior Analytics Consultant with Advancing Analytics who focuses on Analytics Engineering and Data Visualisation. Ash is a regular speaker at data events and is an organiser of the Bristol Power BI User Group, as well as a host of the Bristol Databricks User Group.