Lakebase: AI's New Friend

Written by Luke Menzies | Dec 23, 2025 8:59:59 AM

Introduction

With Databricks charging full throttle into the world of Data & AI, new tools are arriving at an increasingly rapid pace. One of the latest additions fresh out of private preview is Lakebase — Databricks’ fully managed Postgres OLTP database. Designed for low-latency application support, Lakebase isn’t strictly an AI tool, but it’s a powerful enabler for hosting and supporting models efficiently within real-time applications.

Combined with another new feature, Databricks Apps, this opens the door to building full production-grade ML/AI solutions entirely within the Databricks ecosystem.

As an AI & ML specialist, my initial reaction to Lakebase was somewhat dismissive — the term OLTP doesn’t exactly scream “AI”. But as anyone working in this space knows, AI & ML have long since moved beyond just model development. So where does Lakebase fit into the AI landscape?

The answer lies in delivering AI-powered applications that rely on a Lakehouse architecture within Databricks. Lakebase offers a fast, efficient way to support real-time transactions in applications. For example, imagine a prediction model triggered via an app — Lakebase enables rapid logging of model responses and quick access to Lakehouse data, making it a compelling tool for real-time use cases.

Whether it’s logging chatbot session history for prompt engineering, or capturing user inputs and model outputs directly into a database without the need for ETL pipelines, Lakebase shines in scenarios where speed and simplicity are key.

In this blog, we’ll walk through how to bring all the key components together — Databricks Apps, Lakebase, Databricks Asset Bundles, and ML models — to deliver a production-grade, real-time ML/AI application, all within the Databricks ecosystem.

More on Lakebase

Let’s take a moment to properly introduce Lakebase. It’s a new extension of the Lakehouse architecture from Databricks: a fully managed, Postgres-based transactional (OLTP) database that sits alongside your analytical (OLAP) workloads on a single platform, governed through Unity Catalog and kept in step with your Delta tables.

Lakebase supports real-time transactional operations such as inserts, updates, and deletes, while managed synchronisation makes the same data available for large-scale analytics in the Lakehouse. Crucially, this is achieved without hand-built ETL pipelines or a separate database platform to look after. The result? You can run applications like inventory systems, real-time customer updates, and event-driven apps alongside analytics and machine learning workloads — all with low latency and full ACID compliance.

Although Lakebase is currently in public preview, Databricks features that reach this stage are typically production-ready (with just a few extra bits of functionality to come when it hits general availability).

Here’s a quick summary of what Lakebase brings to the table:

  • Real-time writes and queries with concurrent reads
  • Built-in Postgres indexing and caching for low-latency performance
  • Table cloning and database branching to support development and testing workflows
  • Unified governance via Unity Catalog integration
  • Separation of storage and compute for elastic scalability
  • Managed change data capture (CDC) to keep OLTP and BI models in sync

To turn this feature on, enable the corresponding option in the Previews section of the Databricks UI.

Databricks Asset Bundles

Databricks Asset Bundles are a handy toolkit for managing your data and AI projects within Databricks. Think of them as a way to wrap up everything you need for a project — your code, tests, configurations, and more — into one neat bundle that can be deployed and managed as a single unit.

Crucially, Asset Bundles now support packaging Databricks Apps, meaning you can bundle the entire infrastructure together. This makes it much easier to build and maintain production-grade solutions, especially when working in teams.

They’re designed to support proper software engineering practices — things like source control, code reviews, and continuous integration/deployment (CI/CD). So if you're used to working in a structured development environment, this will feel familiar and intuitive.

With Asset Bundles, you get everything in one place: notebooks, Python files containing business logic, Databricks jobs, pipelines, model serving setups, supporting application code, and even tests. You can also define how your project should be structured and deployed using simple YAML files, making collaboration smoother, deployments more automated, and version control far easier to manage.
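
To make that concrete, here is a rough sketch of a minimal databricks.yml for this kind of project. The bundle name, workspace host, and app path are placeholders, and resource keys can vary between CLI versions, so treat it as illustrative rather than definitive:

    # databricks.yml -- illustrative skeleton; adjust names and paths.
    bundle:
      name: lakebase-holiday-app

    targets:
      dev:
        mode: development
        default: true
        workspace:
          host: https://<your-workspace-url>

    resources:
      apps:
        holiday_app:
          name: holiday-app
          source_code_path: ./app   # folder containing app.py and app.yaml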

Databricks Apps

Databricks Apps are interactive tools built directly within the Databricks workspace, offering a user-friendly front end for your models. Imagine having an app that brings together data, AI models, and dashboards — all in one place — without the hassle of setting up separate services or managing hosting environments.

These apps are built using popular Python frameworks such as Streamlit, Dash, or Flask, and run on Databricks’ serverless infrastructure. That means you don’t need to worry about configuring servers or dealing with security setups — it’s all handled behind the scenes.

What makes these apps particularly useful is their ability to connect directly to your Delta tables, ML models, and Databricks jobs. This ensures that the data is live, governed, and protected using the same security controls already in place within your Databricks environment. It’s a great way to serve up insights or run AI-powered workflows that business users can interact with — without needing to write code or SQL.

You can build and test these apps locally or in the cloud, and share them easily across your organisation.

Getting Started: Combining Databricks Apps with Lakebase

To begin, the first step is to set up a Lakebase Postgres instance. Head to the Compute tab in the Databricks UI, select the Lakebase Postgres option, and create a new database instance.

You’ll then be presented with a range of configuration options, such as instance name, size, and capacity. Simply choose the settings that best suit your use case and wait for the instance to be deployed.

If you prefer working programmatically, you can also use the Databricks SDK to create the instance instead. This is particularly useful for automating deployments or integrating the setup into a larger CI/CD workflow.

Note: You’ll need to be using Databricks SDK version 0.56.0 or higher to follow along with this step.
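
As a rough sketch, something like the following should create an instance. The method and field names come from the SDK’s Database API and may shift between releases, and the instance name and capacity below are placeholders:

    # Sketch only: create a Lakebase Postgres instance via the SDK.
    # Requires databricks-sdk >= 0.56.0; names below are placeholders.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.database import DatabaseInstance

    w = WorkspaceClient()

    instance = w.database.create_database_instance(
        DatabaseInstance(
            name="lakebase-demo",  # instance name shown under Compute
            capacity="CU_1",       # smallest size; scale up if needed
        )
    )
    print(instance.read_write_dns)  # the endpoint you'll connect to later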

Once your Lakebase Postgres instance has been created, the next step is to register it as a catalog. This allows you to access OLTP data directly within Databricks — without needing to build out ETL pipelines — making it much easier to incorporate transactional data into your analytical workflows.

By registering Lakebase as a catalog, you’re effectively bridging the gap between real-time operational data and your Lakehouse analytics, enabling faster development and more integrated AI solutions.

Alternatively, you can register the catalog using the Databricks SDK.
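
A sketch of that might look like the following, assuming the SDK’s Database API exposes a create_database_catalog call (the names and fields are placeholders to adapt):

    # Sketch only: register the Lakebase database as a Unity Catalog catalog.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.database import DatabaseCatalog

    w = WorkspaceClient()

    catalog = w.database.create_database_catalog(
        DatabaseCatalog(
            name="lakebase_demo",                    # catalog name in Unity Catalog
            database_instance_name="lakebase-demo",  # the instance created above
            database_name="databricks_postgres",     # Postgres database to expose
            create_database_if_not_exists=True,
        )
    )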

And there you have it — you're well on your way to harnessing the power of Lakebase. The next step is to integrate it with your Databricks App.

Now that your Lakebase Postgres instance is set up, you can begin using it within your Databricks app. There are a few ways to go about creating apps, but the simplest method is through the Databricks UI.

Just head over to the Compute tab and click Create App. From there, you’ll be guided through a selection of templates and configuration options to get your app up and running.

Once you click Create App, you’ll be taken to the template selection screen. Here, you’ll find a variety of options — from blank apps that let you build from scratch, to predefined templates that offer a head start depending on your use case.

You’ll also be able to choose the language or framework for your app’s front end. If you’re comfortable with Python, Streamlit is a solid choice — it’s flexible, easy to use, and lets you build professional-looking interfaces using familiar Python syntax.

Here’s an example from the team at Advancing Analytics, showing how an app can incorporate AI models in the backend while presenting a clean, interactive UI.

For the remainder of this blog, we’ll focus on a simpler UI that displays the contents of a table. You can find the template repo here:

🔗 https://github.com/sylvia-222/lakebase-dbx-app-template

We’ll be using this template with the idea that it could later be extended to include an AI model. With that in mind, let’s move on to integrating the app with the Lakebase instance we created earlier.

Using the template above, the next step is to configure the app so it connects to your Lakebase resource. This can be done either during the initial setup or afterwards by editing the app’s configuration. Here’s a visual example of what that looks like:

Adding the Database option allows your app to connect to the Lakebase instance you’ve created. Once that’s in place, the next step is to make sure all the necessary permissions are correctly configured.

Any Lakebase instance must allow users — or service principals — to access the database with the appropriate level of control. If your app is running under a service principal, it will need explicit permissions to read from and write to the database.

Here’s a quick example of how to grant those permissions using SQL, which ensures your app can interact with the schema and table as expected.
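
The schema name below is illustrative (the walkthrough uses the default public schema), and <CLIENT_ID> is filled in later:

    -- Run against the Lakebase database. <CLIENT_ID> is your app's
    -- Databricks Client ID (see below); the schema name is illustrative.
    GRANT USAGE ON SCHEMA public TO "<CLIENT_ID>";
    GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO "<CLIENT_ID>";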

If you're using Databricks Asset Bundles, you'll need to include the database_instance and database_catalog entries in your databricks.yml file. This ensures your app has the correct access to the Lakebase resource when deployed.

These entries define the Lakebase instance, the catalog it belongs to, and the permissions required for the app to interact with the database. It’s a key step in making sure everything is wired up properly for development and production use.
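
As a sketch only: the exact keys (and whether they appear in singular or plural form) depend on your bundle schema and CLI version, so check the Asset Bundles reference before copying this.

    # Illustrative resource entries -- key names may differ by CLI version.
    resources:
      database_instances:
        lakebase_demo_instance:
          name: lakebase-demo
          capacity: CU_1
      database_catalogs:
        lakebase_demo_catalog:
          name: lakebase_demo
          database_instance_name: lakebase-demo
          database_name: databricks_postgres
          create_database_if_not_exists: true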

Once Lakebase is connected, you’re ready to start using it within your app. In this basic example, we’ll use the app to display holiday requests stored in a Lakebase table, and allow users to update those requests directly through the app — no ETL pipelines required.

Before we start modifying the app, we need to create the table within the Lakebase-linked catalog and insert a few dummy entries to work with. This is done using a simple SQL query that targets the Lakebase catalog.
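
For example (the column names here are assumptions chosen to match the rest of the walkthrough, so adapt them to your own data):

    -- Run against the Lakebase catalog; table and column names are illustrative.
    CREATE TABLE IF NOT EXISTS public.holiday_requests (
        id            INT PRIMARY KEY,
        employee_name TEXT,
        start_date    DATE,
        end_date      DATE,
        status        TEXT,
        manager_note  TEXT
    );

    INSERT INTO public.holiday_requests
        (id, employee_name, start_date, end_date, status, manager_note)
    VALUES
        (1, 'Alice', '2025-08-04', '2025-08-08', 'Approved', ''),
        (2, 'Bob',   '2025-09-01', '2025-09-05', 'Pending',  '');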

You’ll need to copy the Client ID into the <CLIENT_ID> section of the SQL code shown earlier to grant your app access to the relevant schema. This ensures the app can read from and write to the Lakebase table as intended.

You can find the Client ID within the Databricks App UI — look for the field labelled Databricks Client ID. Once added to your SQL grants, your app will be authorised to interact with the database.

Now that the app is connected to Lakebase, the next step is to start building out functionality within the Streamlit app to allow users to update holiday request entries.

Inside your app.py file, you’ll need to include the code that connects to the Lakebase database. This sets up the connection pool and ensures the app can securely interact with the database. Here’s how you can do that:
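
The sketch below assumes psycopg2 plus the Databricks SDK, with placeholder instance and database names; the template repo wires this up in its own way, and the SDK call names may differ between releases:

    # app.py -- connection setup (sketch). Placeholder names throughout.
    import os
    import uuid

    from databricks.sdk import WorkspaceClient
    from psycopg2 import pool

    INSTANCE_NAME = "lakebase-demo"        # your Lakebase instance
    DATABASE_NAME = "databricks_postgres"  # Postgres database inside it

    w = WorkspaceClient()  # picks up the app's service principal credentials

    # Look up the instance endpoint and mint a short-lived database credential.
    instance = w.database.get_database_instance(name=INSTANCE_NAME)
    cred = w.database.generate_database_credential(
        request_id=str(uuid.uuid4()), instance_names=[INSTANCE_NAME]
    )

    postgres_pool = pool.SimpleConnectionPool(
        minconn=1,
        maxconn=5,
        host=instance.read_write_dns,
        port=5432,
        dbname=DATABASE_NAME,
        user=os.environ["DATABRICKS_CLIENT_ID"],  # the app's service principal
        password=cred.token,  # tokens expire, so refresh in long-lived apps
        sslmode="require",
    )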

Next, we’ll define a function within the app that updates the table using the postgres_pool connection. This function will allow the app to write changes — such as updating the status or manager’s note — directly to the holiday requests table.

Here’s how you can set that up:
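
The table and column names below carry over from the dummy table created earlier, so treat them as assumptions rather than anything the template mandates:

    def update_holiday_request(request_id: int, status: str, manager_note: str) -> None:
        """Write an updated status and manager's note back to Lakebase."""
        conn = postgres_pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(
                    """
                    UPDATE public.holiday_requests
                    SET status = %s, manager_note = %s
                    WHERE id = %s
                    """,
                    (status, manager_note, request_id),
                )
            conn.commit()
        finally:
            postgres_pool.putconn(conn)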

Using this function, the table is updated in real time via the web application. It allows users to modify records directly from the app interface — for example, updating the status of a holiday request or adding a manager’s note.

You can trigger this function using the following snippet, which ties it into the Streamlit UI:
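
A minimal version of that wiring might look like this (widget labels, and the table read used for display, are illustrative):

    import pandas as pd
    import streamlit as st

    st.title("Holiday requests")

    # Read the current contents of the table for display.
    conn = postgres_pool.getconn()
    try:
        df = pd.read_sql("SELECT * FROM public.holiday_requests ORDER BY id", conn)
    finally:
        postgres_pool.putconn(conn)

    st.dataframe(df)

    # Let the user pick a row and push an update back through the function above.
    request_id = int(st.selectbox("Request ID", df["id"].tolist()))
    status = st.selectbox("Status", ["Pending", "Approved", "Rejected"])
    manager_note = st.text_input("Manager's note")

    if st.button("Update request"):
        update_holiday_request(request_id, status, manager_note)
        st.success(f"Request {request_id} updated.")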

Putting all the pieces together and running the app allows users to update entries directly from the interface. For example, in the screenshot below, row ID 2 has its text column updated with the phrase “still waiting”. This change is written back to the Lakebase table in the background — no ETL pipelines required.

You’ll see the update reflected immediately in the table displayed within the app’s UI.

You can also query the table directly in Databricks, via the registered catalog, to confirm that the data has been updated as expected.

So What Does This Mean for AI?

Updating a holiday request table via a web app might sound like a simple task, and on the surface, it is. But for AI practitioners, it points to something much bigger.

With Lakebase, we can enable real-time interactions between applications and models, without the faff of traditional ETL pipelines or external databases. The result? Faster feedback loops, a more streamlined architecture, and AI-powered experiences that feel genuinely responsive. It doesn’t stop there. When you combine Lakebase with Databricks Apps and Asset Bundles, you get a fully packageable solution that makes building enterprise-grade AI applications quicker and easier. 

Now, while Lakebase is still fresh off the shelf, there are already updates in the pipeline. It’s currently available at a discounted rate, so if you’ve got a Databricks workspace and a mature Lakehouse in place, now’s a great time to roll up your sleeves and give it a go. It’s not for everyone: you’ll need a well-established, Unity Catalog-enabled Lakehouse within Databricks to really benefit from it. But if you’ve got the setup, Lakebase is a powerful tool worth exploring.

 

Costs and Considerations When Using Lakebase

Before diving headfirst into Lakebase, it’s worth taking a moment to understand the pricing model and a few practical considerations that come with it.

Pricing Overview

Lakebase is a fully managed, serverless Postgres database integrated with the Databricks Lakehouse Platform. Its pricing model is designed to be flexible and usage-based, allowing you to scale compute and storage independently and only pay for what you use. Please be aware that these are the prices at the time of writing and are subject to change. It’s also worth mentioning that Databricks is currently offering a 50% discount (until May 2026)!

Storage
Lakebase uses the Databricks Storage Unit (DSU) model for billing. Storage pricing varies by cloud provider:

AWS:

  • $0.023 per DSU/month
  • Equivalent to $0.345 per GB/month (15 DSU per GB)

Azure:

  • $0.026 per DSU/month
  • Equivalent to $0.39 per GB/month (15 DSU per GB)

Compute
Lakebase offers two compute models:

Serverless Database Compute

  • AWS: $0.40 per DBU
  • Azure: $0.52 per DBU
  • Billed under the Database Serverless Compute SKU (includes cloud instance costs)

Provisioned Capacity

Measured in Capacity Units (CU), with a rough cost illustration after the list:

  • 1 CU = 1 DBU/hour
  • 1 CU ≈ 16 GB of memory
  • Scalable in increments of 2 CUs
  • Minimum billing duration: 10 minutes
  • Supports High Availability (HA) configurations
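
Putting those numbers together as a rough, assumption-laden illustration: a 2 CU instance consumes 2 DBU per hour, so at the AWS serverless rate of $0.40 per DBU that works out to about $0.80 per hour, or roughly $576 for a 30-day month, before storage costs, the preview discount, or any provisioned-capacity-specific SKU pricing are taken into account.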

Data Synchronisation
Lakebase supports automated synchronisation between Delta tables and Postgres tables using DLT pipelines:

  • Billed under Automated Serverless Compute SKU
  • Priced at $0.35 per DBU
  • Supports:
    • Delta → Postgres (read-only)
    • Postgres → Delta (streaming)

Performance and Scaling

Lakebase is designed for high concurrency and low latency. You can expect:

  • <10ms latency for point reads
  • Up to 10K read queries per second
  • Write throughput of ~15K rows/sec (initial) and ~1.2K rows/sec (incremental)

Each instance can scale from 1 CU (16GB) up to 8 CU (128GB), with larger sizes planned. You can run up to 10 instances per workspace, and each database supports up to 1,000 concurrent connections.

Monitoring and Observability

Lakebase includes built-in monitoring dashboards showing:

  • Transactions per second
  • CPU and storage usage
  • Cache hit rates
  • Deadlocks
  • Open connections

You can also use tools like pgAdmin, DBeaver, or PSQL to monitor and manage your Lakebase instance.

Authentication and Governance

Lakebase supports both PostgreSQL native authentication and Databricks workspace OAuth. For governance, you can use Unity Catalog to manage access and lineage, or stick with native Postgres permissions depending on your setup.
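
For a quick native connection from a terminal, something like the following should work; the host, user, and database are placeholders taken from your instance’s connection details, and the password is a workspace OAuth token:

    # Placeholders throughout; copy the real values from the instance's
    # connection details page in the Databricks UI.
    export PGPASSWORD="<databricks-oauth-token>"
    psql "host=<instance-read-write-dns> port=5432 dbname=databricks_postgres user=<your-databricks-username> sslmode=require"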

Things to Keep in Mind
  • Data sync between Delta and Postgres is powerful but comes with limitations — especially around write-back scenarios.
  • Identity management between Databricks and Postgres is not automatically synced, so you’ll need to manage roles and permissions carefully.
  • Storage is decoupled from compute, which is great for flexibility but requires a bit of planning around capacity and cost.

Conclusion

Lakebase might not be the flashiest tool on the surface, but it’s a game-changer when it comes to building proper production-grade ML and AI applications. By bringing together real-time transactional capabilities with the analytical power of the Lakehouse, Lakebase fills a crucial gap, letting us log, query, and interact with data in real time without faffing about with ETL pipelines or external databases.

When paired with Databricks Apps and Asset Bundles, you’ve got a full-stack solution right inside the Databricks ecosystem. No more juggling between platforms, worrying about security handoffs, or patching together brittle integrations. You can build, deploy, and iterate on AI-powered apps with speed and confidence.

Whether you're logging chatbot sessions, updating predictions, or just building slick internal tools, Lakebase gives you the flexibility and performance to do it properly. And with Databricks continuing to push the envelope, it’s a safe bet that this is just the beginning.