Want to watch this blog instead? Find the YouTube video here: https://youtu.be/GFa6Cf6GEA0
If you work in data engineering today, you’ve probably seen the rise of vibe-coding - that moment where someone prompts an LLM, gets a pipeline out the other side, deploys it, and calls it a day. And in fairness, for the person who gets their clean table of data at the end, job done.
But for the engineering teams who have to support hundreds of those pipelines afterwards?
That’s where things begin to creak.
It’s not that LLM‑driven development is bad. Far from it. It’s that doing it without structure produces… well, let’s call it what it is: AI slop. A thousand pipelines all written slightly differently, none of them predictable, and all of them reliant on someone’s late‑night vibe-coding session. And as fun as that sounds, it’s not something you can sustainably build a platform on.
So at Advancing Analytics, we’ve been asking ourselves a very simple question:
How do we embrace the speed and power of LLM‑generated development, without creating chaos?
And over the past few months, behind the scenes, we’ve been building an answer.
There’s a lot of noise in the industry about where we’re heading: ontologies, agentic systems, decentralised data meshes that dynamically map your entire data estate on demand.
And yes, that is where we're heading. But we’re not there yet.
Right now, we’re in the messy middle; we still need repeatability, guardrails, engineering. Because the idea that every pipeline can be different, and an agent will happily maintain them all? Not today - and not tomorrow either.
So the challenge becomes:
How do we let people move fast using AI, without sacrificing structure, quality, or supportability?
Over the past month, we’ve been quietly prototyping something completely new. Something fast to deploy, surprisingly powerful, and designed to solve exactly the problem we’re all feeling.
We call it LakeForge: our new engineering framework for Databricks LakeFlow pipelines - a lightweight, opinionated way to get standardised, predictable, repeatable pipelines that can scale.
Think of it as the opposite of wild-west vibe coding.
This is vibe coding on rails.
At its core, LakeForge works like this: you point it at your data. It analyses that data, proposes transformations, validates them, iterates until they meet quality thresholds, and finally generates a full set of LakeFlow declarative pipeline files.
All in a couple of minutes - turning days of engineering effort into minutes of automated, high‑quality output.
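That analyse, propose, validate, iterate loop can be sketched roughly as below. Every name here is illustrative - this is not the real LakeForge API, just the shape of the workflow described above, with a trivial stand-in for the LLM step.

```python
# Illustrative sketch of an analyse -> propose -> validate -> iterate loop.
# None of these names are the real LakeForge API.

def analyse(table):
    """Derive a simple profile of the input data (here: just column names)."""
    return {"columns": sorted({k for row in table for k in row})}

def propose(profile):
    """Stand-in for the LLM step: propose a transformation spec."""
    return {"select": profile["columns"], "drop_nulls": True}

def validate(spec, profile):
    """Deterministic quality gate: every referenced column must exist."""
    return all(c in profile["columns"] for c in spec["select"])

def generate_pipeline(table, max_iterations=3):
    """Iterate until a proposal passes validation, then emit the spec."""
    profile = analyse(table)
    for _ in range(max_iterations):
        spec = propose(profile)
        if validate(spec, profile):
            return spec  # in LakeForge terms: written out as declarative pipeline files
    raise RuntimeError("no valid spec within iteration budget")

spec = generate_pipeline([{"id": 1, "name": "a"}, {"id": 2, "name": None}])
```

The point of the loop structure is that the LLM's output is never trusted directly: a proposal only becomes a pipeline file after it clears the deterministic checks.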
And this isn’t theoretical. We’re already testing it with client datasets, running it against real tables, validating how it scales, and hardening the prompts and quality gates to ensure it performs in real enterprise environments.
LakeForge isn’t just a pretty wrapper around LLM calls. Underneath, it’s built on a deliberate separation of determinism and creativity, which is what keeps the outputs consistent while still benefiting from AI‑assisted design.
Here’s a snapshot of the core engineering principles:
LakeForge produces LakeFlow declarative specs, not handwritten notebooks.
Everything about the pipeline is declared in the spec rather than coded, so pipelines behave the same way regardless of who - or what - generated them.
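To make "declared rather than coded" concrete, a spec of this kind might look something like the following. The field names and structure are invented for illustration - the real LakeFlow declarative format differs - but the idea is the same: the pipeline is described as data, and a fixed engine interprets it.

```python
# A hypothetical declarative pipeline spec: the pipeline is data, not code.
# Field names are invented for illustration only.
pipeline_spec = {
    "source": "raw.orders",
    "target": "silver.orders",
    "transformations": [
        {"op": "rename", "from": "ord_id", "to": "order_id"},
        {"op": "cast", "column": "amount", "type": "decimal(18,2)"},
    ],
    "quality_rules": [
        {"column": "order_id", "expect": "not_null"},
    ],
}
```

Because a spec like this is plain data, it can be diffed, validated, versioned, and regenerated in a way that a hand-written notebook never can.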
All heavy lifting happens in a curated function set (our forge library). LLMs never write transformation logic themselves; they only provide specifications.
The execution logic lives in that curated library, and this is what keeps pipeline behaviour identical across hundreds of tables.
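Under the separation described above, the forge library can be pictured as a small dispatch table: the LLM only ever names an operation and its parameters, and a human-written, tested function executes it. This is a minimal sketch with invented names, not the actual library.

```python
# Sketch of a curated function set ("forge library"): the LLM supplies only
# {"op": ..., params}; execution always runs through these human-written
# functions. All names are illustrative.

def op_rename(rows, from_, to):
    """Rename a column in every row."""
    return [{(to if k == from_ else k): v for k, v in r.items()} for r in rows]

def op_drop_nulls(rows, column):
    """Drop rows where the given column is null."""
    return [r for r in rows if r.get(column) is not None]

FORGE = {"rename": op_rename, "drop_nulls": op_drop_nulls}

def execute(spec_steps, rows):
    """Run a spec through the curated library, step by step."""
    for step in spec_steps:
        op = FORGE[step["op"]]  # unknown ops fail fast, deterministically
        rows = op(rows, **{k: v for k, v in step.items() if k != "op"})
    return rows

out = execute(
    [{"op": "rename", "from_": "ord_id", "to": "order_id"},
     {"op": "drop_nulls", "column": "order_id"}],
    [{"ord_id": 1}, {"ord_id": None}],
)
```

Because the LLM can only select from `FORGE`, it cannot introduce novel transformation logic - which is exactly the property that makes hundreds of generated pipelines behave identically.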
Our Pantheon agents operate in a controlled sequence.
Each stage has both LLM‑based reasoning and deterministic checks. If the LLM hallucinates a column or suggests an invalid rule, validation catches it, and only that component is regenerated - not the full pipeline.
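The "regenerate only that component" behaviour could look roughly like this. The generator below is a deterministic stand-in for the LLM that occasionally "hallucinates" a non-existent column; everything here is hypothetical, since the real Pantheon stages are not public.

```python
import itertools

KNOWN_COLUMNS = {"order_id", "amount"}

# Stand-in for the LLM: yields proposed quality rules, occasionally
# "hallucinating" a column that does not exist.
proposals = itertools.cycle([
    {"column": "order_id", "expect": "not_null"},
    {"column": "ammount", "expect": "not_null"},   # hallucinated typo
    {"column": "amount", "expect": "positive"},
])

def is_valid(rule):
    """Deterministic gate: the referenced column must actually exist."""
    return rule["column"] in KNOWN_COLUMNS

def generate_rules(n_rules, max_retries=5):
    """Collect n valid rules, regenerating only the proposal that failed."""
    rules = []
    while len(rules) < n_rules:
        for _ in range(max_retries):
            rule = next(proposals)
            if is_valid(rule):
                rules.append(rule)
                break  # only this component was retried, never the full set
        else:
            raise RuntimeError("validation kept failing")
    return rules

rules = generate_rules(3)
```

Retrying a single failed component instead of the whole pipeline keeps regeneration cheap and, just as importantly, keeps the already-validated parts of the spec stable.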
Because the spec format is consistent and the generation loop is automated, LakeForge is engineered for real enterprise workloads where scale and supportability matter. This is the part that turns vibe-coding from chaos into a legitimate engineering workflow.
Because real engineering isn’t about avoiding AI. It’s about harnessing it.
Data teams aren’t going to be replaced by agents. But data teams who know how to build systems that use agents well? They’re the ones who will shape the next decade of our industry.
Vibe-coding isn’t the enemy. The lack of structure around vibe-coding is.
LakeForge gives people the freedom to work fast, while still producing pipelines that are standardised, predictable, and repeatable. It brings order to what could otherwise become a very chaotic future.
This is just the beginning. We’re now hardening the framework and refining its prompts, working with clients on first iterations and gathering a huge amount of real-world feedback to improve it further.
LakeForge is becoming the foundation for how we think pipelines will be built over the next couple of years.
We’re incredibly excited by what’s possible, and even more excited to finally talk about what we’ve been working on behind the scenes.
Watch this space - LakeForge will be launched later this month. We can't wait to show you what it can do.