
10 Amazing Features of Microsoft Fabric!

Fabric is Microsoft’s brand-new SaaS analytics platform, just announced at their Build conference. We gave an introduction to what it is in this post. We think there are some great features in Fabric, so in this blog post we want to highlight our top 10.

1. ONELAKE

OneLake is at the heart of Fabric, and with it comes the ability to store data in Delta Lake format. This is a really positive step forward for Microsoft, and it opens up a lot of opportunities for anyone using Fabric.

Delta is an open file format, which gives us a great deal of flexibility and performance opportunity within the lake.

OneLake’s Global storage

It also encourages you to store data only once: Delta is compatible with all workloads, so you reduce the overhead of ingesting and copying data multiple times. Microsoft are fostering a hybrid Data Mesh and Data Lakehouse architecture here, meaning we can democratise ownership of data while discouraging data silos across the business, as everyone comes together in Fabric and OneLake underpins all of the data.
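Part of what makes Delta “open” is how simple the format is to inspect: a Delta table is just Parquet data files plus a _delta_log folder of JSON commit files that any engine can read. The sketch below fabricates a minimal one-commit log and replays it using only the Python standard library; the file naming and action fields follow the Delta transaction log conventions, but the table itself is a mock, not a real Lakehouse table.

```python
import json
import os
import tempfile

# A Delta table = Parquet data files + a _delta_log/ folder of JSON commits.
# We fabricate a minimal single-commit log to show the structure.
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "_delta_log")
os.makedirs(log_dir)

# Each commit file holds one JSON action per line (metaData, add, remove, ...).
commit = [
    {"metaData": {"id": "demo-table", "format": {"provider": "parquet"}}},
    {"add": {"path": "part-00000.snappy.parquet", "size": 1024, "dataChange": True}},
]
with open(os.path.join(log_dir, "00000000000000000000.json"), "w") as f:
    for action in commit:
        f.write(json.dumps(action) + "\n")

def live_files(log_dir):
    """Replay the log in commit order and collect the live data files."""
    files = []
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.append(action["add"]["path"])
    return files

print(live_files(log_dir))  # → ['part-00000.snappy.parquet']
```

Because the log is plain JSON and the data is plain Parquet, Spark, Power BI, and any third-party reader can all agree on the same table state, which is exactly why OneLake can store data once for every workload.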

2. DIRECT LAKE

Direct Lake is a feature found in Power BI, and it does exactly what it says it will: it allows you to query the Data Lake directly from Power BI by surfacing your underlying Delta tables in a way that lets Power BI use them like a dataset. But why is this a good thing?

Previously, when designing analytics platforms, there was a choice to be made between “Direct Query Mode” and “Import Mode” in Power BI, and each has trade-offs. Choosing Direct Query gives you the most up-to-date data but comes with poorer performance; choosing Import mode gives you a more performant report experience, but with latent data and a potentially long report refresh time.

Instead of having your Data Warehouse or Lakehouse scan OneLake and then having Power BI query or import from there, Power BI can go directly to the lake, which is faster, all thanks to the performance of Delta.

3. GIT INTEGRATION FOR POWER BI DESKTOP

Something I have been wanting to see for a while is proper Git integration for Power BI. With Fabric it is now possible to work on your local machine and push your Power BI changes to a remote repository so you can collaborate with others. Microsoft have introduced the .pbip (‘Power BI Project’) file format, which is the format you need to use to work with Git.

You can see all the changes you make, such as themes and edits to report metadata, but it is important to point out that Git will not store your report data. You can use a tool of your choice locally to commit and push your changes; we have tested this using VS Code. This feature brings Power BI in line with standard development practices, which will be a huge benefit.
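Once a report is saved as a project, the workflow is ordinary Git. A minimal sketch is below; the Sales.pbip file and Sales.Report folder here are placeholders we create by hand purely to make the snippet self-contained — in a real workflow Power BI Desktop writes those text files for you when you save as a Power BI Project.

```shell
# Placeholder project layout (normally written by Power BI Desktop when
# you save as a .pbip project; contents here are dummies for illustration).
mkdir -p pbip-demo/Sales.Report
echo '{}' > pbip-demo/Sales.pbip
echo '{}' > pbip-demo/Sales.Report/report.json

# Standard Git workflow from here on: init, stage, commit, then push to
# your remote of choice (push omitted since no remote exists in this sketch).
cd pbip-demo
git init -q
git add .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Initial Power BI project commit"
git log --oneline
cd ..
```

Because the project files are text rather than a binary .pbix, diffs and pull requests finally become meaningful for report changes.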

4. DOMAINS

When you look at Fabric as a whole, Microsoft are fostering a Data Mesh architecture, and this is supported by the ability to create and govern Domains.

An example of Domains around workspaces

Domains give you the ability to organise your workspaces into logical groups. You might have a Finance department with specific data requirements and an HR department that needs access to more sensitive data. Both departments can be fully isolated and can have their own ‘Domain Admins’ with full control over their Domain. A Domain Admin can also control the associated costs of a particular Domain, which is great for implementing that Data Mesh style of control over your Fabric tenant.

5. CAPACITIES: SMOOTHING & BURSTING

Capacities determine how much compute power you have to operate your data platform within your Fabric tenant, and they come with some very interesting features that can improve performance. Analytics platforms traditionally see large usage spikes at specific times of day or around certain events, such as loading new data or querying datasets. During those spikes we need more compute power and more capacity, but during the rest of the day we may need far less. You then run into the problem of capacity being mismatched with demand, and this is where smoothing and bursting come in.

Before Smoothing vs After Smoothing

Smoothing works by borrowing some of the capacity available during quieter times: instead of large spikes, we see smaller ones, with the load smoothed across the day.
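Microsoft have not published the exact algorithm, but the idea can be sketched in a few lines: spread each burst of demand over a following window so the peak drops while the total usage billed stays the same. The window length and numbers below are purely illustrative, not Fabric’s real parameters.

```python
def smooth(usage, window=4):
    """Spread each reading evenly across the next `window` time slots.

    Illustrative only: not Fabric's actual smoothing algorithm.
    """
    out = [0.0] * (len(usage) + window - 1)
    for i, reading in enumerate(usage):
        for j in range(window):
            out[i + j] += reading / window
    return out[: len(usage)]

# One big spike of 100 capacity units in an otherwise quiet day.
spiky = [0, 0, 100, 0, 0, 0, 0, 0]
smoothed = smooth(spiky)

# The peak drops from 100 to 25, but total consumption is unchanged.
print(max(spiky), max(smoothed), sum(smoothed))
```

The flatter profile is what lets a smaller capacity SKU absorb workloads that would otherwise need to be sized for their worst five minutes of the day.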

Bursting allows jobs to utilise additional compute power to speed up their completion times; it works in tandem with smoothing to prevent performance spikes.

6. LAKEHOUSE

As you may expect, the ability to create a Lakehouse in Fabric is something we are very excited about. A Data Lakehouse takes the best parts of a data warehouse, its governance and structure, and merges them with the flexibility and scale of a data lake, all enabled by the Delta file format we mentioned previously.

The concept exists in Fabric as a ‘Lakehouse Artifact’, which you can create when you select the ‘Data Engineering’ persona. Lakehouses in Fabric sit a little differently, though: each one essentially represents a single layer within the Lakehouse architecture. This is a great thing for everyone; in tandem with Spark notebooks and the Delta format within the lake, it will enable people to adopt the Medallion architecture and take advantage of the huge amount of capability and performance that comes with it.

7. SPARK

Data Engineers can use Spark to transform data within Fabric, and we are incredibly happy to see this as a feature. Spark is an incredible tool for building out robust, metadata-driven processes. It can be accessed via Notebooks as part of the Data Engineering persona and has a feature-rich experience. If you want to work locally you can do that as well: Microsoft have implemented a VS Code extension that lets you work against a remote Spark cluster from your local machine, which is pretty useful! As of public preview, Runtime 1.1 comes with Spark 3.3.1, Delta 2.2 and Python 3.10.
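To illustrate the metadata-driven pattern we mean, here is a plain-Python stand-in (deliberately not real Spark or Fabric APIs, so it runs anywhere): a small config list drives one generic routine instead of a notebook per table. In a Fabric notebook, the load and save callables would become Spark reads and writes against Lakehouse tables; the table names and flags below are hypothetical.

```python
# Hypothetical control metadata; in practice this often lives in a control
# table or JSON file rather than being hard-coded in the notebook.
tables = [
    {"source": "raw_orders", "target": "silver_orders", "drop_nulls": True},
    {"source": "raw_customers", "target": "silver_customers", "drop_nulls": False},
]

def run_pipeline(tables, load, save):
    """One generic routine processes every table the metadata describes."""
    for cfg in tables:
        rows = load(cfg["source"])
        if cfg["drop_nulls"]:
            rows = [r for r in rows if all(v is not None for v in r.values())]
        save(cfg["target"], rows)

# Toy in-memory "lake" so the sketch is runnable end to end; in Fabric,
# load/save would be spark.read and DataFrame.write calls instead.
lake = {
    "raw_orders": [{"id": 1}, {"id": None}],
    "raw_customers": [{"id": 2}],
}
run_pipeline(tables, load=lake.get, save=lake.__setitem__)
print(lake["silver_orders"])  # → [{'id': 1}]
```

Adding a new table to the pipeline then becomes a one-line metadata change rather than new code, which is what makes the pattern robust at scale.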

8. VERTIPAQ & VERTIPARQUET

Vertipaq is the compression engine you normally see within Power BI, and Microsoft have deeply integrated it into Fabric, which is great news. They have implemented a new type of compression for the underlying Parquet files to improve performance and compatibility with the Vertipaq engine, predictably naming it ‘Vertiparquet’. Vertipaq and Parquet make a good duo, as both are column-focused. This gives you a smaller file size when writing to Delta from Fabric, which in turn means faster reading of those files. It is what enables our second pick above, Direct Lake, to give you blazing-fast performance in Power BI.

9. COPILOT

As with most things these days, Microsoft have integrated Copilot deeply into Fabric. Within the Power BI experience and within notebooks, we can leverage Copilot to help us shape and style reports, write code snippets, and query our data. This is truly the most interesting part of Fabric and of Microsoft’s vision.

10. KUSTO

Kusto will be familiar to some but perhaps not to others: it is the language found within Azure Data Explorer, and you might already use it to query Log Analytics. Within Fabric it has found a home in the Real-Time Analytics workload, where it helps you query vast amounts of data (petabyte levels!). Kusto is incredibly versatile, and although it may be a new skill for some people to pick up, it is well worth the time investment. SQL has a larger audience, but when we start ingesting data that grows by millions of rows each day, or even each minute, we need something far more performant that can handle that volume. You can find some great resources about Kusto from Microsoft here.