Performance optimization with native execution engine for Fabric Spark

Context

In May 2024, Microsoft announced the native execution engine for Fabric Spark to improve the performance of Spark workloads. During the Power BI / Fabric Summit 2025 I watched a great session on the topic by Ankita Victor and Estera Kot and decided to try it myself. It is worth noting that this feature is still in preview (as of March 2025).

The goals of this post are:

  • Introduce the native execution engine

  • Compare the performance of Fabric Spark with and without the native execution engine

Native execution engine

This post is not a deep dive into the native execution engine, but you can find some great references at the end if you want to read more about it.

The first question is: why should I care about this feature in the first place?

Well, according to Microsoft, this feature should bring better performance for your Fabric workloads at no additional cost. So even if you don’t work with large data volumes, it might be worth trying out. But for those who care about what happens under the hood: why does it deliver better performance?

According to “Learning Spark, Second Edition”, traditional Spark executes operations in a row-based fashion. Even though Spark 2.x introduced the second-generation Tungsten engine, which uses optimizations such as whole-stage code generation, there is still JVM overhead. The native execution engine in Fabric relies on two essential components:

  • Velox: an open-source C++ library created by Meta. It is a high-performance, vectorized execution engine designed for columnar data processing

  • Apache Gluten: an open-source project created by Intel. It is the bridge between Spark and Velox, acting as a middle layer responsible for offloading execution to Velox

Here are some advantages of using the native execution engine:

  • Native C++ execution: Spark can now be executed natively in C++ instead of running entirely on the JVM

  • Column-based processing: By processing data in a columnar format rather than row-based, the engine can leverage vectorized execution and reduce per-row overhead
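To make the row-vs-columnar distinction concrete, here is a tiny plain-Python sketch (not Spark code, and greatly simplified) of the two processing styles. Real engines like Velox operate on contiguous column buffers with SIMD instructions, but the shape of the work is similar:

```python
# Two tiny "taxi" records, stored row-wise as a list of dicts
rows = [{"fare": 10.0, "tip": 2.0}, {"fare": 7.5, "tip": 1.0}]

# Row-based processing: visit every record one at a time,
# paying per-row interpretation overhead even though we only need one field
total_row_based = 0.0
for row in rows:
    total_row_based += row["fare"]

# Columnar processing: the "fare" column is materialized as one array,
# so a single batch operation can sum it (and a native engine can vectorize it)
fares = [r["fare"] for r in rows]
total_columnar = sum(fares)
```

Both approaches produce the same result; the columnar layout is what lets a native engine apply one vectorized operation to a whole batch of values instead of touching each row individually.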

Here are some references for more details about Velox, Apache Gluten and the native execution engine:

Performance test

This test was performed with the NYC taxi - yellow data. You can load this data directly in your notebook with PySpark (https://learn.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=pyspark). This dataset has around 1.57 billion rows.
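As a sketch, loading the dataset in a Fabric notebook might look like the following. The storage account (`azureopendatastorage`), container (`nyctlc`), and path (`yellow`) come from the Microsoft docs linked above; `spark` is the session object predefined in Fabric notebooks:

```python
def build_wasbs_path(container: str, account: str, relative_path: str) -> str:
    # wasbs:// URL pointing at the Azure Open Datasets public blob storage
    return f"wasbs://{container}@{account}.blob.core.windows.net/{relative_path}"


def load_yellow_taxi(spark):
    # In a Fabric notebook, `spark` is already defined; the dataset is public,
    # so an empty SAS token is sufficient for read access.
    container, account, relative_path = "nyctlc", "azureopendatastorage", "yellow"
    wasbs_path = build_wasbs_path(container, account, relative_path)
    spark.conf.set(
        f"fs.azure.sas.{container}.{account}.blob.core.windows.net", ""
    )
    return spark.read.parquet(wasbs_path)
```

In the notebook you would then simply call `df = load_yellow_taxi(spark)` and work with the resulting DataFrame.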

How to enable the native execution engine?

  • In this example, I created an environment and enabled the native execution engine under Acceleration. Then you assign the notebook to that environment
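Alternatively, the Microsoft documentation also describes enabling the engine for a single notebook session with a `%%configure` cell. At the time of writing, the relevant property looks like this (check the current docs before relying on it, since the feature is in preview):

```
%%configure
{
    "conf": {
        "spark.native.enabled": "true"
    }
}
```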

Performance test

In this test, a simple sum of a column was applied, and then I used the collect function to trigger the execution of the query and retrieve the result:

See below the Spark physical plan and the performance difference when the native execution engine is enabled:

Some important points when comparing the two physical plans:

  • You can see that the word “Transformer” appears quite often in the physical plan (see the right screenshot above); this means that the native execution engine executed the operation

  • According to the Gluten documentation, the symbol “^” indicates a plan is offloaded to Velox in a stage

  • According to the Gluten documentation, the operator “VeloxColumnarToRowExec” indicates that there is a fallback

  • Without the native execution engine there is an early row conversion right after scanning the Parquet files (see the ColumnarToRow operator), whereas with the native execution engine the data remains in columnar format through most of the plan and is converted to rows only at the final stage (see the VeloxColumnarToRow operator)
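If you capture the plan as text (for example, the output printed by `df.explain()`), a small helper can count how many operators carry the “Transformer” suffix. This is a hypothetical utility for illustration, not part of any Spark or Gluten API:

```python
def count_native_operators(plan_text: str) -> int:
    # Operators whose names contain the "Transformer" suffix were offloaded
    # to the native engine (Velox) by Gluten.
    return sum(line.count("Transformer") for line in plan_text.splitlines())


# A made-up plan fragment shaped like the offloaded plans described above:
sample_plan = """VeloxColumnarToRowExec
^ ProjectExecTransformer
^ FileScanTransformer parquet"""
```

Here `count_native_operators(sample_plan)` would report two offloaded operators, while the `VeloxColumnarToRowExec` line marks the final conversion back to rows.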

The duration of both queries can be seen in the Spark UI as follows:

  • Query duration without the native execution engine: 4.9 min

  • Query duration with the native execution engine: 2.0 min

This is a significant performance improvement without changing anything in the code: a big win for a small effort, since the only change was enabling the engine.

It is also worth noting that not every operation is supported by the native execution engine (check the Microsoft documentation for details), but a fallback mechanism prevents errors and ensures execution.

You can find the Python notebooks in my GitHub repo:
Github Repo

Last updated on March 03, 2025

Next

Part 2: Microsoft Fabric admin - Adding Entra ID groups to workspaces with Semantic Link and Python