Analyzing Apple GPU performance with performance heat maps

Gain insights to SIMD group performance by inspecting source code execution.

Overview

Metal organizes the threads of your shader into single-instruction, multiple-data (SIMD) groups. The performance heat maps feature provides a way to quickly find SIMD groups that are expensive or divergent. You can graphically inspect how Apple GPUs execute the shader source code in those groups, and gain insights of potential performance bottlenecks.

View the performance heat maps

To open the performance heat maps, click the Performance button in the Metal debugger’s Debug navigator, and then click the Heat Map tab above the Performance timeline.

[Image]

When you select an encoder, pipeline state, or GPU command in the Timeline navigator, the heat maps for the corresponding work appear on the right.

In addition, performance heat maps are available for the following shader types:

Shader type	Heatmap pixel location
Fragment shader	Fragment position
Object shader	Thread location
Mesh shader	Thread location
Compute shader	Thread location

Switch between shader types

By default, selecting a render command encoder, a render pipeline state, or a draw command shows the fragment shader heat maps. Selecting a compute dispatch command shows the compute shader heat maps.

You can switch between different shader types using the Vertex and Fragment tabs above the heat maps.

Display more types of performance heat maps

By default, the Shader Execution Cost heat map and the Attachments display for render command encoders, pipeline states, and draws.

Click the Add button (+) in the heat map control bar to open a popover of all available performance heat maps. You can customize which heat maps to display by selecting the checkboxes.

[Image]

Available performance heat map options include the following:

Type	Description
Shader Execution Cost	Visualizes the shader execution cost by combining shader execution time and latency hiding.
ALU Cost	Visualizes the ALU portion of the shader execution cost.
Non-ALU Cost	Visualizes the non-ALU portion of the shader execution cost.
Thread Divergence	Visualizes the amount of GPU thread divergence in the SIMD groups.
Control Flow Divergence	Visualizes the GPU thread divergence that controls flow differences, like `IF` conditions, cause.
Inactive Threads	Visualizes the GPU thread divergence that inactive threads cause due to the shape of the geometry or the compute dispatch grid dimensions.
Overdraw	Visualizes the number of SIMD groups that write the pixel.
Instruction Count	Visualizes the number of instructions per pixel or per SIMD group.
Raytracing Latency	Visualizes the latency of ray-tracing instructions, which go through the ray-tracing unit.
Draw ID	Color codes different draws within a render pass.

The color intensity in the heat maps represents the significance of the values. For example, red means more expensive in the Shader Execution Cost heat map, and more divergent in the Thread Divergence heat map.

View and adjust the value range of performance heat maps

You can filter and tone-map a performance heat map by clicking the Histogram button in the title bar.

[Image]

The histogram popover shows the value range of the heat map. You can drag the handles to adjust the range to filter smaller and larger values. This can be useful in situations when you want a certain value range, such as showing pixels in a render pass that executes more than 100 instructions.

Inspect the execution history for a SIMD group

Selecting a pixel in the heat map allows you to inspect the underlying SIMD group.

If more than one SIMD group touches the pixel in a render pass, a list of SIMD groups in order of cost percentile appears so you can select the one to inspect. When you select a SIMD group in the list, its execution history appears below the heat maps.

[Image]

The Execution History timeline shows the progress of the selected SIMD group from left to right, and lists the full shader call stack at each point of execution from top to bottom.

The Metal debugger also helps you better understand shader execution by detecting and visualizing loops in the shader instruction stream. You can select a node in the timeline, and the source editor jumps to the file and the line that contain the executed instructions of that node.

Understand the number of instructions per line

The number of executed instructions appears next to the lines of code, in the gutter of the shader source code. This number is the total assembly code executions over the lifetime of the entire SIMD group for that line of code.

For example, if there’s a loop with 10 iterations, the number of instructions for the source lines within the loop is 10 times more than the source lines outside the loop, given the same amount of assembly code.

Switch between per-line statistics modes

In the shader source code control bar, you can choose different modes for the per-line shader profiling statistics in the gutter. Options include the following:

Mode	Description
Number of Instructions	The total number of assembly instructions that active threads execute.
Thread Divergence	1 - the total number of instructions from active threads divided by the number of theoretical maximum instructions that the system can execute.

You may find thread divergence to be 50% if there’s a conditional branch that only half of the threads enter within the selected SIMD group.

For more information about the Metal profiling tools for M3 and A17 Pro, see Discover new Metal profiling tools for M3 and A17 Pro.

Analyzing Apple GPU performance with performance heat maps

Overview

View the performance heat maps

Switch between shader types

Display more types of performance heat maps

View and adjust the value range of performance heat maps

Inspect the execution history for a SIMD group

Understand the number of instructions per line

Switch between per-line statistics modes

See Also

Metal workload analysis