Gaining performance insights with the Metal Performance HUD
Catch potential performance issues while your app runs using the Metal heads-up display.
Overview
To help you optimize performance, the Metal Performance HUD analyzes your app’s Metal API call patterns and automatically flags potential issues. This analysis is available for both native apps and Windows games running through the Game Porting Toolkit evaluation environment. Each insight links to documentation that describes the issue and outlines ways to resolve it.
You can also generate a performance report covering a specific duration to gain deeper insight into your app’s performance during that timeframe. For more information, see Generating performance reports with the Metal Performance HUD.
Turn on performance insights by setting the MTL_HUD_INSIGHTS_ENABLED environment variable to 1, or through the configuration panel on macOS. See Customizing the Metal Performance HUD for more details.
export MTL_HUD_INSIGHTS_ENABLED=1
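As a quick sanity check, an app can log at launch whether the variable actually reached its process environment, which is easy to get wrong when launching from an IDE scheme rather than a shell. The sketch below uses only Foundation. The helper name `isHUDInsightsRequested` is hypothetical, and it only reports what the process sees, not whether the HUD itself is active.

```swift
import Foundation

/// Returns true when MTL_HUD_INSIGHTS_ENABLED is set to "1" in the given environment.
/// Hypothetical helper for logging only; the HUD reads the variable on its own.
func isHUDInsightsRequested(
    in environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    return environment["MTL_HUD_INSIGHTS_ENABLED"] == "1"
}

// Log the state once at startup.
print("HUD insights requested:", isHUDInsightsRequested())
```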
Analyze and interpret performance insights
When performance insights are active, the Metal Performance HUD collects Metal API statistics for every frame. If a pattern suggesting a potential performance issue on Apple GPUs occurs in at least half the frames over a set duration (the default time is 5 seconds), an insight overlay appears alongside the main overlay.
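The trigger rule described above (a pattern present in at least half the frames within a time window) can be modeled in a few lines. This is only an illustrative sketch of that rule, not the HUD's actual implementation. The `InsightWindow` type and its members are hypothetical names.

```swift
/// Hypothetical model of the rule described above: an insight is shown when the
/// pattern was observed in at least half of the frames inside the time window.
struct InsightWindow {
    /// Window length in seconds (5 by default, matching the documented default).
    let duration: Double
    private var samples: [(time: Double, flagged: Bool)] = []

    init(duration: Double = 5.0) {
        self.duration = duration
    }

    /// Records one frame and drops frames that fall outside the window.
    mutating func record(time: Double, flagged: Bool) {
        samples.append((time: time, flagged: flagged))
        let cutoff = time - duration
        samples.removeAll { $0.time < cutoff }
    }

    /// True when at least half of the frames in the window were flagged.
    var shouldShowInsight: Bool {
        guard !samples.isEmpty else { return false }
        let flaggedCount = samples.filter { $0.flagged }.count
        return flaggedCount * 2 >= samples.count
    }
}
```

Feeding the window one sample per frame, with `flagged` set whenever your own per-frame check detects the pattern, gives a rough way to reason about how long an issue must persist before an insight would appear.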
The HUD identifies four major issue types for native apps and specific D3D12 API usage issues from the Game Porting Toolkit, discussed below.
An excessive number of encoders per frame can be a performance bottleneck on Apple GPUs. The Metal Performance HUD assists in optimizing performance by examining sequential render command encoders to find those with similar color attachments. These encoders are prime candidates for merging by adopting the color attachment mapping feature in Metal 4. Color attachment mapping allows you to define the relationship between logical and physical color attachments for draw operations, thereby allowing a single encoder to utilize a diverse set of color attachments. To learn more, see Understanding the Metal 4 core API.
When the HUD detects this insight, it includes a complete encoding table in the performance report to help you locate these specific encoders. To learn more, see Generating performance reports with the Metal Performance HUD.
Frequent encoder switching due to resource copies with blit command encoders is another area for potential optimization. This pattern splits the current render or compute encoder into multiple encoders. You can mitigate this overhead by batching resource updates or by adopting Metal 4 command encoding, where blit commands are part of compute command encoders.
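One way to batch resource updates is to collect the frame's pending copies and encode them all in a single blit encoder before the render or compute work begins. The sketch below assumes buffer-to-buffer copies that start at offset 0. The `PendingCopy` type and `encodeBatchedCopies` function are hypothetical names, and this approach uses the existing blit encoder API rather than Metal 4 command encoding.

```swift
import Metal

/// One pending buffer-to-buffer copy. Hypothetical type for this sketch.
struct PendingCopy {
    let source: MTLBuffer
    let destination: MTLBuffer
    let size: Int
}

/// Encodes every pending copy in a single blit encoder, ahead of the frame's
/// render and compute work, instead of interleaving blits between passes.
func encodeBatchedCopies(_ copies: [PendingCopy], into commandBuffer: MTLCommandBuffer) {
    guard !copies.isEmpty,
          let blit = commandBuffer.makeBlitCommandEncoder() else { return }
    for copy in copies {
        blit.copy(from: copy.source, sourceOffset: 0,
                  to: copy.destination, destinationOffset: 0,
                  size: copy.size)
    }
    blit.endEncoding()
}
```

Because all copies share one encoder, the render or compute encoder that follows is not split each time a resource needs updating.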
Serial, blocking shader compilation during runtime often causes stutters. Metal 4 mitigates this with new shader compilation methods that allow for finer-grained control and increased parallelization.
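As a point of comparison, blocking compilation can also be avoided with the long-standing asynchronous variant of pipeline creation on `MTLDevice`. The sketch below is not the Metal 4 compilation API mentioned above. The function name and the `onReady` callback are hypothetical, and the descriptor is assumed to be fully configured by the caller.

```swift
import Metal

/// Requests pipeline compilation without blocking the calling thread.
/// `onReady` is a hypothetical callback that stores the finished pipeline.
func compilePipelineAsync(
    device: MTLDevice,
    descriptor: MTLRenderPipelineDescriptor,
    onReady: @escaping (MTLRenderPipelineState) -> Void
) {
    device.makeRenderPipelineState(descriptor: descriptor) { pipeline, error in
        if let pipeline = pipeline {
            onReady(pipeline)
        } else {
            // Report the failure instead of stalling or crashing the frame loop.
            print("Pipeline compilation failed: \(String(describing: error))")
        }
    }
}
```

Until `onReady` fires, the renderer needs a fallback, such as skipping the affected draws or using a simpler pipeline that is already compiled.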
When the app spends most of the frame interval encoding GPU work, it is likely CPU-bound. Metal 4 gives you more explicit control over command encoding, which can improve CPU performance during encoding. See Understanding the Metal 4 core API for more information.
In D3D12, resource barriers are often too coarse, which can lead to oversynchronization when running applications through the Game Porting Toolkit evaluation environment. When porting to Metal, you can use more fine-grained synchronization primitives such as MTLFence and MTLEvent to improve performance.
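A fence expresses a dependency between exactly two encoders, so only the consumer waits on the producer. The sketch below orders two compute passes with an `MTLFence` created through `device.makeFence()`. The function name and parameters are hypothetical, and it assumes the shared buffer has hazard tracking disabled (for example, a heap allocation or a buffer created with `.hazardTrackingModeUntracked`), since Metal orders tracked resources automatically.

```swift
import Metal

/// Encodes a producer pass and a consumer pass ordered by a fence.
/// Pipeline states and the shared buffer are assumed to be created elsewhere.
func encodeProducerConsumer(
    commandBuffer: MTLCommandBuffer,
    fence: MTLFence,
    producer: MTLComputePipelineState,
    consumer: MTLComputePipelineState,
    sharedBuffer: MTLBuffer,
    threadgroups: MTLSize,
    threadsPerThreadgroup: MTLSize
) {
    if let first = commandBuffer.makeComputeCommandEncoder() {
        first.setComputePipelineState(producer)
        first.setBuffer(sharedBuffer, offset: 0, index: 0)
        first.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerThreadgroup)
        first.updateFence(fence)    // signaled once this encoder's work completes
        first.endEncoding()
    }
    if let second = commandBuffer.makeComputeCommandEncoder() {
        second.waitForFence(fence)  // this encoder's work starts only after the signal
        second.setComputePipelineState(consumer)
        second.setBuffer(sharedBuffer, offset: 0, index: 0)
        second.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerThreadgroup)
        second.endEncoding()
    }
}
```

Passes that do not touch the shared buffer never wait on the fence, which is the main difference from a coarse barrier that serializes everything around it.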
Apple GPUs don’t support the tessellation and geometry stages directly, so these stages need to be emulated. You can adopt mesh shaders as an alternative to improve performance.
See Also
Runtime diagnostics
Inspecting live resources at runtime
Validating your app’s Metal API usage
Validating your app’s Metal shader usage
Monitoring your Metal app’s graphics performance
Customizing the Metal Performance HUD
Understanding the Metal Performance HUD metrics
Generating performance reports with the Metal Performance HUD