Sampling GPU data into counter sample buffers
Retrieve a GPU’s counter data at a time the GPU supports.
Overview
You can sample a GPU device’s performance counter data at different times, including:
At pipeline stage boundaries
Between different Metal commands
Typically, a GPU supports only one of these boundary types. For example, Apple silicon GPUs support sampling at stage boundaries because they process a render pass’s fragments only after processing all of the pass’s primitives. In contrast, a typical immediate-mode GPU supports sampling between commands.
Before you can sample a GPU’s counters, complete the following prerequisite steps:
Identify which counters you can sample from an MTLDevice instance (see Confirming which counters and counter sets a GPU supports).
Make an MTLCounterSampleBuffer instance to store the counter’s data (see Creating a counter sample buffer to store a GPU’s counter data during a pass).
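The two prerequisite steps might look like the following Objective-C sketch, which searches a device’s counter sets for the common timestamp set and then creates a sample buffer for it. The helper names FindCounterSet and MakeTimestampSampleBuffer, the shared storage mode, and the label are illustrative assumptions, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical helper: returns the device's counter set with the given common name,
// or nil if the GPU doesn't expose that set.
static id<MTLCounterSet> FindCounterSet(id<MTLDevice> device, MTLCommonCounterSet name)
{
    for (id<MTLCounterSet> counterSet in device.counterSets) {
        if ([counterSet.name isEqualToString:name]) {
            return counterSet;
        }
    }
    return nil;
}

// Hypothetical helper: creates a sample buffer with room for `sampleCount` timestamp samples.
static id<MTLCounterSampleBuffer> MakeTimestampSampleBuffer(id<MTLDevice> device, NSUInteger sampleCount)
{
    id<MTLCounterSet> timestampSet = FindCounterSet(device, MTLCommonCounterSetTimestamp);
    if (timestampSet == nil) {
        return nil; // This GPU doesn't provide timestamp counters.
    }

    MTLCounterSampleBufferDescriptor *descriptor = [[MTLCounterSampleBufferDescriptor alloc] init];
    descriptor.counterSet = timestampSet;
    descriptor.storageMode = MTLStorageModeShared; // Assumption: shared storage for CPU-side resolving.
    descriptor.sampleCount = sampleCount;
    descriptor.label = @"Timestamp samples";

    NSError *error = nil;
    id<MTLCounterSampleBuffer> sampleBuffer =
        [device newCounterSampleBufferWithDescriptor:descriptor error:&error];
    if (sampleBuffer == nil) {
        NSLog(@"Couldn't create a counter sample buffer: %@", error);
    }
    return sampleBuffer;
}
```

Returning nil rather than asserting lets an app fall back gracefully on GPUs that don’t expose the timestamp counter set.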
The sections below explain how to identify when you can sample a GPU’s counters, and how to encode commands to retrieve their data.
Each GPU vendor defines its own private data format for its counter sample buffers, which means your app can’t read the contents of each buffer directly. Instead, your app can transform the data from the vendor’s internal format to the standard Metal formats by resolving each sample buffer. See Converting a GPU’s counter data into a readable format for the next steps that resolve the data within a counter sample buffer.
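As a rough illustration of resolving, the following sketch converts a range of a timestamp sample buffer into Metal’s standard MTLCounterResultTimestamp structures and copies out the raw values. The helper name ResolveTimestamps is hypothetical, and the code assumes the buffer was created with the common timestamp counter set; other counter sets resolve into different result structures.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical helper: resolves `range` of a timestamp sample buffer and
// returns the raw GPU timestamp values as NSNumbers.
static NSArray<NSNumber *> *ResolveTimestamps(id<MTLCounterSampleBuffer> sampleBuffer, NSRange range)
{
    // Resolving converts the vendor's private format into Metal's standard result structures.
    NSData *data = [sampleBuffer resolveCounterRange:range];
    if (data == nil) {
        return @[];
    }

    const MTLCounterResultTimestamp *results = (const MTLCounterResultTimestamp *)data.bytes;
    NSUInteger count = data.length / sizeof(MTLCounterResultTimestamp);
    NSMutableArray<NSNumber *> *timestamps = [NSMutableArray arrayWithCapacity:count];
    for (NSUInteger i = 0; i < count; i++) {
        // Entries equal to MTLCounterErrorValue mark samples the GPU couldn't record;
        // callers should skip them before computing durations.
        [timestamps addObject:@(results[i].timestamp)];
    }
    return timestamps;
}
```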
Check which boundaries a GPU supports
You can inspect a GPU device instance to see whether it supports a specific sample boundary by calling its supportsCounterSampling(_:) method with each MTLCounterSamplingPoint case.
Because the method checks one boundary type per call, you can find every boundary a GPU supports by calling it once for each case and collecting the cases for which it returns true.
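A minimal sketch of that check in Objective-C, which queries each MTLCounterSamplingPoint case and collects the supported ones in an array. The helper name SupportedSamplingPoints is hypothetical; boxing each case in an NSNumber is just one convenient way to return enum values in an NSArray.

```objc
#import <Foundation/Foundation.h>
#import <Metal/Metal.h>

// Hypothetical helper: returns the sampling points the device supports,
// each boxed as an NSNumber holding an MTLCounterSamplingPoint value.
static NSArray<NSNumber *> *SupportedSamplingPoints(id<MTLDevice> device)
{
    const MTLCounterSamplingPoint allPoints[] = {
        MTLCounterSamplingPointAtStageBoundary,
        MTLCounterSamplingPointAtDrawBoundary,
        MTLCounterSamplingPointAtDispatchBoundary,
        MTLCounterSamplingPointAtTileDispatchBoundary,
        MTLCounterSamplingPointAtBlitBoundary,
    };
    const size_t pointCount = sizeof(allPoints) / sizeof(allPoints[0]);

    NSMutableArray<NSNumber *> *supported = [NSMutableArray array];
    for (size_t i = 0; i < pointCount; i++) {
        // Ask the GPU about one boundary type at a time.
        if ([device supportsCounterSampling:allPoints[i]]) {
            [supported addObject:[NSNumber numberWithUnsignedInteger:allPoints[i]]];
        }
    }
    return supported;
}
```

On most GPUs you can expect either the stage boundary case alone or some combination of the command boundary cases.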
Sample counters at stage boundaries
For a GPU device that can sample counters at stage boundaries (MTLCounterSamplingPoint.atStageBoundary), you can sample its counters between the stages of a pass. When the GPU starts or finishes a stage, it samples the counters and copies the results into a counter sample buffer.
You tell the GPU which counter sample buffer to use, and at which stage boundaries to sample into it, by configuring a pass descriptor’s sampleBufferAttachments property. For example, you can sample the timestamp counters before and after the vertex and fragment stages by configuring an MTLRenderPassDescriptor instance’s sampleBufferAttachments property.
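A sketch of that render pass configuration, assuming the app already has a counter sample buffer with room for at least four samples (for example, timestamps). The helper name AttachTimestampSampling is hypothetical; the index layout (0 and 1 for the vertex stage, 2 and 3 for the fragment stage) is one reasonable choice.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: configures a render pass to write four samples into `sampleBuffer`.
// Index 0/1 = start/end of the vertex stage, index 2/3 = start/end of the fragment stage.
static void AttachTimestampSampling(MTLRenderPassDescriptor *renderPassDescriptor,
                                    id<MTLCounterSampleBuffer> sampleBuffer)
{
    MTLRenderPassSampleBufferAttachmentDescriptor *sampleAttachment =
        renderPassDescriptor.sampleBufferAttachments[0];
    sampleAttachment.sampleBuffer = sampleBuffer;
    sampleAttachment.startOfVertexSampleIndex = 0;
    sampleAttachment.endOfVertexSampleIndex = 1;
    sampleAttachment.startOfFragmentSampleIndex = 2;
    sampleAttachment.endOfFragmentSampleIndex = 3;
}
```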
Each index value tells the GPU where to put a specific counter sample within a counter sample buffer. You can skip a boundary by setting its index to MTLCounterDontSample. For example, the following code configures the GPU to sample only at the start and end of the fragment stage.
// ...
// Store the samples in the app's counter sample buffer.
sampleAttachment.sampleBuffer = self.counterSampleBuffer;
// Skip the vertex stage by marking both of its indexes as unsampled.
sampleAttachment.startOfVertexSampleIndex = MTLCounterDontSample;
sampleAttachment.endOfVertexSampleIndex = MTLCounterDontSample;
// Write the fragment stage's start and end samples to indexes 2 and 3.
sampleAttachment.startOfFragmentSampleIndex = 2;
sampleAttachment.endOfFragmentSampleIndex = 3;

This example stores the fragment stage’s counter data in the third and fourth positions within the counter sample buffer (at indexes 2 and 3, respectively). However, it doesn’t sample the vertex stage timestamps, which leaves that part of the counter sample buffer unaltered.
Each type of pass has its own boundary types, with corresponding properties in its sample buffer attachment descriptor type.
Pass descriptor type | Attachment type | Attachment descriptor properties |
|---|---|---|
MTLRenderPassDescriptor | MTLRenderPassSampleBufferAttachmentDescriptor | sampleBuffer, startOfVertexSampleIndex, endOfVertexSampleIndex, startOfFragmentSampleIndex, endOfFragmentSampleIndex |
MTLAccelerationStructurePassDescriptor | MTLAccelerationStructurePassSampleBufferAttachmentDescriptor | sampleBuffer, startOfEncoderSampleIndex, endOfEncoderSampleIndex |
MTLBlitPassDescriptor | MTLBlitPassSampleBufferAttachmentDescriptor | sampleBuffer, startOfEncoderSampleIndex, endOfEncoderSampleIndex |
MTLComputePassDescriptor | MTLComputePassSampleBufferAttachmentDescriptor | sampleBuffer, startOfEncoderSampleIndex, endOfEncoderSampleIndex |
MTLResourceStatePassDescriptor | MTLResourceStatePassSampleBufferAttachmentDescriptor | sampleBuffer, startOfEncoderSampleIndex, endOfEncoderSampleIndex |
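For the pass types that sample at encoder boundaries, the pattern is the same but uses the start-of-encoder and end-of-encoder indexes. The following sketch builds a compute pass descriptor that samples when the compute encoder starts and ends. The helper name MakeSampledComputePassDescriptor and its startIndex parameter are hypothetical, and the sample buffer is assumed to have room for both indexes.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: builds a compute pass descriptor that samples counters at the
// start and end of the compute encoder, writing to `startIndex` and `startIndex + 1`.
static MTLComputePassDescriptor *MakeSampledComputePassDescriptor(id<MTLCounterSampleBuffer> sampleBuffer,
                                                                 NSUInteger startIndex)
{
    MTLComputePassDescriptor *passDescriptor = [MTLComputePassDescriptor computePassDescriptor];
    MTLComputePassSampleBufferAttachmentDescriptor *sampleAttachment =
        passDescriptor.sampleBufferAttachments[0];
    sampleAttachment.sampleBuffer = sampleBuffer;
    sampleAttachment.startOfEncoderSampleIndex = startIndex;
    sampleAttachment.endOfEncoderSampleIndex = startIndex + 1;
    return passDescriptor;
}
```

To use the descriptor, pass it to a command buffer’s computeCommandEncoderWithDescriptor: method; the blit, resource state, and acceleration structure descriptors follow the same shape with their own attachment types.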
Sample counters at command boundaries
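You can encode specific commands to sample a counter’s data during a pass for a GPU that supports any of the following boundaries:
MTLCounterSamplingPoint.atDrawBoundary
MTLCounterSamplingPoint.atDispatchBoundary
MTLCounterSamplingPoint.atTileDispatchBoundary
MTLCounterSamplingPoint.atBlitBoundary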
You do this by calling an encoder’s sampleCounters(sampleBuffer:sampleIndex:barrier:) method.
When the GPU reaches a sample command, such as one you encode between two draw commands, it samples the counters and copies the results into the counter sample buffer at the index you specify.
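A sketch of a sample command between two draws, assuming the GPU supports MTLCounterSamplingPoint.atDrawBoundary and that the encoder already has a render pipeline state and vertex data for six vertices. The helper name EncodeDrawsWithSample is hypothetical; returning the next unused index is a convenience so callers can chain several samples into one buffer.

```objc
#import <Metal/Metal.h>

// Hypothetical helper: encodes two draws with a counter sample between them and
// returns the next unused index in the sample buffer.
static NSUInteger EncodeDrawsWithSample(id<MTLRenderCommandEncoder> renderEncoder,
                                        id<MTLCounterSampleBuffer> sampleBuffer,
                                        NSUInteger sampleIndex)
{
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:3];

    // The GPU samples the counters when it reaches this command. With no barrier,
    // it doesn't first wait for the preceding draw to finish.
    [renderEncoder sampleCountersInBuffer:sampleBuffer atSampleIndex:sampleIndex withBarrier:NO];

    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:3 vertexCount:3];
    return sampleIndex + 1;
}
```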
Each pass encoder type has its own version of the method.
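Pass encoder type | Sample method |
|---|---|
MTLRenderCommandEncoder | sampleCounters(sampleBuffer:sampleIndex:barrier:) |
MTLComputeCommandEncoder | sampleCounters(sampleBuffer:sampleIndex:barrier:) |
MTLBlitCommandEncoder | sampleCounters(sampleBuffer:sampleIndex:barrier:) |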
The barrier parameter for these methods controls whether the pass waits for the GPU to complete all the previous commands in the buffer before sampling the counters (see Resource synchronization). Each barrier typically reduces performance, but can be useful during development to get accurate and consistent data across multiple runs.
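One way to apply that trade-off is to enable barriers only in development builds. The following sketch samples before and after a single compute dispatch, assuming the GPU supports MTLCounterSamplingPoint.atDispatchBoundary and that the encoder already has a compute pipeline state. The kUseCounterBarriers constant, its dependence on a DEBUG macro, and the helper name EncodeDispatchWithSamples are all hypothetical.

```objc
#import <Metal/Metal.h>

// Hypothetical switch: barriers in development builds for consistent data,
// no barriers in release builds to avoid the performance cost.
#ifdef DEBUG
static const BOOL kUseCounterBarriers = YES;
#else
static const BOOL kUseCounterBarriers = NO;
#endif

// Hypothetical helper: samples counters before and after one dispatch and
// returns the next unused index in the sample buffer.
static NSUInteger EncodeDispatchWithSamples(id<MTLComputeCommandEncoder> encoder,
                                            id<MTLCounterSampleBuffer> sampleBuffer,
                                            NSUInteger sampleIndex,
                                            MTLSize threadgroups,
                                            MTLSize threadsPerThreadgroup)
{
    [encoder sampleCountersInBuffer:sampleBuffer atSampleIndex:sampleIndex withBarrier:kUseCounterBarriers];
    [encoder dispatchThreadgroups:threadgroups threadsPerThreadgroup:threadsPerThreadgroup];
    [encoder sampleCountersInBuffer:sampleBuffer atSampleIndex:sampleIndex + 1 withBarrier:kUseCounterBarriers];
    return sampleIndex + 2;
}
```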