Contents

Synchronizing stages within a pass

Block GPU stages in a pass from running until other stages in the same pass finish.

Overview

An intrapass barrier resolves access conflicts between commands within the same pass, without affecting any other passes. When your app encodes commands that access a resource from different passes — or different stages within a single pass — it creates an access conflict when at least one command modifies that resource. This conflict happens because the GPU can run multiple commands at the same time, including those from:

  • Multiple passes

  • Different stages of a pass, such as the blit and dispatch stages of a compute pass

  • Multiple instances of a stage, such as two or more dispatch commands within a compute pass

For more information about resource access conflicts and GPU stages, see Resource synchronization and MTLStages, respectively.

Start by identifying which memory operations from different stages within a pass introduce a conflict. Then resolve the conflict by adding an intrapass barrier that makes the GPU wait for the producing stage to finish before it runs the consuming stage.

Identify access conflicts within a single pass

The following code example encodes a compute pass that has an access conflict between its copy and dispatch commands.

The example has at least one access conflict because the pass accesses two common resources — bufferA and bufferB — from different stages, and at least one command modifies one or more of those resources.

The copy command and the dispatch commands run during the blit and dispatch stages, respectively; both commands modify bufferB.
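A minimal sketch of such a pass follows, assuming a Metal 4 compute encoder that can encode both blit-style copy commands and dispatch commands in one pass; the names `commandBuffer`, `pipeline`, `bufferA`, and `bufferB`, the sizes, and the omitted buffer bindings are all illustrative placeholders, not part of the article:

```swift
// Sketch only: a single compute pass whose blit and dispatch stages
// both access bufferB, creating an access conflict.
let encoder = commandBuffer.makeComputeCommandEncoder()!

// Blit stage: store the contents of bufferA into bufferB.
encoder.copy(from: bufferA, sourceOffset: 0,
             to: bufferB, destinationOffset: 0,
             size: bufferA.length)

// Dispatch stage: run a kernel that loads from and stores to bufferB.
// Without a barrier, the GPU can run this dispatch concurrently with
// the copy command above.
encoder.setComputePipelineState(pipeline)
encoder.dispatchThreadgroups(
    MTLSize(width: 32, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))

encoder.endEncoding()
```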

[Image]

Without a barrier, the GPU can run the commands at any time relative to each other, including at the same time, which can yield inconsistent results in resources with access conflicts.

[Image]

Resolve an intrapass conflict with a barrier

Resolve access conflicts between commands within the same pass by adding an intrapass barrier with the encoder’s barrier(afterEncoderStages:beforeEncoderStages:visibilityOptions:) method.

The following code example modifies the previous one by adding an intrapass barrier between the blit and dispatch stages within the pass.

The code example adds a barrier between the blit and dispatch stages because they both access bufferB with load or store operations. The barrier forces the GPU to wait until the blit command completes before starting the dispatch stage.
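A sketch of the fix, using the encoder method named above; the `.blit` and `.dispatch` stage values and the `.device` visibility option are assumptions chosen to match the stages in this example, and the surrounding names remain illustrative:

```swift
// Blit stage: store the contents of bufferA into bufferB.
encoder.copy(from: bufferA, sourceOffset: 0,
             to: bufferB, destinationOffset: 0,
             size: bufferA.length)

// Intrapass barrier: the pass's dispatch stage waits until its
// blit stage finishes its stores.
encoder.barrier(afterEncoderStages: .blit,
                beforeEncoderStages: .dispatch,
                visibilityOptions: .device)

// Dispatch stage: the kernel now loads the data the copy command
// stored into bufferB.
encoder.setComputePipelineState(pipeline)
encoder.dispatchThreadgroups(
    MTLSize(width: 32, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))
```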

[Image]

The barrier makes it so that the store operations from the blit stage’s commands finish completely before the dispatch stage’s commands load from the same memory.

Encode commands that rely on fragment or tile stage outputs

Metal doesn’t support intrapass barriers that wait for the tile or fragment stages on devices that have a tile-based deferred rendering (TBDR) architecture, such as Apple silicon GPUs.

You can encode a tile dispatch that depends on the results of a previous tile dispatch because tile compute dispatches can access data from anywhere within the same tile. Similarly, you can encode a draw command that depends on the results of a previous draw command’s fragment stage because fragment shaders can only access data at their specific pixel location. However, if a tile dispatch needs results from another tile, or a fragment shader needs results from another fragment, then start a new render pass and synchronize them with a barrier.

For example, to synchronize the two passes by adding a consumer-based queue barrier in the new pass:

  1. End the current render pass by calling the encoder’s endEncoding() method.

  2. Start a new render pass by creating a new render encoder, either from the same command buffer or from another command buffer on the same queue.

  3. Add a consumer barrier by calling the new encoder’s barrier(afterQueueStages:beforeStages:visibilityOptions:) method, which synchronizes the results of the previous render pass.
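The three steps above might look like the following sketch; the names `renderEncoder`, `commandBuffer`, and `passDescriptor`, and the choice of `.fragment` stages and `.device` visibility, are assumptions for illustration:

```swift
// 1. End the current render pass.
renderEncoder.endEncoding()

// 2. Start a new render pass from the same command buffer.
let nextEncoder =
    commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)!

// 3. Consumer barrier: the new pass's fragment stage waits for the
//    fragment stage of earlier passes on the queue.
nextEncoder.barrier(afterQueueStages: .fragment,
                    beforeStages: .fragment,
                    visibilityOptions: .device)
```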

Similarly, to create a producer-based queue barrier in a pass:

  1. Add a producer barrier by calling the encoder’s barrier(afterStages:beforeQueueStages:visibilityOptions:) method to synchronize the results of the current render pass.

  2. End the current render pass by calling the encoder’s endEncoding() method.

  3. Start a new render pass by creating a new encoder, either from the same command buffer or from another command buffer on the same queue.
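The producer-based variant places the barrier at the end of the first pass instead; as before, the names and the `.fragment` stage values are illustrative assumptions:

```swift
// 1. Producer barrier: later passes' fragment stages on the queue wait
//    for this pass's fragment stage to finish.
renderEncoder.barrier(afterStages: .fragment,
                      beforeQueueStages: .fragment,
                      visibilityOptions: .device)

// 2. End the current render pass.
renderEncoder.endEncoding()

// 3. Start a new render pass from the same command buffer.
let nextEncoder =
    commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)!
```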

Alternatively, use an MTLFence:

  1. Update a fence in the current render pass by calling the encoder’s updateFence(_:afterEncoderStages:) method.

  2. End the current render pass by calling the encoder’s endEncoding() method.

  3. Start a new render pass by creating a new encoder from the same command buffer.

  4. Wait for the same fence instance in the new render pass by calling the new encoder’s waitForFence(_:beforeEncoderStages:) method.
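The fence-based alternative can be sketched as follows, assuming `device`, `renderEncoder`, `commandBuffer`, and `passDescriptor` already exist and using the fence methods named above:

```swift
// Create a fence to link the two passes.
let fence = device.makeFence()!

// 1. Signal the fence after the current pass's fragment stage finishes.
renderEncoder.updateFence(fence, afterEncoderStages: .fragment)

// 2. End the current render pass.
renderEncoder.endEncoding()

// 3. Start a new render pass from the same command buffer.
let nextEncoder =
    commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)!

// 4. Make the new pass's fragment stage wait for the same fence.
nextEncoder.waitForFence(fence, beforeEncoderStages: .fragment)
```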

For more information about other synchronization mechanisms, see the other articles in this series.

See Also

Synchronizing with barriers and fences