Analyzing model runtime performance with Instruments

Diagnose model performance by capturing a trace in Instruments.

Overview

When a Core AI model runs on your device, many internal events can significantly affect performance. To get useful performance information for apps with on-device models, profile your app in Instruments with the Core AI template. This template helps you:

Profile model performance alongside the rest of your app.
Identify startup delays from models that aren’t specialized for the current hardware.
Compare model performance across CPU, GPU, and Neural Engine.
Find unnecessary delays from repeatedly loading uncached models.

Record a new trace

Start with your Xcode project open and your device connected to your Mac. Next, select your app’s scheme and a run destination, then choose Product > Profile. In the Instruments template picker, select the Core AI template and click the Choose button.

[Image]

Alternatively, open Instruments and choose the Core AI template.

The Core AI template includes the following instruments:

Core AI: Captures timing information for activity in the Core AI framework across all four event categories (Specialization, Load, Setup, and Inference).
Neural Engine: Captures activity on the Neural Engine, so you can correlate Core AI events with the hardware that runs them.
GPU: Captures and shows activity on the GPU during the trace.
Time Profiler: Profiles running threads on all cores at regular intervals for all processes.

To begin recording the trace, click the Record button at the top left of the window. In your app, perform the actions that invoke your Core AI model so the trace captures the resulting events. When you finish, click the Record button again to stop recording.
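If you prefer to start a recording from a script on your Mac, the same capture can be sketched with the xctrace command-line tool that ships with Xcode. The following is a minimal sketch, not a definitive workflow: it assumes the template is selectable from the command line by the name "Core AI", and the process name, device name, and output path are hypothetical placeholders.

```python
import shlex
import subprocess


def build_record_command(process_name, device, output_path,
                         template="Core AI", time_limit=None):
    """Build an `xcrun xctrace record` command that attaches to a running app."""
    command = [
        "xcrun", "xctrace", "record",
        "--template", template,    # assumption: the template is selectable by this name
        "--device", device,        # device name or UDID
        "--attach", process_name,  # attach to the app that is already running
        "--output", output_path,
    ]
    if time_limit is not None:
        command += ["--time-limit", time_limit]  # for example "30s"
    return command


def record_trace(command):
    """Run the recording; raises CalledProcessError if xctrace fails."""
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)


# Hypothetical names, for illustration only.
command = build_record_command("MyApp", "My iPhone", "MyApp-CoreAI.trace",
                               time_limit="30s")
```

Calling record_trace(command) on a Mac with Xcode installed starts the recording. While it runs, perform the actions in your app that invoke the model, then open the resulting .trace file in Instruments.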

Review the trace recording

Now that you’ve recorded a trace, the Instruments timeline shows recorded data from each of the instruments in the Core AI template.

[Image]

The Core AI instrument divides model activity into multiple tracks. The top track shows all activity. Expand it to reveal a child track for each active model, and expand a model’s track to reveal a child track for each of its active functions.

Each colored band represents an event in one of the four event categories. Each category has different latency characteristics. The categories, in the order they typically appear, are:

Specialization: Runtime specialization of the model for the target device architecture. Only appears for models that aren’t specialized ahead of time. Appears in green in the timeline.
Load: Preparation of the model for loading into memory. Appears in cyan in the timeline.
Setup: Preparation of the model before each inference. Appears in magenta in the timeline.
Inference: A single, complete inference from the model. Appears in blue in the timeline.
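To compare how much time each of the four categories takes in a particular trace, you can total the event durations per category. The sketch below assumes you have transcribed event rows from the Instruments detail pane into (category, duration) pairs by hand. The helper name and the sample rows are hypothetical placeholders, not real measurements.

```python
from collections import defaultdict

# Categories in the order they typically appear in the timeline.
CATEGORY_ORDER = ["Specialization", "Load", "Setup", "Inference"]


def total_duration_by_category(events):
    """Sum event durations (in milliseconds) per category."""
    totals = defaultdict(float)
    for category, duration_ms in events:
        totals[category] += duration_ms
    # Report every category, including ones with no events in this trace.
    return {category: totals[category] for category in CATEGORY_ORDER}


# Placeholder (category, duration in ms) rows; not real measurements.
events = [
    ("Specialization", 1200.0),
    ("Load", 40.0),
    ("Setup", 2.0),
    ("Inference", 15.0),
    ("Setup", 2.0),
    ("Inference", 14.0),
]
totals = total_duration_by_category(events)
for category, total_ms in totals.items():
    print(f"{category}: {total_ms:.1f} ms")
```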

Specialization events are often the most time-intensive operations during model runtime. Each model produces at most one Specialization event, and produces none if it's fully specialized for the device or already cached. You can learn more about specialization and how to optimize model performance in your app in Compiling Core AI models ahead of time and Managing model specialization and caching.

[Image]

Next, brief Load events appear in the timeline. They normally occur only at the start of runtime, when your app first loads the model into memory. If Load events keep appearing later in the trace, check that your app doesn’t reload models repeatedly.
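One way to spot repeated loads in a long trace is to count Load events per model. The sketch below assumes you have transcribed the events into (model, category) pairs by hand. The function name, model names, and rows are hypothetical and for illustration only.

```python
from collections import Counter


def models_with_repeated_loads(events):
    """Return {model: load_count} for models with more than one Load event."""
    load_counts = Counter(
        model for model, category in events if category == "Load"
    )
    return {model: count for model, count in load_counts.items() if count > 1}


# Hypothetical (model, category) rows; not taken from a real trace.
events = [
    ("Classifier", "Load"),
    ("Classifier", "Setup"),
    ("Classifier", "Inference"),
    ("Classifier", "Load"),      # a second Load for the same model
    ("Classifier", "Setup"),
    ("Classifier", "Inference"),
    ("Embedder", "Load"),
    ("Embedder", "Setup"),
    ("Embedder", "Inference"),
]
repeated = models_with_repeated_loads(events)
print(repeated)
```

A model that shows up in the result is a candidate for loading once and reusing the loaded instance across inferences.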

[Image]

Finally, brief Setup events appear in the timeline, and Inference events follow. A Setup event precedes each inference.

[Image]

See Also

Runtime monitoring and analysis