Validating inference correctness against a reference run

Measure numerical divergence in a Core AI model against a reference run.

Overview

Quantization and model specialization can introduce numerical drift between a Core AI model and the original source model. Core AI Debugger pairs each operation in your Core AI asset with its counterpart in a reference run, then automatically measures similarity for every matched pair.

[Image]

Prepare a reference run

An .aimodelintermediates file records the intermediate tensor values produced at each operation of a PyTorch reference run. To generate the file, use the save_intermediates API, passing both the model you want to validate and the original source model. The result is a per-operation mapping between the PyTorch run and the Core AI model that Core AI Debugger can use to compare inference results.
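To make concrete what "intermediate tensor values produced at each operation" means on the PyTorch side, here is a minimal sketch that captures per-module outputs with standard PyTorch forward hooks. It is not the save_intermediates API and does not write an .aimodelintermediates file. The helper name capture_intermediates and the toy model are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


def capture_intermediates(model, example_input):
    """Run the model once and return {module_name: output tensor} for leaf modules.

    Hypothetical helper for illustration; not part of Core AI.
    """
    captured = {}
    handles = []
    for name, module in model.named_modules():
        # Hook only leaf modules so each entry corresponds to a single layer.
        if len(list(module.children())) == 0:
            def hook(mod, inputs, output, name=name):
                if isinstance(output, torch.Tensor):
                    captured[name] = output.detach().clone()
            handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(example_input)
    finally:
        # Always remove hooks so later runs of the model are unaffected.
        for handle in handles:
            handle.remove()
    return captured


# Toy reference model (an assumption; substitute your own source model).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
intermediates = capture_intermediates(model, torch.randn(1, 4))
for name, tensor in intermediates.items():
    print(name, tuple(tensor.shape))
```

Each dictionary entry plays the role of one recorded intermediate: a named operation and the tensor it produced for a given input.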

Start a comparison session

To compare your Core AI model against an .aimodelintermediates file:

  1. Open your .aimodel file in Core AI Debugger.

  2. In the toolbar, click the Comparison button to start a comparison session.

  3. Under Configuration A, set the Target, Function, Compute Unit, and Graph Visualization, and specify your model inputs.

  4. Under Configuration B, click the Target menu and select Intermediates File under Load Reference Run.

  5. Click the folder icon and select your .aimodelintermediates file.

  6. Click Compare.

[Image]

Read comparison results in the Navigator

When a comparison session starts, the Navigator populates with sync points: operation pairs that combine a Core AI operation with its PyTorch counterpart. Each sync point shows both operation names alongside a similarity score and a color-coded indicator dot:

  • Green: close match

  • Yellow: moderate divergence

  • Red: large error

[Image]

Sort by Similarity to identify the most divergent pairs, or by Operation to see whether failures cluster in a specific part of the model. Click any sync point to see that operation in the Structure Viewer, Source Viewer, and Inspector.
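The two sort orders can be pictured with a small sketch. The SyncPoint record, its field names, and the sample operation names and scores below are all hypothetical, invented to illustrate the idea. They are not a Core AI Debugger export format.

```python
from dataclasses import dataclass


@dataclass
class SyncPoint:
    # Hypothetical record; field names are assumptions for illustration.
    core_ai_op: str
    pytorch_op: str
    psnr_db: float   # similarity score (higher means a closer match)
    position: int    # order of the operation in the model graph


def most_divergent(sync_points, count=3):
    """Sort by similarity: the lowest PSNR (largest divergence) comes first."""
    return sorted(sync_points, key=lambda sp: sp.psnr_db)[:count]


def in_graph_order(sync_points):
    """Sort by operation: follow the order in which operations execute."""
    return sorted(sync_points, key=lambda sp: sp.position)


# Made-up sample data.
points = [
    SyncPoint("conv_0", "encoder.conv1", 62.0, 0),
    SyncPoint("softmax_3", "encoder.attn.softmax", 18.5, 3),
    SyncPoint("matmul_2", "encoder.attn.qk", 24.1, 2),
    SyncPoint("relu_1", "encoder.relu1", 58.3, 1),
]

worst = most_divergent(points, count=2)
print([sp.pytorch_op for sp in worst])
print([sp.core_ai_op for sp in in_graph_order(points)])
```

Sorting by similarity answers "what is worst?", while sorting by operation shows whether the low scores sit next to each other in the graph.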

Review comparison metrics

Core AI Debugger reports five metrics for each sync point. Color indicators are metric-aware, so green always signals a good result regardless of which metric you choose, even though a higher value is better for PSNR and a lower value is better for the error metrics.

The default metric is PSNR. The other metrics offer different lenses depending on what kind of divergence you want to surface:

PSNR

Peak signal-to-noise ratio: the ratio of the reference tensor’s squared peak output value to the mean squared error, expressed in decibels. Higher values indicate a closer match. A good general-purpose choice that works well for most models and tensor types.

Mean Absolute Error (MAE)

The average absolute difference across all elements. Use this to understand overall deviation with less sensitivity to outliers than MSE.

Mean Squared Error (MSE)

The average squared difference, which amplifies larger errors. Useful when large deviations are more consequential than small ones.

Max Absolute Error

The single largest per-element difference. A high value can expose clipping or overflow even when MAE looks acceptable.

Mean Relative Error

The average difference as a proportion of the expected value at each element. Useful when tensor magnitudes vary widely across operations. Reference values close to zero can inflate this metric.
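As a rough guide to how these metrics relate to one another, the sketch below computes all five on flat lists of floats using their textbook definitions. It is not necessarily how Core AI Debugger computes them. The function name comparison_metrics, the eps guard against division by zero, and the sample values are assumptions made for illustration.

```python
import math


def comparison_metrics(reference, candidate, eps=1e-12):
    """Compute five error metrics between two equal-length sequences of floats."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")

    n = len(reference)
    abs_diffs = [abs(c - r) for r, c in zip(reference, candidate)]

    mae = sum(abs_diffs) / n
    mse = sum(d * d for d in abs_diffs) / n
    max_abs = max(abs_diffs)
    # Relative error divides by the reference magnitude; eps avoids dividing by zero.
    mre = sum(d / max(abs(r), eps) for d, r in zip(abs_diffs, reference)) / n

    # Standard PSNR: 10 * log10(peak^2 / MSE), with peak taken from the reference.
    peak = max(abs(r) for r in reference)
    if mse == 0:
        psnr = math.inf       # identical tensors
    elif peak == 0:
        psnr = -math.inf      # all-zero reference with nonzero error
    else:
        psnr = 10 * math.log10(peak * peak / mse)

    return {
        "psnr_db": psnr,
        "mae": mae,
        "mse": mse,
        "max_abs_error": max_abs,
        "mean_relative_error": mre,
    }


# Made-up values: two of the four elements differ from the reference.
reference = [1.0, 2.0, 4.0, -2.0]
candidate = [1.0, 2.5, 4.0, -1.0]
metrics = comparison_metrics(reference, candidate)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this example the single element that is off by 1.0 contributes most of the MSE and all of the max absolute error, while the MAE is spread across all four elements. This is why the metrics can rank the same sync points differently.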

Investigate a divergent operation

Select a sync point with a low similarity score to begin investigating. The Inspector displays the tensor outputs from both runs side by side, along with a visual difference, so you can see directly where the values diverge.

[Image]

Use the Source Viewer to trace the operation back to its origin in the PyTorch code. The module hierarchy at the top of the Source Viewer tells you which PyTorch module the operation belongs to. If low-similarity sync points cluster in the same module, the divergence is localized there, giving you a precise target for changes to your model. If only specific operations diverge, use the Source Viewer to understand their implementation and identify what may be causing the discrepancy.
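The clustering check described above can be approximated outside the tool by counting low-similarity sync points per module. In this sketch the dotted operation names, the PSNR values, and the 30 dB threshold are hypothetical. The source does not state the thresholds Core AI Debugger uses, so choose a cutoff that suits your model.

```python
from collections import Counter


def divergence_by_module(sync_points, psnr_threshold_db):
    """Count sync points below the threshold, grouped by parent module path.

    sync_points is an iterable of (pytorch_op_name, psnr_db) pairs, where the
    name is a dotted path such as "encoder.attn.softmax" (an assumed format).
    """
    counts = Counter()
    for op_name, psnr_db in sync_points:
        if psnr_db < psnr_threshold_db:
            # Drop the last path component to get the owning module.
            module = op_name.rsplit(".", 1)[0]
            counts[module] += 1
    return counts.most_common()


# Made-up sample data.
sync_points = [
    ("encoder.conv1.conv", 61.0),
    ("encoder.attn.qk", 24.1),
    ("encoder.attn.softmax", 18.5),
    ("encoder.attn.out_proj", 27.9),
    ("decoder.head.linear", 29.0),
]

clusters = divergence_by_module(sync_points, psnr_threshold_db=30.0)
print(clusters)
```

A module with a high count, such as the hypothetical encoder.attn here, suggests a localized problem. Isolated entries point to individual operations worth reading in the Source Viewer.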

[Image]

See Also

Model inspection and validation