Managing model specialization and caching

Configure model specialization, manage cached assets, and reduce your app’s storage footprint.

Overview

When you load a .aimodel file with AIModel, Core AI performs specialization, the process of optimizing the model for the current device’s hardware. The .aimodel file contains your model in a portable format that works across Apple devices. Before the model can run, specialization turns it into executable code tied to the current device’s hardware and OS version.

By default, AIModel automatically specializes the model and caches the result. The first time you load a model, Core AI specializes it and stores the output. On later loads with the same model and options, Core AI loads the cached version instead of running specialization again, which reduces load times.

Core AI provides APIs to control how, when, and where specialization happens.

Check for a cached specialization

To avoid re-specializing your model each time your app launches, check whether a cached version already exists. Call model(for:options:) on the app’s default model cache, AIModelCache.default. The options parameter specifies which SpecializationOptions to match against:

func loadModel(from modelURL: URL) async throws -> AIModel {
    // The default cache stores all specialized assets for your app bundle.
    let cache = AIModelCache.default

    // A non-`nil` result means the model was previously specialized and cached.
    if let model = try cache.model(for: modelURL, options: .default) {
        return model
    }

    // No cached specialization exists. Inform the person and specialize now.
    Task { @MainActor in
        informUser("Preparing AI features. This may take a while…")
    }

    // This call performs specialization, caches the result, and returns the model.
    return try await AIModel(contentsOf: modelURL, options: .default)
}

This method checks whether a cached specialization exists for the given model and options, but it doesn’t perform specialization. If a cached version exists, the method loads and returns it without specializing. If it returns nil, no cached specialization exists, and the next call to AIModel(contentsOf:options:) performs specialization.

Choose how Core AI specializes your model

Use SpecializationOptions to configure how Core AI specializes your model. By default, the system selects the combination of compute units (CPU, GPU, and Neural Engine) that minimizes inference latency:

let model = try await AIModel(contentsOf: modelURL, options: .default)

For advanced use cases, restrict specialization to the CPU with .cpuOnly, or prefer a specific compute unit with init(preferredComputeUnitKind:). For example, if your app runs a small model in the background, use .cpuOnly to avoid competing with foreground GPU work.

In most scenarios, the default configuration offers the best performance, so test your app’s performance carefully before overriding it. Because not all devices have the same compute units available, check what’s available with availableKinds. For details on all available specialization options, see SpecializationOptions.
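As a sketch, the helper below loads a model with CPU-only specialization when the app runs it as background work, and with the default options otherwise. It assumes the framework’s module is named `CoreAI` and that `.cpuOnly` and `.default` are static members of SpecializationOptions, as the text above suggests; `loadModel(at:runsInBackground:)` is a hypothetical name.

```swift
import CoreAI      // Assumption: module name of the Core AI framework.
import Foundation

/// Hypothetical helper: chooses specialization options based on how the app uses the model.
/// CPU-only specialization keeps background inference off the GPU so it doesn't compete
/// with foreground work; everything else uses the system's latency-optimized default.
func loadModel(at modelURL: URL, runsInBackground: Bool) async throws -> AIModel {
    // Assumption: `.cpuOnly` and `.default` are static members of SpecializationOptions.
    let options: SpecializationOptions = runsInBackground ? .cpuOnly : .default
    // Each distinct model-and-options combination gets its own cached specialization.
    return try await AIModel(contentsOf: modelURL, options: options)
}
```

Because cache entries are keyed by model and options, loading the same model with both options values produces two specializations, so prefer one configuration per model where you can.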

Specialize a model before loading it

When your app downloads a model or enables a feature that uses one, you can specialize the model at a convenient moment so the person doesn’t notice a delay when they use it. Use specialize(contentsOf:options:cache:cachePolicy:) to specialize a model without loading it for inference:

guard let localModelURL = try await downloadModel(forFeature: feature) else {
    throw AppError.failedToDownloadModel(feature)
}

// Specialize the model so it's ready before the person needs it.
try await AIModel.specialize(contentsOf: localModelURL, options: .default)

// The model is now specialized and cached. Future loads skip specialization.
let model = try await AIModel(contentsOf: localModelURL, options: .default)

This method stores the specialized assets in the default cache, AIModelCache.default, unless you pass a different cache in the cache parameter, and returns the specialized AIModel. After explicit specialization, any later AIModel initialization with the same model URL and options loads directly from the cache.

The specialize method differs from ahead-of-time compilation. With ahead-of-time compilation, most of the heavy computation happens on your Mac at build time, so on-device specialization finishes faster. With specialize, the full specialization process still runs on the person’s device: you control when specialization happens, not how much work it does.
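One way to specialize “at a convenient moment” is to start the work in a low-priority task right after a download finishes, then wait for that task only when the feature is actually used. The sketch below assumes `CoreAI` as the module name; `prepareModelInBackground(at:)` and `loadPreparedModel(at:after:)` are hypothetical names. Task priority is only a scheduling hint, so this lowers the work’s priority rather than guaranteeing it runs when the device is idle.

```swift
import CoreAI      // Assumption: module name of the Core AI framework.
import Foundation

/// Hypothetical helper: starts specializing a freshly downloaded model at background
/// priority. The task discards the returned model; the specialized assets stay in the
/// default cache, so a later load with the same URL and options skips specialization.
func prepareModelInBackground(at localModelURL: URL) -> Task<Void, Error> {
    Task(priority: .background) {
        _ = try await AIModel.specialize(contentsOf: localModelURL, options: .default)
    }
}

/// Hypothetical call site: waits for the preparation task, then loads from the cache.
func loadPreparedModel(
    at localModelURL: URL,
    after preparation: Task<Void, Error>
) async throws -> AIModel {
    // Rethrows the error if specialization failed.
    try await preparation.value
    return try await AIModel(contentsOf: localModelURL, options: .default)
}
```

If you need the model immediately after specializing, keep the instance that specialize returns instead of loading the model a second time.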

Control cache persistence with policies

Because the system can automatically delete specialized assets to free up storage, use AIModelCache.Policy to control whether the system can remove your app’s cached assets.

The system can remove specialized assets from the cache under three conditions:

OS update: Specialized assets are tied to the OS version. The system always invalidates assets on OS update, regardless of policy.
Source model change: If the source .aimodel file is modified or deleted, cached assets derived from it become invalid.
Storage pressure: The system can reclaim space by deleting assets marked as purgeable.

For most apps, use the default policy. It lets the system reclaim storage by deleting cached assets when storage runs low or when the source model changes.

If your app deletes the source model file to save storage, use the .persistent policy to keep the cached assets available across launches:

try await AIModel.specialize(
    contentsOf: modelURL,
    options: .default,
    cachePolicy: .persistent
)

Delete cached assets you no longer need

To reduce your app’s storage footprint, delete cached assets when they’re no longer needed. For example, when your app downloads an updated version of a model, the previous version’s cached assets are no longer valid:

func downloadAndUpdateModel(from remoteURL: URL, localModelURL: URL) async throws {
    let tempURL = try await downloadLatestModel(from: remoteURL)

    // Delete cached assets for the old model.
    let cache = AIModelCache.default
    try cache.deleteEntries(for: localModelURL)

    // Replace the old model with the new one.
    _ = try FileManager.default.replaceItemAt(localModelURL, withItemAt: tempURL)

    // Specialize the updated model.
    try await AIModel.specialize(
        contentsOf: localModelURL,
        options: .default,
        cachePolicy: .persistent
    )
}

Core AI provides methods for deleting cached assets:

deleteEntries(for:): Deletes all cache entries for a specific .aimodel, regardless of their SpecializationOptions.
deleteEntry(for:options:): Deletes a single cache entry for a specific .aimodel and SpecializationOptions combination.
deleteAll(): Deletes all entries in the entire cache.

If an AIModel instance still uses a cache entry, Core AI defers deletion until that instance is deallocated.
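As a sketch, the functions below use the two narrower deletion methods: one removes only the CPU-only specialization of a model, leaving other entries for that model in place, and the other clears the whole cache, for example from a reset option in your app’s settings. The function names are hypothetical. The code assumes `CoreAI` as the module name, `.cpuOnly` as a static member of SpecializationOptions, and that `deleteEntry(for:options:)` and `deleteAll()` are synchronous, possibly throwing, instance methods of AIModelCache, like `deleteEntries(for:)` in the example above.

```swift
import CoreAI      // Assumption: module name of the Core AI framework.
import Foundation

/// Hypothetical: removes only the CPU-only specialization of a model,
/// leaving entries for other SpecializationOptions in place.
func removeCPUOnlySpecialization(of modelURL: URL) throws {
    let cache = AIModelCache.default
    // Assumption: deleteEntry(for:options:) is synchronous and may throw.
    try cache.deleteEntry(for: modelURL, options: .cpuOnly)
}

/// Hypothetical: clears every specialization in the app's default cache.
/// Entries still in use are deleted once the AIModel instances using them
/// are deallocated.
func resetModelCache() throws {
    // Assumption: deleteAll() is synchronous and may throw.
    try AIModelCache.default.deleteAll()
}
```

After clearing the cache, the next load of each model runs specialization again.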

Share specialized models across apps

If you have multiple apps or extensions that use the same model, create an app group using the App Groups Entitlement. Then use init(appGroup:) to target the group identifier and load a shared cache. This avoids duplicating specializations across apps:

// Get the app group cache.
guard let groupCache = AIModelCache(appGroup: groupIdentifier) else {
    fatalError("Invalid group identifier or entitlement.")
}

// Specialize into the shared cache.
try await AIModel.specialize(
    contentsOf: sharedModelURL,
    options: .default,
    cache: groupCache,
    cachePolicy: .persistent
)

Other apps in the same group can then load the model from the shared cache:

guard let groupCache = AIModelCache(appGroup: groupIdentifier) else {
    return
}

if let model = try groupCache.model(for: sharedModelURL, options: .default) {
    // Use the model. No specialization needed.
}
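If an app or extension might run with a missing entitlement or an invalid group identifier, falling back to the app’s own cache avoids a crash. The sketch below picks a cache, reuses an existing specialization if there is one, and otherwise specializes into the chosen cache. `loadSharedModel(at:groupIdentifier:)` is a hypothetical name and `CoreAI` is an assumed module name; the Core AI calls are the ones shown above.

```swift
import CoreAI      // Assumption: module name of the Core AI framework.
import Foundation

/// Hypothetical: loads a model from the app group's shared cache, falling back to the
/// app's default cache if the group identifier or entitlement is invalid.
func loadSharedModel(at sharedModelURL: URL, groupIdentifier: String) async throws -> AIModel {
    // AIModelCache(appGroup:) returns nil for an invalid identifier or missing entitlement.
    let cache = AIModelCache(appGroup: groupIdentifier) ?? AIModelCache.default

    // Reuse a specialization that this app or another group member already made.
    if let model = try cache.model(for: sharedModelURL, options: .default) {
        return model
    }

    // Otherwise specialize into the chosen cache. When that's the group cache,
    // other members of the group can reuse the result.
    return try await AIModel.specialize(
        contentsOf: sharedModelURL,
        options: .default,
        cache: cache,
        cachePolicy: .persistent
    )
}
```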

Delete the source model and load from cache

Core AI uses the unspecialized .aimodel file, together with the SpecializationOptions you pass, to index and retrieve the cached specialization at runtime when you call init(contentsOf:options:) or model(for:options:). Because of this, deleting the source file breaks those lookups, even when a .persistent cache policy keeps the specialized assets on the device. Instead, save a bookmark to the cached specialization and load the model directly from that bookmark on later launches.

After specializing a model, capture its bookmarkData and save it somewhere your app can read on later launches, such as UserDefaults:

// Specialize and keep a reference to the model.
let model = try await AIModel.specialize(
    contentsOf: llmURL,
    options: .default,
    cachePolicy: .persistent
)

// Save bookmark data to restore access after the app exits.
let bookmarkData = model.bookmarkData
UserDefaults.standard.set(bookmarkData, forKey: "llm.bookmark")

On a subsequent launch, resolve the bookmark to load the model directly from the cache, without going through the source file:

if let bookmarkData = UserDefaults.standard.data(forKey: "llm.bookmark") {
    do {
        if let model = try AIModel(resolvingBookmark: bookmarkData) {
            // Use the model.
            return model
        }
        // The model can't be found or was invalidated by an OS update.
    } catch {
        // The bookmark data is invalid.
    }
}

// Download and specialize the model again.

With the bookmark saved, your app can delete the source .aimodel file to reclaim storage and continue working with the cached specialization through the bookmark:

// Delete the source model to reclaim storage.
try FileManager.default.removeItem(at: llmURL)

Bookmark data doesn’t prevent the cached assets from being removed from the device. If the system purges the assets, you delete them, or an OS update invalidates them, your app can’t resolve the bookmark and needs to download and specialize the model again.
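If your app manages several models, a small helper can keep each model’s bookmark data under its own UserDefaults key and remove it once the bookmark no longer resolves. This is a Foundation-only sketch: the `ModelBookmarkStore` type, its method names, and its key prefix are hypothetical, and you’d pass it the `bookmarkData` value from a specialized AIModel.

```swift
import Foundation

/// Hypothetical helper that keeps one bookmark per model in UserDefaults.
struct ModelBookmarkStore {
    private let defaults: UserDefaults
    private let keyPrefix = "model.bookmark."

    init(defaults: UserDefaults = .standard) {
        self.defaults = defaults
    }

    /// Builds a per-model key, such as "model.bookmark.llm".
    private func key(for modelName: String) -> String {
        keyPrefix + modelName
    }

    /// Saves bookmark data, such as an AIModel's `bookmarkData`, for a model.
    func save(_ bookmarkData: Data, for modelName: String) {
        defaults.set(bookmarkData, forKey: key(for: modelName))
    }

    /// Returns the saved bookmark data, or nil if none exists.
    func bookmarkData(for modelName: String) -> Data? {
        defaults.data(forKey: key(for: modelName))
    }

    /// Removes a bookmark that no longer resolves, for example after an OS update.
    func remove(for modelName: String) {
        defaults.removeObject(forKey: key(for: modelName))
    }
}
```

On launch, pass the stored data to AIModel(resolvingBookmark:) as in the earlier example, and call remove(for:) when that initializer returns nil or throws before you download and specialize the model again.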
