Augmenting images to expand your training data

Improve your model by using transformed versions of your training images.

Overview

Training a good image model requires a variety of training images with different characteristics. If you’re training an image classifier to recognize flowers, you can improve classification accuracy by providing flower photos with a variety of lighting conditions, angles, and backgrounds. However, collecting and labeling images is a time-consuming process.

To maximize the potential impact of your data, you can use image augmentations. Augmenting images is the process of applying transformations such as flipping, cropping, resizing, adjusting brightness, and adding noise. Image augmentations are not a replacement for a good image dataset, but they help maximize the dataset’s effectiveness. Each augmentation has the potential to multiply the size of your training data, which is helpful when your training sample size is small.

[Image]

Keep in mind that performing augmentations has some drawbacks. You can’t reuse extracted features across training iterations because each iteration produces a new set of augmented images, so the feature extractor has to run again on every pass. The resulting increase in training time can be significant. Augmentations have the most impact when your dataset is small relative to the number of parameters in your model.

Write an augmentation

You can use Augmenter to create a custom augmentation. The augmenter applies each transformer to each image in sequence. For example, this augmenter randomly flips (with 50% probability) and then randomly crops images:

let augmenter = Augmenter {
    ApplyRandomly(probability: 0.5) {
        ImageFlipper(orientation: .horizontal)
    }
    RandomImageCropper(scale: 0.8 ..< 0.9, aspectRatio: 1.0)
}

To generate a random number each time, use UniformRandomFloatingPointParameter. You can use the random number with any transformer. For instance, to perform a slight rotation using ImageRotator:

let augmenter = Augmenter {
    ApplyRandomly(probability: 0.5) {
        ImageFlipper(orientation: .horizontal)
    }
    UniformRandomFloatingPointParameter(range: -10.0 ... 10.0) { angle in
        ImageRotator(angle: angle * .pi / 180)
    }
    RandomImageCropper(scale: 0.8 ..< 0.9, aspectRatio: 1.0)
}
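The rotation example multiplies the random value by .pi / 180, which suggests the range is expressed in degrees and the rotator expects radians. The following plain-Swift sketch shows that arithmetic on its own. The radians(fromDegrees:) helper is hypothetical and written only for this illustration; it is not part of the framework.

```swift
import Foundation

// Hypothetical helper: the same arithmetic as `angle * .pi / 180`.
func radians(fromDegrees degrees: Double) -> Double {
    degrees * .pi / 180
}

// The largest rotation the example can produce, in radians (roughly 0.1745).
let maxAngle = radians(fromDegrees: 10)

// One random draw from the same range, converted to radians.
let sample = radians(fromDegrees: Double.random(in: -10.0 ... 10.0))
print("max:", maxAngle, "sample:", sample)
```

A range of -10 to 10 degrees therefore keeps every rotation within about 0.17 radians of the original orientation.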

Apply augmentations to your training data

After you create an augmenter, you can use it to augment your training data. You do this with the applied(to:) method:

let augmentedTrainingData = augmenter.applied(to: trainingData)

Because augmentations are usually random, it makes sense to do more than one pass over your training data. Each pass results in a different set of images, for instance with different scale factors. You can control the number of passes by using the applied(to:upsampledBy:) method. For instance, to get 10 times the number of images:

let augmentedTrainingData = augmenter.applied(
   to: trainingData,
   upsampledBy: 10
)

Train an image classifier

You can now use your augmented data set to train an image classifier. Because augmentations take images, not URLs, you may need to read your files first.

let reader = ImageReader().adaptedAsAnnotatedFeatureTransformer(annotationType: String.self)
let trainingImages = try await reader.applied(to: trainingFiles)
let validationImages = try await reader.applied(to: validationFiles)

Now that you have annotated images, the next step is to create your pipeline. For an image classifier, create a feature extractor and a classifier.

let featurePrint = ImageFeaturePrint(revision: 2)
let classifier = FullyConnectedNetworkClassifier<Float, String>(labels: labels, configuration: configuration)
let task = featurePrint.appending(classifier)

Next, create the augmenter that transforms the training images.

let augmenter = Augmenter {
    ApplyRandomly(probability: 0.5) {
        ImageFlipper(orientation: .horizontal)
    }
    UniformRandomFloatingPointParameter(range: -10.0 ... 10.0) { angle in
        ImageRotator(angle: angle * .pi / 180)
    }
    RandomImageCropper(scale: 0.8 ..< 0.9, aspectRatio: 1.0)
}

Finally, create a model, apply the augmenter to the training data, and progressively train the model using update(_:with:eventHandler:). The following example makes up to 100 passes over trainingImages. Each pass applies the augmenter to the shuffled images and updates the model once per batch:

var model = task.makeTransformer()
for iteration in 0 ..< 100 {
    // Perform one training iteration.
    let augmentedData = augmenter.applied(to: trainingImages.shuffled())
    for try await batch in augmentedData.batches(ofSize: 32, dropsLastPartialBatch: false) {
        try await task.update(&model, with: batch)
    }

    // Compute accuracy on validation data, stop when done.
    // ...
}

The example above shuffles the images before applying the augmenter. Shuffling changes which images end up in the same batch on each pass, which helps prevent overfitting. The example breaks the data into batches after each augmentation, and updates the model with each batch. Using a smaller batch size typically produces a better model, while using a larger batch size can speed up training.
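To see how the batch size and the dropsLastPartialBatch flag affect the number of updates per pass, the following sketch computes batch sizes in plain Swift. The batchSizes function is a hypothetical helper written for this illustration, not a framework API, and it assumes the flag behaves as its name suggests.

```swift
// Hypothetical helper: lists the size of each batch produced from `itemCount` items.
func batchSizes(itemCount: Int, batchSize: Int, dropsLastPartialBatch: Bool) -> [Int] {
    precondition(batchSize > 0, "batchSize must be positive")
    let fullBatches = itemCount / batchSize
    let remainder = itemCount % batchSize
    var sizes = [Int](repeating: batchSize, count: fullBatches)
    // Keep the trailing partial batch only when it exists and is not dropped.
    if remainder > 0 && !dropsLastPartialBatch {
        sizes.append(remainder)
    }
    return sizes
}

// 100 images with a batch size of 32.
let kept = batchSizes(itemCount: 100, batchSize: 32, dropsLastPartialBatch: false)
let dropped = batchSizes(itemCount: 100, batchSize: 32, dropsLastPartialBatch: true)
print(kept)     // [32, 32, 32, 4]
print(dropped)  // [32, 32, 32]
```

With 100 images, keeping the partial batch gives four updates per pass and every image is used. Dropping it would give three updates and leave four images out of that pass.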

Stop training

Training your model progressively using the update(_:with:eventHandler:) method lets you control when to stop training. Stop training when the validation accuracy stops improving, for example:

var accuracies = [Double](repeating: 0, count: 5)
for iteration in 0 ..< 100 {
    // Perform one training iteration.
    // ...

    // Compute the accuracy on the validation data.
    let predictions = try await model.prediction(from: validationImages)
    let validationMetrics = ClassificationMetrics(
        predictions.map(\.prediction.mostLikelyLabel),
        predictions.map(\.annotation)
    )
    let validationAccuracy = validationMetrics.accuracy

    // Save the accuracy in a circular buffer.
    accuracies[iteration % accuracies.count] = validationAccuracy

    // Stop when there has been no significant improvement in the last 5 iterations.
    if accuracies.allSatisfy({ $0 >= validationAccuracy - 0.01 }) {
        break
    }
}
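The stopping rule can be tried in isolation, without training a model. The sketch below runs the same circular-buffer check over a made-up sequence of validation accuracies. The stoppingIteration function and the simulated values are hypothetical and exist only for this illustration.

```swift
// Hypothetical helper: returns the iteration at which the rule stops training,
// or nil if the rule never triggers for the given accuracies.
func stoppingIteration(for validationAccuracies: [Double],
                       window: Int = 5,
                       tolerance: Double = 0.01) -> Int? {
    var recent = [Double](repeating: 0, count: window)
    for (iteration, accuracy) in validationAccuracies.enumerated() {
        // Overwrite the oldest entry with the newest accuracy.
        recent[iteration % window] = accuracy
        // Stop when no entry in the window is more than `tolerance` below the newest
        // accuracy, meaning the newest value is not a significant improvement.
        if recent.allSatisfy({ $0 >= accuracy - tolerance }) {
            return iteration
        }
    }
    return nil
}

// Made-up accuracies that climb and then plateau around 0.80.
let simulated = [0.50, 0.62, 0.71, 0.78, 0.80, 0.805, 0.80, 0.803, 0.801, 0.80]
let stop = stoppingIteration(for: simulated)
print(stop as Any)  // Optional(8)
```

Because the buffer starts filled with zeros, the check cannot succeed until all five slots hold real accuracies, provided the accuracy is above the tolerance. In this sequence the plateau begins at iteration 4, and the rule stops at iteration 8, once the whole window sits within 0.01 of the latest value.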