init(learningRate:beta1:beta2:timeStep:epsilon:gradientScale:weightDecay:gradientClipping:usesAMSGrad:)
Creates a new AdamW optimizer with gradients clipped by value or by norm.
Declaration

init(learningRate: Float = 0.001, beta1: Float = 0.9, beta2: Float = 0.999, timeStep: Float = 1, epsilon: Float = 1e-8, gradientScale: Float, weightDecay: Float = 1e-2, gradientClipping: BNNS.GradientClipping, usesAMSGrad: Bool = false)

Parameters
- learningRate:
A value that specifies the learning rate.
- beta1:
A value that specifies the first-moment constant, in the range 0 to 1.
- beta2:
A value that specifies the second-moment constant, in the range 0 to 1.
- timeStep:
A value that’s at least 1 and represents the optimizer’s current time.
- epsilon:
The epsilon value you use to improve numerical stability.
- gradientScale:
A value that specifies the gradient scaling factor.
- weightDecay:
The weight decay coefficient.
- gradientClipping:
The gradient clipping function and bounds.
- usesAMSGrad:
A Boolean value that specifies whether the optimizer should use the AMSGrad variant.
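
The following is a minimal sketch of calling this initializer. It assumes the BNNS.AdamWOptimizer type in the Accelerate framework and the .byValue(bounds:) case of BNNS.GradientClipping; the gradientScale of 1 assumes you don’t scale gradients elsewhere in your training loop.

```swift
import Accelerate

// A minimal sketch: create an AdamW optimizer that clips each
// gradient element to the range -0.5 ... 0.5. The other values
// mirror the initializer's defaults.
let optimizer = BNNS.AdamWOptimizer(
    learningRate: 0.001,
    beta1: 0.9,
    beta2: 0.999,
    timeStep: 1,
    epsilon: 1e-8,
    gradientScale: 1,
    weightDecay: 1e-2,
    gradientClipping: .byValue(bounds: -0.5 ... 0.5),
    usesAMSGrad: false
)
```

To clip by norm instead, you would pass a .byNorm clipping value in place of .byValue; in either case, clipping applies to the gradients before the optimizer computes its moment estimates and weight-decayed update.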