---
title: "adam(learningRate:beta1:beta2:epsilon:beta1Power:beta2Power:values:momentum:velocity:maximumVelocity:gradient:name:)"
framework: metalperformanceshadersgraph
role: symbol
role_heading: Instance Method
path: "metalperformanceshadersgraph/mpsgraph/adam(learningrate:beta1:beta2:epsilon:beta1power:beta2power:values:momentum:velocity:maximumvelocity:gradient:name:)"
---

# adam(learningRate:beta1:beta2:epsilon:beta1Power:beta2Power:values:momentum:velocity:maximumVelocity:gradient:name:)

Creates operations to apply Adam optimization.

## Declaration

```swift
func adam(learningRate learningRateTensor: MPSGraphTensor, beta1 beta1Tensor: MPSGraphTensor, beta2 beta2Tensor: MPSGraphTensor, epsilon epsilonTensor: MPSGraphTensor, beta1Power beta1PowerTensor: MPSGraphTensor, beta2Power beta2PowerTensor: MPSGraphTensor, values valuesTensor: MPSGraphTensor, momentum momentumTensor: MPSGraphTensor, velocity velocityTensor: MPSGraphTensor, maximumVelocity maximumVelocityTensor: MPSGraphTensor?, gradient gradientTensor: MPSGraphTensor, name: String?) -> [MPSGraphTensor]
```

## Parameters

- `learningRateTensor`: Scalar tensor which indicates the learning rate to use with the optimizer
- `beta1Tensor`: beta1Tensor
- `beta2Tensor`: beta2Tensor
- `beta1PowerTensor`: beta1^t beta1 power tensor
- `beta2PowerTensor`: beta2^t beta2 power tensor
- `valuesTensor`: Values to update with optimization
- `momentumTensor`: Momentum tensor
- `velocityTensor`: Velocity tensor
- `maximumVelocityTensor`: Optional maximum velocity tensor
- `gradientTensor`: Partial gradient of the trainable parameters with respect to loss
- `name`: Name for the operation

## Return Value

Return Value If maximumVelocity is nil array of 3 tensors (update, newMomentum, newVelocity) else array of 4 tensors (update, newMomentum, newVelocity, newMaximumVelocity)

## Discussion

Discussion The adam update ops are added current learning rate: lr[t] = learningRate * sqrt(1 - beta2^t) / (1 - beta1^t) m[t] = beta1 * m[t-1] + (1 - beta1) * g v[t] = beta2 * v[t-1] + (1 - beta2) * (g ^ 2) maxVel[t] = max(maxVel[t-1], v[t]) variable = variable - lr[t] * m[t] / (sqrt(maxVel) + epsilon)