MPSLSTMDescriptor
A description of a long short-term memory block or layer.
Declaration
class MPSLSTMDescriptor
Overview
The recurrent neural network (RNN) layer initialized with MPSLSTMDescriptor transforms the input data (image or matrix), the memory cell data, and the previous output with a set of filters. Each filter produces one feature map in the output data and memory cell, according to the long short-term memory (LSTM) formula detailed below.
You may provide the LSTM unit with a single input or a sequence of inputs.
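As a rough illustration of what a sequence of inputs implies for the layer's state, the sketch below threads the output and memory cell from one time step into the next. It is plain Swift, not MPS code: `LSTMState`, `runSequence`, and `toyStep` are hypothetical names introduced here, and the toy step is a stand-in for the real LSTM computation. A single input corresponds to a sequence of length one.

```swift
// Hypothetical state pair: h is the layer output, c is the memory cell data.
typealias LSTMState = (h: [Float], c: [Float])

/// Applies `step` once per input, feeding each step's output and memory
/// cell back in as the recurrent input and previous memory cell.
func runSequence(_ inputs: [[Float]], initial: LSTMState,
                 step: ([Float], LSTMState) -> LSTMState)
    -> (outputs: [[Float]], finalState: LSTMState) {
    var state = initial
    var outputs: [[Float]] = []
    for x in inputs {
        state = step(x, state)      // (h0, c0) in, (h1, c1) out
        outputs.append(state.h)     // collect the output of every time step
    }
    return (outputs, state)
}

// Toy stand-in for an LSTM step (assumption, for demonstration only):
// the memory cell accumulates the input and the output mirrors the cell.
let toyStep: ([Float], LSTMState) -> LSTMState = { x, s in
    let c = zip(s.c, x).map { $0.0 + $0.1 }
    return (h: c, c: c)
}

let sequenceResult = runSequence([[1], [2], [3]],
                                 initial: (h: [0], c: [0]),
                                 step: toyStep)
```

Passing the step as a closure keeps the state handling separate from the gate arithmetic, so the same loop works for any step function with this shape.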
Description of Operation
Let x_j be the input data at time index t of the sequence, where j is an index containing the quadruplet of batch index, x, y, and feature index (x = y = 0 for matrices).
Let h0_j be the recurrent input (previous output) data from the previous time step (at time index t-1 of the sequence).
Let h1_i be the output data produced at this time step.
Let c0_j be the previous memory cell data (at time index t-1 of the sequence).
Let c1_i be the new memory cell data (at time index t of the sequence).
Let Wi_ij, Ui_ij, Vi_ij be the input gate weights for input, recurrent input, and memory cell (peephole) data, respectively.
Let bi_i be the bias for the input gate.
Let Wf_ij, Uf_ij, Vf_ij be the forget gate weights for input, recurrent input, and memory cell data, respectively.
Let bf_i be the bias for the forget gate.
Let Wo_ij, Uo_ij, Vo_ij be the output gate weights for input, recurrent input, and memory cell data, respectively.
Let bo_i be the bias for the output gate.
Let Wc_ij, Uc_ij, Vc_ij be the memory cell gate weights for input, recurrent input, and memory cell data, respectively.
Let bc_i be the bias for the memory cell gate.
Let gi(x), gf(x), go(x), gc(x) be the neuron activation functions for the input gate, forget gate, output gate, and memory cell gate, respectively.
Let gh(x) be the activation function applied to the resulting memory cell data.
The output of the LSTM layer is computed as follows:
I_i = gi( Wi_ij * x_j + Ui_ij * h0_j + Vi_ij * c0_j + bi_i )
F_i = gf( Wf_ij * x_j + Uf_ij * h0_j + Vf_ij * c0_j + bf_i )
C_i = gc( Wc_ij * x_j + Uc_ij * h0_j + Vc_ij * c0_j + bc_i )
c1_i = F_i c0_i + I_i C_i
O_i = go( Wo_ij * x_j + Uo_ij * h0_j + Vo_ij * c1_j + bo_i )
h1_i = O_i gh( c1_i )
The * stands for convolution (see MPSRNNImageInferenceLayer) or matrix-vector/matrix multiplication (see MPSRNNMatrixInferenceLayer).
Summation is over index j (except for the batch index), but there’s no summation over repeated index i (the output index).
Note that for validity, all intermediate images must be the same size, and all U and V matrices must be square (that is, outputFeatureChannels == inputFeatureChannels). Also, the bias terms are scalars with regard to the spatial dimensions.
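The formulas above can be sketched in plain Swift for the matrix case, where * is a matrix-vector product and there is no batch dimension. This is an illustrative sketch rather than MPS code: `GateWeights`, `matVec`, `gate`, and `lstmStep` are hypothetical names, the V matrices are treated as full (not diagonal), and the choice of sigmoid for gi, gf, and go and tanh for gc and gh is an assumption made for the example.

```swift
import Foundation

// Hypothetical helper types for illustration; none of these are MPS API.
typealias Vector = [Float]
typealias Matrix = [[Float]]   // row-major: m[i][j]

/// Weights for one gate: W (input), U (recurrent input), V (memory cell), b (bias).
struct GateWeights {
    var W: Matrix
    var U: Matrix
    var V: Matrix
    var b: Vector
}

func sigmoid(_ x: Float) -> Float {
    return 1 / (1 + exp(-x))
}

/// y_i = sum over j of m[i][j] * v[j]
func matVec(_ m: Matrix, _ v: Vector) -> Vector {
    return m.map { (row: Vector) -> Float in
        var sum: Float = 0
        for (a, b) in zip(row, v) { sum += a * b }
        return sum
    }
}

/// g( W_ij * x_j + U_ij * h0_j + V_ij * c_j + b_i ), one value per output index i.
func gate(_ p: GateWeights, x: Vector, h0: Vector, c: Vector,
          g: (Float) -> Float) -> Vector {
    let wx = matVec(p.W, x)
    let uh = matVec(p.U, h0)
    let vc = matVec(p.V, c)
    return p.b.indices.map { i in g(wx[i] + uh[i] + vc[i] + p.b[i]) }
}

/// One LSTM time step. Assumed activations: sigmoid for gi, gf, go; tanh for gc, gh.
func lstmStep(x: Vector, h0: Vector, c0: Vector,
              input: GateWeights, forget: GateWeights,
              cell: GateWeights, output: GateWeights) -> (h1: Vector, c1: Vector) {
    let I = gate(input, x: x, h0: h0, c: c0, g: sigmoid)
    let F = gate(forget, x: x, h0: h0, c: c0, g: sigmoid)
    let C = gate(cell, x: x, h0: h0, c: c0, g: { tanh($0) })
    // No summation over i here: the products are element-wise.
    let c1 = c0.indices.map { i in F[i] * c0[i] + I[i] * C[i] }
    // The output gate peeks at the new memory cell c1, not c0.
    let O = gate(output, x: x, h0: h0, c: c1, g: sigmoid)
    let h1 = c1.indices.map { i in O[i] * tanh(c1[i]) }
    return (h1, c1)
}
```

Because c0 and c1 share the index i in the element-wise update, and U and V multiply vectors of output size, the sketch only works when the U and V matrices are square, which mirrors the validity note above.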
Topics
Instance Properties
cellGateInputWeights
cellGateMemoryWeights
cellGateRecurrentWeights
cellToOutputNeuronParamA
cellToOutputNeuronParamB
cellToOutputNeuronType
MPSCNNNeuronType
forgetGateInputWeights
forgetGateMemoryWeights
forgetGateRecurrentWeights
inputGateInputWeights
inputGateMemoryWeights
inputGateRecurrentWeights
memoryWeightsAreDiagonal
outputGateInputWeights
outputGateMemoryWeights
outputGateRecurrentWeights
MPSCNNConvolutionDataSource
cellToOutputNeuronParamC