MPSGRUDescriptor
A description of a gated recurrent unit block or layer.
Declaration

class MPSGRUDescriptor

Overview
The recurrent neural network (RNN) layer initialized with an MPSGRUDescriptor transforms the input data (image or matrix) and the previous output with a set of filters. Each filter produces one feature map in the output data according to the gated recurrent unit (GRU) formula detailed below.
You may provide the GRU unit with a single input or a sequence of inputs. The layer also supports p-norm gating.
Description of Operation
Let

x_j be the input data (at time index t of the sequence; index j contains a quadruplet: batch index, x, y, and feature index, with x = y = 0 for matrices).
h0_j be the recurrent input (previous output) data from the previous time step (at time index t-1 of the sequence).
h_i be the proposed new output.
h1_i be the output data produced at this time step.
Wz_ij, Uz_ij be the input gate weights for the input and recurrent input data, respectively.
bz_i be the bias for the input gate.
Wr_ij, Ur_ij be the recurrent gate weights for the input and recurrent input data, respectively.
br_i be the bias for the recurrent gate.
Wh_ij, Uh_ij, Vh_ij be the output gate weights for the input, recurrent gate, and input gate, respectively.
bh_i be the bias for the output gate.
gz(x), gr(x), gh(x) be the neuron activation functions for the input, recurrent, and output gates.
p > 0 be a scalar (typically p >= 1.0) that defines the p-norm gating norm value.
The output of the GRU layer is computed as follows:
z_i = gz( Wz_ij * x_j + Uz_ij * h0_j + bz_i )
r_i = gr( Wr_ij * x_j + Ur_ij * h0_j + br_i )
c_i = Uh_ij * (r_j h0_j) + Vh_ij * (z_j h0_j)
h_i = gh( Wh_ij * x_j + c_i + bh_i )
h1_i = (1 - z_i^p)^(1/p) h0_i + z_i h_i

The * stands for convolution (see MPSRNNImageInferenceLayer) or matrix-vector/matrix multiplication (see MPSRNNMatrixInferenceLayer).
Summation is over index j (except for the batch index); there is no summation over the repeated index i, the output index.
Note that for validity, all intermediate images must be of the same size, and all U and V matrices must be square (that is, outputFeatureChannels == inputFeatureChannels). Also, the bias terms are scalars with regard to the spatial dimensions. The conventional GRU block is achieved by setting Vh = 0 (nil), and the Minimal Gated Unit is achieved with Uh = 0.
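The update equations above can be checked numerically outside of MPS. The following is a minimal NumPy sketch (not MPS API code) of one GRU time step in the matrix (MPSRNNMatrixInferenceLayer-style) case, assuming sigmoid for gz and gr and tanh for gh; with p = 1 the output reduces to the familiar convex blend (1 - z) h0 + z h, and Vh = 0 gives the conventional GRU:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h0, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, Vh, bh, p=1.0):
    """One GRU time step following the formulas in this document.
    Here * in the document maps to matrix-vector multiplication (@),
    and juxtaposition (e.g. r_j h0_j) maps to elementwise product."""
    z = sigmoid(Wz @ x + Uz @ h0 + bz)            # input (update) gate
    r = sigmoid(Wr @ x + Ur @ h0 + br)            # recurrent (reset) gate
    c = Uh @ (r * h0) + Vh @ (z * h0)             # recurrent contribution
    h = np.tanh(Wh @ x + c + bh)                  # proposed new output
    h1 = (1.0 - z**p) ** (1.0 / p) * h0 + z * h   # p-norm gated output
    return h1

# Example with randomly chosen square weights (feature channels in == out).
rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
h0 = rng.standard_normal(n)
Wz, Uz, Wr, Ur, Wh, Uh = (rng.standard_normal((n, n)) * 0.1 for _ in range(6))
Vh = np.zeros((n, n))       # Vh = 0: conventional GRU block
bz = br = bh = np.zeros(n)
h1 = gru_step(x, h0, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, Vh, bh, p=1.0)
```

Setting Uh to a zero matrix instead of Vh would give the Minimal Gated Unit variant mentioned above. Note this sketch is a correctness illustration only; the MPS layers perform the equivalent computation on the GPU.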
Topics
Instance Properties
flipOutputGates
gatePnormValue
inputGateInputWeights
inputGateRecurrentWeights
outputGateInputGateWeights
outputGateInputWeights
outputGateRecurrentWeights
recurrentGateInputWeights
recurrentGateRecurrentWeights
MPSCNNConvolutionDataSource