MPSGRUDescriptor
A description of a gated recurrent unit block or layer.
Declaration

class MPSGRUDescriptor

Overview
The recurrent neural network (RNN) layer initialized with an MPSGRUDescriptor transforms the input data (image or matrix) and the previous output with a set of filters. Each filter produces one feature map in the output data according to the gated recurrent unit (GRU) formula detailed below.
You may provide the GRU unit with a single input or a sequence of inputs. The layer also supports p-norm gating.
Description of Operation
Let

x_j be the input data (at time index t of the sequence; index j contains a quadruplet: batch index, x, y, and feature index, with x = y = 0 for matrices).
h0_j be the recurrent input (previous output) data from the previous time step (at time index t-1 of the sequence).
h_i be the proposed new output.
h1_i be the output data produced at this time step.
Wz_ij, Uz_ij be the input gate weights for the input and recurrent input data, respectively.
bz_i be the bias for the input gate.
Wr_ij, Ur_ij be the recurrent gate weights for the input and recurrent input data, respectively.
br_i be the bias for the recurrent gate.
Wh_ij, Uh_ij, Vh_ij be the output gate weights for the input, recurrent gate, and input gate, respectively.
bh_i be the bias for the output gate.
gz(x), gr(x), gh(x) be the neuron activation functions for the input, recurrent, and output gates.
p > 0 be a scalar (typically p >= 1.0) that defines the p-norm gating norm value.
The output of the GRU layer is computed as follows:
z_i = gz( Wz_ij * x_j + Uz_ij * h0_j + bz_i )
r_i = gr( Wr_ij * x_j + Ur_ij * h0_j + br_i )
c_i = Uh_ij * (r_j h0_j) + Vh_ij * (z_j h0_j)
h_i = gh( Wh_ij * x_j + c_i + bh_i )
h1_i = (1 - z_i^p)^(1/p) h0_i + z_i h_i

The * stands for convolution (see MPSRNNImageInferenceLayer) or matrix-vector/matrix multiplication (see MPSRNNMatrixInferenceLayer).
Summation is over index j (except for the batch index); there is no summation over the repeated index i, the output index.
Note that for validity, all intermediate images must be of the same size, and all U and V matrices must be square (that is, outputFeatureChannels == inputFeatureChannels). Also, the bias terms are scalars with regard to the spatial dimensions. The conventional GRU block is achieved by setting Vh = 0 (nil), and the Minimal Gated Unit is achieved with Uh = 0.
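The update equations above can be checked numerically outside of MPS. The following is a minimal NumPy sketch (not MPS API code) of one GRU time step in the matrix (MPSRNNMatrixInferenceLayer-style) case, assuming sigmoid for gz and gr and tanh for gh; with p = 1 the output reduces to the familiar convex blend (1 - z) h0 + z h, and Vh = 0 gives the conventional GRU:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h0, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, Vh, bh, p=1.0):
    """One GRU time step following the formulas in this document.
    Here * in the document maps to matrix-vector multiplication (@),
    and juxtaposition (e.g. r_j h0_j) maps to elementwise product."""
    z = sigmoid(Wz @ x + Uz @ h0 + bz)            # input (update) gate
    r = sigmoid(Wr @ x + Ur @ h0 + br)            # recurrent (reset) gate
    c = Uh @ (r * h0) + Vh @ (z * h0)             # recurrent contribution
    h = np.tanh(Wh @ x + c + bh)                  # proposed new output
    h1 = (1.0 - z**p) ** (1.0 / p) * h0 + z * h   # p-norm gated output
    return h1

# Example with randomly chosen square weights (feature channels in == out).
rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
h0 = rng.standard_normal(n)
Wz, Uz, Wr, Ur, Wh, Uh = (rng.standard_normal((n, n)) * 0.1 for _ in range(6))
Vh = np.zeros((n, n))       # Vh = 0: conventional GRU block
bz = br = bh = np.zeros(n)
h1 = gru_step(x, h0, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, Vh, bh, p=1.0)
```

Setting Uh to a zero matrix instead of Vh would give the Minimal Gated Unit variant mentioned above. Note this sketch is a correctness illustration only; the MPS layers perform the equivalent computation on the GPU.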
Topics
Instance Properties
flipOutputGates
gatePnormValue
inputGateInputWeights
inputGateRecurrentWeights
outputGateInputGateWeights
outputGateInputWeights
outputGateRecurrentWeights
recurrentGateInputWeights
recurrentGateRecurrentWeights
MPSCNNConvolutionDataSource