
Key Concepts

If you are new to Larq and/or Binarized Neural Networks (BNNs), this is the right place to start. Below, we summarize the key concepts you need to understand to work with BNNs.

Quantizer

The transformation from high-precision neural networks to Quantized Neural Networks (QNNs) is achieved by quantization: the process of mapping a large, often continuous, set of values to a smaller, countable set. Binarized Neural Networks are a special case of QNNs, where the quantization output \(x_q\) is binary: \[ x_q = q(x), \quad x_q \in \{-1, +1\}, \; x \in \mathbb{R} \]
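For example, a sign-based binarization maps every real value to \(-1\) or \(+1\). A minimal sketch with toy values (not a built-in Larq quantizer):

import tensorflow as tf

# Map each real value to -1 or +1 based on its sign.
x = tf.constant([-1.4, -0.3, 0.2, 2.5])
x_q = tf.where(x >= 0, 1.0, -1.0)
print(x_q.numpy())  # [-1. -1.  1.  1.]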

In Larq, a quantizer \(q\) defines both the transformation from a full-precision input to a quantized output and the pseudo-gradient used for the backward pass. The latter is called a pseudo-gradient because it is in general not the true gradient.

Generally, you will find quantizers throughout the network to quantize activations: even if all of a layer's inputs are binary, its output is not, since the layer sums over many binary values and therefore produces integers.
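To see why, note that a dot product of binary vectors is an integer, not a binary value; a quick check:

import tensorflow as tf

# Even with binary inputs and binary weights, the result is an integer,
# because the layer sums over many +1/-1 products.
x_bin = tf.constant([1.0, -1.0, 1.0, 1.0])
w_bin = tf.constant([-1.0, -1.0, 1.0, 1.0])
print(tf.reduce_sum(x_bin * w_bin).numpy())  # 2.0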

It is also common to apply quantizers to the weights during training. This is necessary when relying on real-valued latent weights to accumulate non-binary update steps, a common optimization strategy for BNNs. After training is finished, the real-valued weights and the associated quantization operations can be discarded.
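The following toy loop (assumed numbers, not Larq internals) illustrates why latent weights are useful: small update steps accumulate in the real-valued weight, and the binary weight only flips once the latent value changes sign.

import tensorflow as tf

w_latent = tf.Variable(0.05)               # real-valued latent weight
for _ in range(3):
    w_latent.assign_sub(0.03)              # a small (pseudo-)gradient step
    w_binary = tf.where(w_latent >= 0, 1.0, -1.0)
    print(w_latent.numpy(), w_binary.numpy())
# The binary weight stays at +1 until the latent weight crosses zero.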

Pseudo-Gradient

The true gradient of a quantizer is in general zero almost everywhere and therefore cannot be used for gradient descent. Instead, the optimization of BNNs relies on what we call pseudo-gradients, which are used during back-propagation. In the documentation for each quantizer you will find the definition and a graph of the pseudo-gradient.
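You can verify this with a GradientTape, using tf.sign as a stand-in for a binarization function:

import tensorflow as tf

# The registered gradient of tf.sign is zero everywhere, so plain
# back-propagation would never update anything behind a quantizer.
x = tf.Variable([-0.7, 0.3, 1.2])
with tf.GradientTape() as tape:
    y = tf.sign(x)
print(tape.gradient(y, x).numpy())  # [0. 0. 0.]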

Quantized Layers

Each quantized layer accepts an input_quantizer and a kernel_quantizer, which describe how to quantize the layer's incoming activations and weights, respectively. If both input_quantizer and kernel_quantizer are None, the layer is equivalent to a full-precision layer.

A quantized layer computes activations \(\boldsymbol{y}\) as:

\[ \boldsymbol{y} = \sigma(f(q_{\, \mathrm{kernel}}(\boldsymbol{w}), q_{\, \mathrm{input}}(\boldsymbol{x})) + b) \]

with full-precision weights \(\boldsymbol{w}\), arbitrary-precision input \(\boldsymbol{x}\), layer operation \(f\), activation function \(\sigma\), and bias \(b\). For a densely-connected layer, \(f(\boldsymbol{w}, \boldsymbol{x}) = \boldsymbol{x}^T \boldsymbol{w}\). This computation corresponds to the following computational graph:

(Computational graph: the kernel passes through the kernel_quantizer and the input through the input_quantizer; the layer_operation combines them, the bias is added, and the activation produces the output.)
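The following sketch spells out this computation for a dense layer, using a simple sign function for both quantizers (toy values, not the Larq implementation):

import tensorflow as tf

def binarize(t):
    # stand-in for q_kernel and q_input
    return tf.where(t >= 0, 1.0, -1.0)

x = tf.constant([[0.4, -1.3, 0.6]])        # arbitrary-precision input, shape (1, 3)
w = tf.constant([[0.2], [-0.5], [0.9]])    # full-precision kernel, shape (3, 1)
b = tf.constant([0.1])

# y = sigma(f(q_kernel(w), q_input(x)) + b), here with sigma = ReLU
y = tf.keras.activations.relu(tf.matmul(binarize(x), binarize(w)) + b)
print(y.numpy())  # [[3.1]]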

Larq layers are fully compatible with the Keras API, so you can use them interchangeably with Keras layers. For example, the following Larq model:

import tensorflow as tf
import larq

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(),
        larq.layers.QuantDense(512, activation="relu"),
        larq.layers.QuantDense(10, activation="softmax"),
    ]
)

is equivalent to this Keras model:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)
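Because of this compatibility, the usual Keras training workflow applies unchanged. A minimal sketch, continuing from the model above, where x_train and y_train stand in for your own data:

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=5)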

Kernels and inputs can be quantized independently. A network in which only the kernels are quantized is referred to as a Binary Weight Network (BWN); a network in which only the inputs are binarized is referred to as a Binary Activation Network (BAN); and a network in which both the inputs and the kernels are binarized is referred to as a Binarized Neural Network (BNN). The three variants are shown below:

A Binary Weight Network (BWN), where only the kernels are quantized:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(),
        larq.layers.QuantDense(
            512, kernel_quantizer="ste_sign", kernel_constraint="weight_clip"
        ),
        larq.layers.QuantDense(
            10,
            input_quantizer=None,
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            activation="softmax",
        ),
    ]
)

A Binary Activation Network (BAN), where only the inputs are quantized:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(),
        larq.layers.QuantDense(512, kernel_quantizer=None, kernel_constraint=None),
        larq.layers.QuantDense(
            10,
            input_quantizer="ste_sign",
            kernel_quantizer=None,
            kernel_constraint=None,
            activation="softmax",
        ),
    ]
)

A Binarized Neural Network (BNN), where both the inputs and the kernels are quantized:

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(),
        larq.layers.QuantDense(
            512, kernel_quantizer="ste_sign", kernel_constraint="weight_clip"
        ),
        larq.layers.QuantDense(
            10,
            input_quantizer="ste_sign",
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            activation="softmax",
        ),
    ]
)

Using Custom Quantizers

Quantizers are functions or Keras layers that transform a full-precision input to a quantized output. Since this transformation is usually non-differentiable, it is necessary to modify the gradient to be able to train the resulting QNN. This can be done with the tf.custom_gradient decorator.

In this example we will define a binarization function with an identity gradient:

@tf.custom_gradient
def identity_sign(x):
    def grad(dy):
        return dy

    return tf.sign(x), grad

This function can now be used as an input_quantizer or a kernel_quantizer:

larq.layers.QuantDense(
    10, input_quantizer=identity_sign, kernel_quantizer=identity_sign
)
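
Instead of the identity, the pseudo-gradient is often clipped so that it vanishes for inputs outside \([-1, 1]\), which is the behaviour of the built-in ste_sign quantizer. A minimal sketch of such a clipped variant:

@tf.custom_gradient
def clipped_sign(x):
    def grad(dy):
        # only pass the gradient through where the input lies in [-1, 1]
        return dy * tf.cast(tf.abs(x) <= 1, dy.dtype)

    return tf.sign(x), grad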