Key Concepts

If you are new to Larq and/or Binarized Neural Networks (BNNs), this is the right place to start. Below, we summarize the key concepts you need to understand to work with BNNs.

Quantizer

The transformation from high-precision Neural Networks to Quantized Neural Networks (QNNs) is achieved by quantization. This is the process of mapping a large set of, often continuous, values to a smaller countable set. Binarized Neural Networks are a special case of QNNs, where the quantization output \(x_q\) is binary: \[ x_q = q(x), \quad x_q \in \{-1, +1\}, x \in \mathbb{R} \]
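As a framework-independent illustration of the equation above, the NumPy sketch below implements a binary quantizer \(q\). It assumes the convention \(q(0) = +1\), so that every output lies in \(\{-1, +1\}\); a plain `np.sign` (like `tf.sign`) would return 0 for an input of exactly 0. The helper name `binarize` is hypothetical and not part of Larq.

```python
import numpy as np

def binarize(x):
    """Hypothetical binary quantizer: map real values to {-1, +1}.

    Zero is mapped to +1 (an assumed convention), so every output is strictly binary.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, 1.0, -1.0)

print(binarize([-2.5, -0.1, 0.0, 0.3, 7.0]).tolist())  # [-1.0, -1.0, 1.0, 1.0, 1.0]
```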

In Larq, a quantizer \(q\) defines how a full-precision input is transformed into a quantized output, as well as the pseudo-gradient method used in the backward pass. It is called a pseudo-gradient because it is, in general, not the true gradient of the quantization function.
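Conceptually, a quantizer therefore bundles two functions: a forward transformation and a pseudo-gradient for the backward pass. The plain Python/NumPy sketch below models that pairing. The `Quantizer` class and its `forward`/`backward` fields are hypothetical and do not reflect Larq's actual classes; the identity pseudo-gradient (a straight-through estimator) is chosen only as an example.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class Quantizer:
    """Hypothetical container pairing a forward transform with its pseudo-gradient."""
    forward: Callable[[np.ndarray], np.ndarray]
    # backward(x, dy) maps the upstream gradient dy to the gradient w.r.t. x
    backward: Callable[[np.ndarray, np.ndarray], np.ndarray]

# Sign in the forward pass; identity ("straight-through") in the backward pass
sign_with_identity_grad = Quantizer(
    forward=lambda x: np.where(x >= 0, 1.0, -1.0),
    backward=lambda x, dy: dy,
)

x = np.array([-0.4, 0.2, 1.5])
dy = np.array([0.1, -0.3, 0.5])
print(sign_with_identity_grad.forward(x).tolist())       # [-1.0, 1.0, 1.0]
print(sign_with_identity_grad.backward(x, dy).tolist())  # [0.1, -0.3, 0.5]
```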

Generally, you will find quantizers throughout the network to quantize activations. This is necessary because most layers output integers even when all of their inputs are binary, since they sum over many binary values.
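A quick NumPy check of this point: the dot product of two binary vectors of length \(n\) is an integer between \(-n\) and \(n\) rather than a binary value, so a quantizer is needed again before the next binarized layer. The vectors below are arbitrary examples.

```python
from itertools import product

import numpy as np

x = np.array([1, -1, 1, 1, 1, 1])  # binary activations
w = np.array([1, 1, -1, 1, 1, 1])  # binary weights
print(int(np.dot(x, w)))  # 2: an integer, not a value in {-1, +1}

# Every value a length-3 binary dot product can take
n = 3
values = sorted({
    int(np.dot(a, b))
    for a in product([-1, 1], repeat=n)
    for b in product([-1, 1], repeat=n)
})
print(values)  # [-3, -1, 1, 3]
```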

It is also common to apply quantizers to the weights during training. This is necessary when relying on real-valued latent weights to accumulate non-binary update steps, a common optimization strategy for BNNs. After training is finished, only the binarized weights need to be kept; the real-valued latent weights and the associated quantization operations can be discarded.
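The role of latent weights can be illustrated with a toy example in which a single real-valued weight receives several small gradient steps, while only its binarized value would be used in the forward pass. The learning rate and the constant gradient are arbitrary assumptions, and the update is plain SGD on one scalar, not Larq's training procedure. For contrast, the sketch also re-binarizes after every step, which loses each small update.

```python
def binarize(w):
    # Assumed convention: 0 maps to +1
    return 1.0 if w >= 0 else -1.0

lr, grad = 0.01, 2.0  # hypothetical learning rate and constant gradient

# With a real-valued latent weight, small updates accumulate until the sign flips
latent_w = 0.05
history = []
for step in range(4):
    latent_w -= lr * grad
    history.append(binarize(latent_w))
print(history)  # [1.0, 1.0, -1.0, -1.0]

# Storing only the binary weight: each small update is erased by re-binarizing
binary_only = 1.0
for step in range(4):
    binary_only = binarize(binary_only - lr * grad)
print(binary_only)  # 1.0
```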

Pseudo-Gradient

The true gradient of a quantizer is in general zero almost everywhere and therefore cannot be used for gradient descent. Instead, the optimization of BNNs relies on what we call pseudo-gradients, which are used during back-propagation. In the documentation for each quantizer you will find the definition and a graph of the pseudo-gradient.
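The NumPy sketch below contrasts the two. A central finite difference of the sign function is zero at every point away from the jump at 0, while a clipped straight-through estimator passes the upstream gradient through wherever \(|x| \le 1\) and blocks it elsewhere. This clipped form is assumed here to mirror the pseudo-gradient Larq describes for `ste_sign` with `clip_value=1.0`; the helper names are hypothetical.

```python
import numpy as np

def sign(x):
    # Binary forward pass with the assumed convention sign(0) = +1
    return np.where(x >= 0, 1.0, -1.0)

def finite_diff_grad(f, x, eps=1e-4):
    # Numerical estimate of the true derivative of f at x
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def ste_pseudo_grad(x, dy, clip_value=1.0):
    # Straight-through estimator: pass dy where |x| <= clip_value, else 0
    return dy * (np.abs(x) <= clip_value)

x = np.array([-2.0, -0.5, 0.3, 0.8, 1.7])
print(finite_diff_grad(sign, x).tolist())            # [0.0, 0.0, 0.0, 0.0, 0.0]
print(ste_pseudo_grad(x, np.ones_like(x)).tolist())  # [0.0, 1.0, 1.0, 1.0, 0.0]
```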

Quantized Layers

Each quantized layer accepts an input_quantizer and a kernel_quantizer that describe how the incoming activations and the weights of the layer, respectively, are quantized. If both input_quantizer and kernel_quantizer are None, the layer is equivalent to a full-precision layer.

A quantized layer computes activations \(\boldsymbol{y}\) as:

\[ \boldsymbol{y} = \sigma(f(q_{\, \mathrm{kernel}}(\boldsymbol{w}), q_{\, \mathrm{input}}(\boldsymbol{x})) + b) \]

with full precision weights \(\boldsymbol{w}\), arbitrary precision input \(\boldsymbol{x}\), layer operation \(f\), activation function \(\sigma\) and bias \(b\). For a densely-connected layer \(f(\boldsymbol{w}, \boldsymbol{x}) = \boldsymbol{x}^T \boldsymbol{w}\). This computation will result in the following computational graph:

[Figure: computational graph of a quantized layer, with nodes input, kernel, bias, input_quantizer, kernel_quantizer, layer_operation, add, activation and output.]
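To make the formula concrete, here is a NumPy sketch of a quantized densely-connected layer that computes \(\sigma(q_{\mathrm{input}}(\boldsymbol{x})^T q_{\mathrm{kernel}}(\boldsymbol{w}) + b)\). The function `quant_dense` is hypothetical and only mirrors the math; it is not `larq.layers.QuantDense`. Passing None for both quantizers reduces it to an ordinary dense layer, as described above.

```python
import numpy as np

def sign(x):
    # Binary quantizer with the assumed convention sign(0) = +1
    return np.where(x >= 0, 1.0, -1.0)

def relu(z):
    return np.maximum(z, 0.0)

def quant_dense(x, w, b, input_quantizer=None, kernel_quantizer=None, activation=None):
    """Hypothetical dense layer: activation(q_input(x) @ q_kernel(w) + b)."""
    x_q = x if input_quantizer is None else input_quantizer(x)
    w_q = w if kernel_quantizer is None else kernel_quantizer(w)
    z = x_q @ w_q + b  # layer operation f, then add bias
    return z if activation is None else activation(z)

x = np.array([0.5, -1.2, 2.0])  # one input sample with 3 features
w = np.array([[0.3, -0.7],
              [-0.1, 0.4],
              [0.8, 0.2]])      # full-precision kernel: 3 inputs -> 2 units
b = np.array([0.1, -0.5])

print(quant_dense(x, w, b, sign, sign, relu).tolist())  # [3.1, 0.0]
# Without quantizers the layer matches a full-precision dense layer
print(np.allclose(quant_dense(x, w, b, activation=relu), relu(x @ w + b)))  # True
```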

Larq layers are fully compatible with the Keras API, so you can use them interchangeably with Keras layers. Because no quantizers are passed to the Larq layers below, the two models are equivalent:

import tensorflow as tf
import larq
# Larq model
model = tf.keras.models.Sequential([
    larq.layers.QuantDense(512, activation="relu"),
    larq.layers.QuantDense(10, activation="softmax")
])
# Equivalent Keras model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

Using Quantizers as Activations

Although quantizers are usually passed as dedicated arguments to a quantized layer, they can also be used just like tf.keras activations:

# Use a quantizer as activation
y = larq.layers.QuantDense(512, activation="ste_sign")(x)

Just like activations, quantizers can also be used on their own:

# The two lines below are equivalent
x_binarized = larq.quantizers.SteSign(clip_value=1.0)(x)
x_binarized = tf.keras.layers.Activation("ste_sign")(x)

Using Custom Quantizers

Quantizers are functions that transform a full-precision input to a quantized output. Since this transformation is usually non-differentiable, its gradient must be replaced to make the resulting QNN trainable. This can be done with the tf.custom_gradient decorator.

In this example we will define a binarization function with an identity gradient:

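import tensorflow as tf

@tf.custom_gradient
def identity_sign(x):
    def grad(dy):
        # Identity pseudo-gradient: pass the upstream gradient through unchanged
        return dy
    return tf.sign(x), grad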

This function can now be used as an input_quantizer or a kernel_quantizer: