
larq.optimizers

Neural networks with extremely low-precision weights and activations, such as Binarized Neural Networks (BNNs), usually contain a mix of low-precision weights (e.g. 1-bit) and higher-precision weights (e.g. 8-bit, 16-bit, or 32-bit). Examples include the first and last layers of image classification models, which keep higher-precision weights in most BNN architectures from the literature.

Training a BNN, then, consists of optimizing both low-precision and higher-precision weights. In larq, we provide a mechanism to target variables of different precisions with different optimizers: the CaseOptimizer class. Modeled after the tf.case signature, CaseOptimizer accepts pairs of predicates and optimizers. A predicate, given a variable, decides whether its optimizer should train that variable.

A CaseOptimizer behaves much like any other Keras optimizer: once instantiated, you can pass it to model.compile() as usual. To instantiate a CaseOptimizer, pass one or more (predicate, optimizer) tuples, along with a default optimizer which trains any variables not claimed by another optimizer. A variable may not be claimed by more than one optimizer's predicate.

Example

import tensorflow as tf
import larq as lq

no_op_quantizer = lq.quantizers.NoOp(precision=1)
layer = lq.layers.QuantDense(16, kernel_quantizer=no_op_quantizer)
case_optimizer = lq.optimizers.CaseOptimizer(
    (
        lq.optimizers.Bop.is_binary_variable,  # predicate
        lq.optimizers.Bop(threshold=1e-6, gamma=1e-3),  # optimizer
    ),
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)
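
Once instantiated, the CaseOptimizer is passed to model.compile() like any other Keras optimizer. The following is a minimal sketch, assuming a small MNIST-style model built around the layer defined above; the architecture and loss are illustrative only:

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    layer,  # the QuantDense layer defined above, with a 1-bit kernel
    tf.keras.layers.Dense(10, activation="softmax"),
])

# The binary kernel of `layer` is claimed by Bop via is_binary_variable;
# all other variables (biases, the final Dense layer) fall through to the
# default Adam optimizer.
model.compile(
    optimizer=case_optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)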


CaseOptimizer

larq.optimizers.CaseOptimizer(
    *predicate_optimizer_pairs, default_optimizer=None, name="optimizer_case"
)

An optimizer wrapper that applies different optimizers to different subsets of variables.

An optimizer is used to train a variable iff its accompanying predicate evaluates to True.

For each variable, at most one optimizer's predicate may evaluate to True. A variable for which no predicate evaluates to True is trained with the default_optimizer. If a variable is claimed by no optimizer and default_optimizer is None, that variable is not trained, as illustrated in the sketch below.

Arguments

  • predicate_optimizer_pairs Tuple[Callable[[tf.Variable], bool], keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2]: One or more (pred, tf.keras.optimizers.legacy.Optimizer) pairs, where pred takes one tf.Variable as argument and returns True if the optimizer should be used for that variable, e.g. pred(var) == True.
  • default_optimizer keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2 | None: A tf.keras.optimizers.legacy.Optimizer to be applied to any variable not claimed by any other optimizer. (Must be passed as keyword argument.)
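
Predicates are ordinary Python callables that take a single tf.Variable, so variables can be routed to optimizers by any criterion. Below is a hedged sketch combining the built-in Bop.is_binary_variable predicate with a hypothetical name-based predicate; is_bias is illustrative and not part of the larq API:

import tensorflow as tf
import larq as lq

def is_bias(variable):
    # Illustrative predicate: claim every bias variable for its own optimizer.
    return "bias" in variable.name

optimizer = lq.optimizers.CaseOptimizer(
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    (is_bias, tf.keras.optimizers.SGD(0.1)),
    # Variables claimed by neither predicate fall back to Adam; if
    # default_optimizer were None, they would not be trained at all.
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)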


Bop

larq.optimizers.Bop(threshold=1e-08, gamma=0.0001, name="Bop", **kwargs)

Binary optimizer (Bop).

Bop is a latent-free optimizer for Binarized Neural Networks (BNNs) and Binary Weight Networks (BWNs).

Bop maintains an exponential moving average of the gradients controlled by gamma. If this moving average exceeds the threshold and agrees in sign with a weight, that weight is flipped.

The hyperparameter gamma is somewhat analogous to the learning rate in SGD methods: a high gamma results in rapid convergence but also makes training noisier.

Note that the default threshold is not optimal for all situations. Setting the threshold too high results in little learning, while setting it too low results in overly noisy behaviour.
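
The update rule can be summarized in a few lines. The following is a simplified NumPy sketch of the behaviour described above, not the library's actual implementation; w holds binary weights in {-1, +1}, grad is the current gradient, and m is the per-weight moving-average state:

import numpy as np

def bop_step(w, grad, m, gamma=1e-4, threshold=1e-8):
    # Exponential moving average of the gradient, controlled by gamma.
    m = (1 - gamma) * m + gamma * grad
    # Flip a weight when the averaged gradient exceeds the threshold and
    # agrees in sign with the weight, i.e. when w * m > threshold.
    w = np.where(w * m > threshold, -w, w)
    return w, m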

Warning

The is_binary_variable check of this optimizer will only target variables that have been explicitly marked as being binary using NoOp(precision=1).

Example

import tensorflow as tf
import larq as lq

no_op_quantizer = lq.quantizers.NoOp(precision=1)
layer = lq.layers.QuantDense(16, kernel_quantizer=no_op_quantizer)

optimizer = lq.optimizers.CaseOptimizer(
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    default_optimizer=tf.keras.optimizers.Adam(0.01),  # for FP weights
)

Arguments

  • threshold float: magnitude of average gradient signal required to flip a weight.
  • gamma float: the adaptivity rate.
  • name str: name of the optimizer.

References

  • Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization