# `larq.optimizers`

Neural networks with extremely low-precision weights and activations, such as Binarized Neural Networks (BNNs), usually contain a mix of low-precision weights (e.g. 1-bit) and higher-precision weights (e.g. 8-bit, 16-bit, or 32-bit). A common example is the first and last layers of image classification models, which retain higher-precision weights in most BNN architectures from the literature.

Training a BNN, then, consists of optimizing both low-precision and higher-precision weights. In `larq`, we provide a mechanism to target different bit-precision variables with different optimizers using the `CaseOptimizer` class. Modeled after the `tf.case` signature, `CaseOptimizer` accepts pairs of predicates and optimizers. A predicate, given a variable, decides whether its optimizer should train that variable.

A `CaseOptimizer` behaves much like any other Keras optimizer, and once you instantiate it you can pass it to your `model.compile()` as usual. To instantiate a `CaseOptimizer`, pass one or more `(predicate, optimizer)` tuples, along with a `default_optimizer` which trains any variables not claimed by another optimizer. A variable may not be claimed by more than one optimizer's predicate.

Example

```
import larq as lq
import tensorflow as tf

no_op_quantizer = lq.quantizers.NoOp(precision=1)
layer = lq.layers.QuantDense(16, kernel_quantizer=no_op_quantizer)

case_optimizer = lq.optimizers.CaseOptimizer(
    (
        lq.optimizers.Bop.is_binary_variable,  # predicate
        lq.optimizers.Bop(threshold=1e-6, gamma=1e-3),  # optimizer
    ),
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)
```
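
Once instantiated, the `case_optimizer` above can be passed to `model.compile()` like any other Keras optimizer. A minimal sketch, assuming a small MNIST-style classifier built around the `layer` defined above (the model architecture is illustrative only):

```
model = tf.keras.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        layer,  # the QuantDense layer with the binary kernel from above
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

# The CaseOptimizer is passed to compile() like any other optimizer.
model.compile(
    optimizer=case_optimizer,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```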

### CaseOptimizer

```
larq.optimizers.CaseOptimizer(
    *predicate_optimizer_pairs, default_optimizer=None, name="optimizer_case"
)
```

An optimizer wrapper that applies different optimizers to different subsets of variables.

An optimizer is used to train a variable iff its accompanying predicate evaluates to `True`.

For each variable, at most one optimizer's predicate may evaluate to `True`. If no optimizer's predicate evaluates to `True` for a variable, it is trained with the `default_optimizer`. If a variable is claimed by no optimizers and `default_optimizer == None`, the variable is not trained.

**Arguments**

- **predicate_optimizer_pairs** (`Tuple[Callable[[tf.Variable], bool], keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2]`): One or more `(pred, tf.keras.optimizers.Optimizer)` pairs, where `pred` takes one `tf.Variable` as argument and returns `True` if the optimizer should be used for that variable, e.g. `pred(var) == True` (see the sketch below this list).
- **default_optimizer** (`Optional[keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2]`): A `tf.keras.optimizers.Optimizer` to be applied to any variable not claimed by any other optimizer. (Must be passed as keyword argument.)
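
Since a predicate is just a callable that takes a `tf.Variable` and returns a `bool`, you can also supply your own. A minimal sketch, assuming you want to train batch-normalization variables with plain SGD (the predicate name and the name-matching rule are hypothetical, not part of the larq API):

```
import larq as lq
import tensorflow as tf

def is_batch_norm_variable(var: tf.Variable) -> bool:
    # Hypothetical predicate: claim variables whose name suggests they
    # belong to a batch-normalization layer.
    return "batch_normalization" in var.name

optimizer = lq.optimizers.CaseOptimizer(
    (is_batch_norm_variable, tf.keras.optimizers.SGD(0.1)),
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    default_optimizer=tf.keras.optimizers.Adam(0.01),
)
```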

### Bop

```
larq.optimizers.Bop(threshold=1e-08, gamma=0.0001, name="Bop", **kwargs)
```

Binary optimizer (Bop).

Bop is a latent-free optimizer for Binarized Neural Networks (BNNs) and Binary Weight Networks (BWN).

Bop maintains an exponential moving average of the gradients controlled by `gamma`. If this average exceeds the `threshold`, a weight is flipped.

The hyperparameter `gamma` is somewhat analogous to the learning rate in SGD methods: a high `gamma` results in rapid convergence but also makes training more noisy.

Note that the default `threshold` is not optimal for all situations. Setting the threshold too high results in little learning, while setting it too low results in overly noisy behaviour.
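
For intuition, here is a rough NumPy sketch of the per-weight update rule described above, following the Bop paper rather than the library's actual implementation (`w`, `m`, and `grad` are hypothetical stand-ins for a binary weight tensor, its gradient moving average, and the current gradient):

```
import numpy as np

def bop_step(w, m, grad, gamma=1e-4, threshold=1e-8):
    """One Bop update for a binary weight tensor `w` with entries in {-1, +1}."""
    # Exponential moving average of the gradient, controlled by `gamma`.
    m = (1 - gamma) * m + gamma * grad
    # Flip a weight when the averaged gradient exceeds `threshold` and points
    # in the same direction as the weight (i.e. flipping would reduce the loss).
    flip = (np.abs(m) > threshold) & (np.sign(m) == np.sign(w))
    return np.where(flip, -w, w), m
```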

Warning

The `is_binary_variable` check of this optimizer will only target variables that have been explicitly marked as being binary using `NoOp(precision=1)`.

Example

```
import larq as lq
import tensorflow as tf

no_op_quantizer = lq.quantizers.NoOp(precision=1)
layer = lq.layers.QuantDense(16, kernel_quantizer=no_op_quantizer)

optimizer = lq.optimizers.CaseOptimizer(
    (lq.optimizers.Bop.is_binary_variable, lq.optimizers.Bop()),
    default_optimizer=tf.keras.optimizers.Adam(0.01),  # for FP weights
)
```

**Arguments**

- **threshold** (`float`): magnitude of average gradient signal required to flip a weight.
- **gamma** (`float`): the adaptivity rate.
- **name** (`str`): name of the optimizer.

**References**