# Larq Compute Engine Operators

Larq Compute Engine extends the built-in TensorFlow Lite operators with operators optimized for running binarized neural networks.

## Data Format

Larq Compute Engine adheres to the NHWC data format used in TensorFlow Lite for activation tensors. In the following, a tensor of format BitTensor<int32, n> denotes a TensorFlow Lite tensor with data type int32 containing binary values that are bitpacked along dimension n and, when the size of that dimension is not a multiple of 32, padded with zeros. Mathematically, a 0-valued bit represents a real value of 1.0, while a 1-valued bit is interpreted as -1.0.

Options for the operators are stored using FlexBuffers in the .tflite model files generated by the Converter and passed as opaque values to the custom operators.
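As an illustration, the packing scheme can be sketched in plain Python. The `bitpack_channels` helper below is hypothetical, not the LCE implementation, and the exact bit order (channel i in bit i mod 32 of word i // 32) is an assumption made for illustration:

```python
def bitpack_channels(values):
    """Pack a list of +1.0/-1.0 channel values into int32 words.

    Assumed bit order: bit i of word i // 32 is 1 when channel i is
    negative (-1.0) and 0 when it is non-negative (+1.0). Channels
    missing from the last word are zero-padded, which reads back as +1.0.
    """
    n_words = (len(values) + 31) // 32
    words = [0] * n_words
    for i, v in enumerate(values):
        if v < 0:
            words[i // 32] |= 1 << (i % 32)
    # reinterpret each unsigned 32-bit word as a signed int32,
    # matching the int32 storage type in the .tflite file
    return [w - (1 << 32) if w >= (1 << 31) else w for w in words]

# Three channels pack into a single word; only channel 1 is negative:
print(bitpack_channels([1.0, -1.0, 1.0]))  # [2]
# 32 negative channels fill a whole word (int32 value -1):
print(bitpack_channels([-1.0] * 32))       # [-1]
```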

## List of Operations

### LceBconv2d

2D binarized convolution layer.

The operation computes $$\hat{y}_{n,\mathrm{out}} = \sum_{i = 0}^{I - 1} w_{\mathrm{out}, i} \star \mathrm{bsign}(x_{n,i})$$ with $$n \in [0, N)$$ and $$\mathrm{out} \in [0, O)$$, where $$\star$$ is the 2D cross-correlation operator, $$N$$ is the batch size, $$I$$ and $$O$$ denote the number of input and output channels, and $$\mathrm{bsign}$$[^1] is the binary sign function.

The final output with type Tensor<float32|int8> is then calculated as $$y_{n,\mathrm{out}} = \beta_\mathrm{out} + \gamma_\mathrm{out} \, \sigma\left(\hat{y}_{n,\mathrm{out}}\right)\text{.}$$ If the output type is BitTensor<int32, 3>, the final transformation is instead simplified to $$y_{n,\mathrm{out}} = \begin{cases} -1.0 & \hat{y}_{n,\mathrm{out}} > \tau_\mathrm{out} \\ \hphantom{-}1.0 & \hat{y}_{n,\mathrm{out}} \leq \tau_\mathrm{out}\text{.} \end{cases}$$
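The two output paths above can be sketched as a toy single-position reference in plain Python. This is not the optimized kernel, and the helper names are hypothetical; it assumes the filter fully overlaps an already-extracted input patch, so one call produces one accumulator value:

```python
def bsign(v):
    """Binary sign function: +1.0 for v >= 0, -1.0 for v < 0."""
    return 1.0 if v >= 0 else -1.0

def bconv_accumulator(patch, filt):
    """One accumulator value: the binary filter fully overlaps the
    (already extracted) input patch, both flattened in HWI order."""
    return sum(w * bsign(x) for w, x in zip(filt, patch))

def bconv_output(acc, gamma, beta, sigma=lambda a: a):
    """Float/int8 output path: fused activation, then scale and bias."""
    return beta + gamma * sigma(acc)

def bconv_output_bitpacked(acc, tau):
    """Bitpacked output path: a single threshold comparison."""
    return -1.0 if acc > tau else 1.0

# A 2x2 single-channel patch against a +1/-1 filter of the same size:
acc = bconv_accumulator([0.3, -1.2, 0.5, -0.1], [1.0, 1.0, -1.0, 1.0])
print(acc)                             # -2.0
print(bconv_output(acc, 2.0, 0.5))     # -3.5
print(bconv_output_bitpacked(acc, 0))  # 1.0
```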

**Inputs**

- BitTensor<int32, 3>: 4D input tensor $$x$$
- BitTensor<int32, 3>: 4D bitpacked binary filter tensor $$w$$ in OHWI format
- Tensor<float32> | null: 1D post-activation multiplier $$\gamma$$. This operand is null if an output threshold is set.
- Tensor<float32> | null: 1D post-activation bias $$\beta$$. This operand is null if an output threshold is set.
- Tensor<int32> | null: 1D output threshold $$\tau$$. This operand defines the binary output threshold when experimental_enable_bitpacked_activations is enabled and the output is BitTensor<int32, 3>.

**Outputs**

- Tensor<float32|int8> | BitTensor<int32, 3>: Output tensor $$y$$, the result of the 2D convolution

**Options**

- channels_in int32: Number of input channels of the incoming activations. This is necessary since the input channels cannot be inferred from the shapes of the weights and activations if both are bitpacked.
- dilation_height_factor int32: Vertical dilation rate of the filter window
- dilation_width_factor int32: Horizontal dilation rate of the filter window
- fused_activation_function ActivationFunctionType: $$\sigma$$, one of NONE, RELU, RELU_N1_TO_1 or RELU6
- padding Padding: One of SAME or VALID
- stride_height int32: Vertical stride of the filter window
- stride_width int32: Horizontal stride of the filter window
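The need for the channels_in option can be seen with a quick calculation: many different true channel counts map to the same bitpacked channel dimension, so the packed shape alone is ambiguous.

```python
def packed_channels(channels: int) -> int:
    """Number of int32 words needed to bitpack `channels` binary channels."""
    return (channels + 31) // 32

# 33 through 64 true channels all bitpack into a channel dimension of 2,
# so the true count must travel separately as the channels_in option.
print([packed_channels(c) for c in (32, 33, 64, 65)])  # [1, 2, 2, 3]
```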

### LceBMaxPool2d

Max pooling operation on spatial input data.

**Inputs**

- BitTensor<int32, 3>: 4D input tensor

**Outputs**

- BitTensor<int32, 3>: A tensor where each entry is the maximum of the input values in the corresponding window.

**Options**

- padding Padding: One of SAME or VALID
- stride_width int32: Horizontal stride of the sliding window
- stride_height int32: Vertical stride of the sliding window
- filter_width int32: Horizontal size of the sliding window
- filter_height int32: Vertical size of the sliding window
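Given the bit convention from the Data Format section (a 0 bit encodes +1.0, a 1 bit encodes -1.0), the maximum over a window of binary values reduces to a bitwise AND of the bitpacked words: the result is -1.0 only when every value in the window is -1.0, i.e. only when the corresponding bit is set in every word. A minimal sketch of this equivalence (hypothetical helper, not the LCE kernel):

```python
def binary_maxpool_window(packed_words):
    """Max over one pooling window of bitpacked binary values.

    With 0 encoding +1.0 and 1 encoding -1.0, the maximum is -1.0 only
    when every value in the window is -1.0, i.e. only when every word
    has the bit set -- which is exactly a bitwise AND.
    """
    result = 0xFFFFFFFF
    for word in packed_words:
        result &= word & 0xFFFFFFFF  # mask handles negative int32 words
    return result

# Channel 0 (bit 0) is -1.0 in every entry; channel 1 (bit 1) is +1.0
# in the last entry, so only bit 0 survives the AND.
print(bin(binary_maxpool_window([0b11, 0b11, 0b01])))  # 0b1
```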

### LceQuantize

Binary quantize operation.

**Inputs**

- Tensor<float32|int8>: Input tensor

**Outputs**

- BitTensor<int32, -1>: Binarized tensor, bitpacked along the last dimension

### LceDequantize

Binary dequantize operation.

**Inputs**

- BitTensor<int32, -1>: Binarized input tensor, bitpacked along the last dimension

**Outputs**

- Tensor<float32|int8>: Output tensor
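Under the bit convention from the Data Format section, dequantization simply reads each bit back out as +1.0 or -1.0, discarding any zero padding beyond the true channel count. A minimal sketch (hypothetical helper; the bit order, channel i in bit i mod 32 of word i // 32, is an assumption made for illustration):

```python
def lce_dequantize(words, channels):
    """Unpack bitpacked int32 words into a list of -1.0/+1.0 floats,
    dropping the zero padding beyond the true channel count."""
    return [-1.0 if (words[i // 32] >> (i % 32)) & 1 else 1.0
            for i in range(channels)]

# The word 0b0110 decodes channels 1 and 2 as -1.0 and channels 0 and 3
# (a 0 bit each) as +1.0:
print(lce_dequantize([0b0110], channels=4))  # [1.0, -1.0, -1.0, 1.0]
```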