Larq Compute Engine Python API

The LCE Python API allows you to convert a Keras model built with larq to an LCE-compatible TensorFlow Lite FlatBuffer format for inference. The package also includes a reference interpreter useful for accuracy evaluation and debugging.

Installation

Before installing the LCE Python API, please install:

  • Python version 3.6, 3.7, 3.8, or 3.9
  • TensorFlow version 1.14, 1.15, 2.0, 2.1, 2.2, 2.3, 2.4, or 2.5 (recommended):
    pip install tensorflow  # or tensorflow-gpu
    

You can install larq-compute-engine with Python's pip package manager:

pip install larq-compute-engine

API documentation

convert_keras_model

larq_compute_engine.convert_keras_model(
    model,
    *,
    inference_input_type=tf.float32,
    inference_output_type=tf.float32,
    target="arm",
    experimental_default_int8_range=None,
    experimental_enable_bitpacked_activations=False
)

Converts a Keras model to a TFLite FlatBuffer.

Example

tflite_model = convert_keras_model(model)
with open("/tmp/my_model.tflite", "wb") as f:
    f.write(tflite_model)

Arguments

  • model tf.keras.Model: The model to convert.
  • inference_input_type tf.dtypes.DType: Data type of the input layer. Defaults to tf.float32, must be either tf.float32 or tf.int8.
  • inference_output_type tf.dtypes.DType: Data type of the output layer. Defaults to tf.float32, must be either tf.float32 or tf.int8.
  • target str: Target hardware platform. Defaults to "arm", must be either "arm" or "xcore".
  • experimental_default_int8_range Optional[Tuple[float, float]]: Tuple of floats representing (min, max) range values for all arrays without a specified range. Intended for experimenting with quantization via "dummy quantization". (default None)
  • experimental_enable_bitpacked_activations bool: Enable an experimental converter optimisation that attempts to reduce intermediate activation memory usage by bitpacking the activation tensor between consecutive binary convolutions where possible.

Returns

The converted data in serialized format.


convert_saved_model

larq_compute_engine.convert_saved_model(
    saved_model_dir,
    *,
    inference_input_type=tf.float32,
    inference_output_type=tf.float32,
    target="arm",
    experimental_default_int8_range=None,
    experimental_enable_bitpacked_activations=False
)

Converts a SavedModel to a TFLite FlatBuffer.

Example

tflite_model = convert_saved_model(saved_model_dir)
with open("/tmp/my_model.tflite", "wb") as f:
    f.write(tflite_model)

Arguments

  • saved_model_dir Union[str, os.PathLike]: SavedModel directory to convert.
  • inference_input_type tf.dtypes.DType: Data type of the input layer. Defaults to tf.float32, must be either tf.float32 or tf.int8.
  • inference_output_type tf.dtypes.DType: Data type of the output layer. Defaults to tf.float32, must be either tf.float32 or tf.int8.
  • target str: Target hardware platform. Defaults to "arm", must be either "arm" or "xcore".
  • experimental_default_int8_range Optional[Tuple[float, float]]: Tuple of floats representing (min, max) range values for all arrays without a specified range. Intended for experimenting with quantization via "dummy quantization". (default None)
  • experimental_enable_bitpacked_activations bool: Enable an experimental converter optimisation that attempts to reduce intermediate activation memory usage by bitpacking the activation tensor between consecutive binary convolutions where possible.

Returns

The converted data in serialized format.


Interpreter

larq_compute_engine.testing.Interpreter(
    flatbuffer_model, num_threads=1, use_reference_bconv=False
)

Interpreter interface for Larq Compute Engine Models.

Warning

The Larq Compute Engine is optimized for ARMv8, which is used by devices such as a Raspberry Pi or an Android phone. Running this interpreter on any other architecture (e.g. x86) will fall back on the reference kernels, meaning it will return correct outputs, but is not optimized for speed in any way!

Example

lce_model = convert_keras_model(model)
interpreter = Interpreter(lce_model)
interpreter.predict(input_data, verbose=1)

Arguments

  • flatbuffer_model bytes: A serialized Larq Compute Engine model in the flatbuffer format.
  • num_threads int: The number of threads used by the interpreter.
  • use_reference_bconv bool: When True, uses the reference implementation of LceBconv2d.

Attributes

  • input_types: Returns a list of input types.
  • input_shapes: Returns a list of input shapes.
  • output_types: Returns a list of output types.
  • output_shapes: Returns a list of output shapes.

predict

Interpreter.predict(x, verbose=0)

Generates output predictions for the input samples.

Arguments

  • x Union[np.ndarray, List[np.ndarray], Iterator[Union[np.ndarray, List[np.ndarray]]]]: Input samples. A Numpy array, or a list of arrays in case the model has multiple inputs.
  • verbose int: Verbosity mode, 0 or 1.

Returns

Numpy array(s) of output predictions.