# Larq Compute Engine Python API
The LCE Python API allows you to convert a Keras model built with larq
to an LCE-compatible TensorFlow Lite FlatBuffer format for inference. The package also includes a reference interpreter useful for accuracy evaluation and debugging.
## Installation

Before installing the LCE Python API, please install:

- Python version 3.6, 3.7, 3.8, or 3.9
- TensorFlow version 1.14, 1.15, 2.0, 2.1, 2.2, 2.3, 2.4, or 2.5 (recommended): `pip install tensorflow  # or tensorflow-gpu`

You can install `larq-compute-engine` with Python's pip package manager:

```
pip install larq-compute-engine
```
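Once installed, converting a model takes only a few lines. Below is a minimal sketch, assuming a toy model built with `larq` quantized layers; the layer configuration and the output path are placeholders, not a recommended architecture:

```python
import tensorflow as tf
import larq
import larq_compute_engine as lce

# Toy binarized model built with larq quantized layers (placeholder architecture).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    larq.layers.QuantDense(
        64,
        kernel_quantizer="ste_sign",
        kernel_constraint="weight_clip",
    ),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert to an LCE-compatible TFLite FlatBuffer and write it to disk.
tflite_model = lce.convert_keras_model(model)
with open("/tmp/quickstart_model.tflite", "wb") as f:
    f.write(tflite_model)
```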
## API documentation

### convert_keras_model

```python
larq_compute_engine.convert_keras_model(
    model,
    *,
    inference_input_type=tf.float32,
    inference_output_type=tf.float32,
    target="arm",
    experimental_default_int8_range=None
)
```
Converts a Keras model to a TFLite FlatBuffer.
Example

```python
tflite_model = convert_keras_model(model)
with open("/tmp/my_model.tflite", "wb") as f:
    f.write(tflite_model)
```
Arguments

- `model` (`keras.engine.training.Model`): The model to convert.
- `inference_input_type` (`tf.dtypes.DType`): Data type of the input layer. Defaults to `tf.float32`; must be either `tf.float32` or `tf.int8`.
- `inference_output_type` (`tf.dtypes.DType`): Data type of the output layer. Defaults to `tf.float32`; must be either `tf.float32` or `tf.int8`.
- `target` (`str`): Target hardware platform. Defaults to `"arm"`; must be either `"arm"` or `"xcore"`.
- `experimental_default_int8_range` (`Tuple[float, float] | None`): Tuple of floats representing the `(min, max)` range values used for all arrays without a specified range. Intended for experimenting with quantization via "dummy quantization". Defaults to `None`.
Returns
The converted data in serialized format.
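As an illustration of the int8 options, the hedged sketch below converts a Keras `model` (defined elsewhere) to a fully int8 FlatBuffer; the `(-3.0, 3.0)` dummy-quantization range is an arbitrary placeholder, not a tuned value:

```python
import tensorflow as tf
import larq_compute_engine as lce

# Convert with int8 input and output tensors. Arrays without calibrated ranges
# fall back to the experimental "dummy quantization" range given below.
tflite_model = lce.convert_keras_model(
    model,  # a larq/Keras model defined elsewhere
    inference_input_type=tf.int8,
    inference_output_type=tf.int8,
    experimental_default_int8_range=(-3.0, 3.0),  # placeholder range, for experiments only
)
with open("/tmp/my_model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```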
### convert_saved_model

```python
larq_compute_engine.convert_saved_model(
    saved_model_dir,
    *,
    inference_input_type=tf.float32,
    inference_output_type=tf.float32,
    target="arm",
    experimental_default_int8_range=None
)
```
Converts a SavedModel to a TFLite FlatBuffer.
Example

```python
tflite_model = convert_saved_model(saved_model_dir)
with open("/tmp/my_model.tflite", "wb") as f:
    f.write(tflite_model)
```
Arguments

- `saved_model_dir` (`str | os.PathLike`): SavedModel directory to convert.
- `inference_input_type` (`tf.dtypes.DType`): Data type of the input layer. Defaults to `tf.float32`; must be either `tf.float32` or `tf.int8`.
- `inference_output_type` (`tf.dtypes.DType`): Data type of the output layer. Defaults to `tf.float32`; must be either `tf.float32` or `tf.int8`.
- `target` (`str`): Target hardware platform. Defaults to `"arm"`; must be either `"arm"` or `"xcore"`.
- `experimental_default_int8_range` (`Tuple[float, float] | None`): Tuple of floats representing the `(min, max)` range values used for all arrays without a specified range. Intended for experimenting with quantization via "dummy quantization". Defaults to `None`.
Returns
The converted data in serialized format.
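For completeness, here is a hedged sketch of the SavedModel route, assuming a Keras `model` defined elsewhere and a placeholder directory; `tf.keras.Model.save` writes the SavedModel format when given a plain directory path:

```python
import tensorflow as tf
import larq_compute_engine as lce

# Export the Keras model to a SavedModel directory (placeholder path).
model.save("/tmp/my_saved_model")

# Convert the exported SavedModel to an LCE-compatible TFLite FlatBuffer.
tflite_model = lce.convert_saved_model("/tmp/my_saved_model")
with open("/tmp/my_model.tflite", "wb") as f:
    f.write(tflite_model)
```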
### Interpreter

```python
larq_compute_engine.testing.Interpreter(
    flatbuffer_model,
    num_threads=1,
    use_reference_bconv=False,
    use_indirect_bgemm=False,
    use_xnnpack=False,
)
```
Interpreter interface for Larq Compute Engine Models.
Warning
The Larq Compute Engine is optimized for ARM v8, which is used by devices such as a Raspberry Pi or Android phones. Running this interpreter on any other architecture (e.g. x86) will fall back on the reference kernels, meaning it will return correct outputs, but is not optimized for speed in any way!
Example

```python
lce_model = convert_keras_model(model)
interpreter = Interpreter(lce_model)
interpreter.predict(input_data, verbose=1)
```
Arguments

- `flatbuffer_model` (`bytes`): A serialized Larq Compute Engine model in the flatbuffer format.
- `num_threads` (`int`): The number of threads used by the interpreter.
- `use_reference_bconv` (`bool`): When True, uses the reference implementation of `LceBconv2d`.
- `use_indirect_bgemm` (`bool`): When True, uses the optimized indirect BGEMM kernel of `LceBconv2d`.
- `use_xnnpack` (`bool`): When True, uses the XNNPack delegate of TFLite.
Attributes
- input_types: Returns a list of input types.
- input_shapes: Returns a list of input shapes.
- input_scales: Returns a list of input scales.
- input_zero_points: Returns a list of input zero points.
- output_types: Returns a list of output types.
- output_shapes: Returns a list of output shapes.
- output_scales: Returns a list of output scales.
- output_zero_points: Returns a list of output zero points.
#### predict

```python
Interpreter.predict(x, verbose=0)
```
Generates output predictions for the input samples.
Arguments

- `x` (`np.ndarray | List[np.ndarray] | Iterator[np.ndarray | List[np.ndarray]]`): Input samples. A Numpy array, or a list of arrays in case the model has multiple inputs.
- `verbose` (`int`): Verbosity mode, 0 or 1.
Returns
Numpy array(s) of output predictions.
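To tie the pieces together, here is a hedged end-to-end sketch that converts a Keras `model` defined elsewhere, inspects the interpreter's input signature, and runs a prediction on random data; the random input is purely illustrative:

```python
import numpy as np
import larq_compute_engine as lce
from larq_compute_engine.testing import Interpreter

# Convert the model and load it into the reference interpreter.
lce_model = lce.convert_keras_model(model)  # `model` is a larq/Keras model defined elsewhere
interpreter = Interpreter(lce_model, num_threads=2)

# Inspect the expected input signature and build matching random data.
print(interpreter.input_shapes, interpreter.input_types)
input_data = np.random.uniform(-1, 1, size=interpreter.input_shapes[0]).astype(
    interpreter.input_types[0]
)

# Run inference; returns Numpy array(s) of output predictions.
predictions = interpreter.predict(input_data, verbose=1)
print(predictions.shape)
```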