Until now, I had been relentlessly pursuing the best accuracy I could get from my neural networks. The size of my Core ML app was, of course, increasing along with it.

While going through the presentation slides for What’s New in Core ML, Part 1, I noticed that Apple talked about the size of the weights being a factor in the overall Core ML app size.

This got me thinking about trying post-training quantization on my fully trained models. Training at low precision never worked for me in the past, especially during backpropagation.

Core ML Tools 2.0 introduced new utilities to convert a fully trained model into a lower-precision one.

coremltools.models.neural_network.quantization_utils

Utilities to compress Neural Network Models. Only available in coremltools 2.0b1 and onwards
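Using it turned out to be straightforward. Here is a minimal sketch of the conversion, assuming a fully trained full-precision model saved as MyModel.mlmodel (a placeholder filename):

import coremltools
from coremltools.models.neural_network import quantization_utils

# Load the fully trained 32-bit (full-precision) Core ML model
# ("MyModel.mlmodel" is a placeholder filename)
model = coremltools.models.MLModel('MyModel.mlmodel')

# Quantize the weights down to 16-bit floats; nbits can go lower
# (8, 4, 2, 1), in which case a quantization_mode such as 'linear'
# or 'kmeans' applies
quantized_model = quantization_utils.quantize_weights(model, nbits=16)

# Save the smaller model
quantized_model.save('MyModel_quantized.mlmodel')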

In my sandbox app, I managed to reduce the model size from 34 MB to 9.2 MB. That was not the only benefit: inference also appears to be about 2x faster on the iPhone XS Max.

This is where I feel a quantized neural network would make a difference in the inference phase of my apps, though accuracy will depend on the environment in which the network is used.
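The same module also has a compare_models utility for checking how closely the quantized model tracks the full-precision one. A sketch, assuming an image-input model and a folder of representative images at sample_images/ (a placeholder path):

import coremltools
from coremltools.models.neural_network import quantization_utils

# Load both variants ('MyModel.mlmodel' and 'MyModel_quantized.mlmodel'
# are the placeholder filenames from the earlier sketch)
full_model = coremltools.models.MLModel('MyModel.mlmodel')
quantized_model = coremltools.models.MLModel('MyModel_quantized.mlmodel')

# Run both models on the same sample inputs and report how closely the
# quantized outputs agree with the full-precision ones; for an
# image-input model, sample_data can point at a directory of images
quantization_utils.compare_models(full_model, quantized_model,
                                  sample_data='sample_images/')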

On a side note, there is also a ‘post_training_quantize’ flag in the TensorFlow Lite conversion tool.
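For completeness, here is a sketch of that flag with the TensorFlow 1.x converter API; my_model.h5 is a placeholder path:

import tensorflow as tf

# Convert a trained Keras model with post-training weight quantization
# (this is the TF 1.x converter API that exposed the
# post_training_quantize flag)
converter = tf.lite.TFLiteConverter.from_keras_model_file('my_model.h5')
converter.post_training_quantize = True  # quantize weights to 8 bits

tflite_model = converter.convert()
with open('my_model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)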
