Until now, I have been relentlessly pursuing the best accuracy I can get from my neural network. Unsurprisingly, the size of my Core ML app keeps increasing.
While going through the presentation slides of What’s New in Core ML, Part 1, I noticed that Apple cites the size of the weights as a factor in the overall Core ML app size.
This got me thinking about trying post-training quantization on my fully trained models. Training with low precision never worked for me in the past, especially during backpropagation.
Core ML Tools 2.0 introduced new utilities to reduce a fully trained model to a lower-precision model.
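As a rough sketch of what using these utilities can look like, the helper below wraps the `quantize_weights` function from `coremltools.models.neural_network.quantization_utils`. The helper name `quantize_mlmodel`, the file names, and the choice of 8-bit linear mode are my own assumptions for illustration, not necessarily the settings behind the numbers in this post.

```python
def quantize_mlmodel(src_path, dst_path, nbits=8):
    """Quantize the weights of a trained Core ML model and save the result.

    Hypothetical helper: the name, paths, and default of 8 bits are assumptions.
    """
    # Imported lazily so this file still loads where coremltools is not installed.
    import coremltools
    from coremltools.models.neural_network import quantization_utils

    # Load the full-precision model from disk.
    model = coremltools.models.MLModel(src_path)

    # Post-training quantization of the weights; no retraining is involved.
    quantized = quantization_utils.quantize_weights(
        model, nbits=nbits, quantization_mode="linear"
    )

    # As far as I know, an MLModel comes back on macOS 10.14+, while other
    # platforms get a protobuf spec, so handle both cases when saving.
    if hasattr(quantized, "save"):
        quantized.save(dst_path)
    else:
        coremltools.utils.save_spec(quantized, dst_path)
    return dst_path


# Example call with placeholder file names:
# quantize_mlmodel("MyModel.mlmodel", "MyModel_8bit.mlmodel")
```

Other quantization modes and bit widths exist, so it is worth comparing the quantized model's predictions against the original on representative inputs before shipping it.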
On my sandbox app, I managed to reduce the model size from 34 MB to 9.2 MB. This was not the only benefit: inference also appears to run about 2x faster on an iPhone XS Max.
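To see why the size shrinks by roughly this much, here is a minimal pure-Python sketch of 8-bit linear quantization on random weights. It is a simplified stand-in, not the Core ML Tools implementation: the function names and the single scale and offset for the whole weight list are assumptions made for illustration.

```python
import random
import struct


def quantize_linear(weights, nbits=8):
    """Map floats to integer codes in [0, 2**nbits - 1] using a scale and offset."""
    lo, hi = min(weights), max(weights)
    levels = (1 << nbits) - 1
    # Guard against a constant weight list, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_linear(codes, scale, lo):
    """Recover approximate float weights from the integer codes."""
    return [c * scale + lo for c in codes]


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

codes, scale, lo = quantize_linear(weights)
restored = dequantize_linear(codes, scale, lo)

# Storage cost: 4 bytes per float32 weight versus 1 byte per 8-bit code.
float32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
uint8_bytes = len(bytes(codes))

# Rounding to the nearest level bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("size ratio:", float32_bytes / uint8_bytes)
print("max error:", max_error, "half step:", scale / 2)
```

Going from 32-bit floats to 8-bit codes gives a 4x reduction on the weights alone. The reduction from 34 MB to 9.2 MB is about 3.7x, which would be consistent with 8-bit weights plus some data that is not quantized, though I have not verified that breakdown.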
In my apps, the inference phase is where I feel a quantized neural network would make a difference, though accuracy will depend on the environment in which the network is used.
On a side note, there is also a ‘post_training_quantize’ flag in the TensorFlow Lite conversion tool.