Miniaturization has revolutionized computing ever since the advent of the first digital computer. The machines that once filled entire rooms and required teams of engineers to operate are now vastly dwarfed in capability by the smartphones that we carry in our pockets. Now that trend towards miniaturization is transforming the field of machine learning as well. Image and voice recognition, object detection, predictive maintenance, and many other applications that once required complex algorithms running on huge computing resources in the cloud can now run on a tiny microcontroller in a low-power IoT device.
In part, these feats have been achieved with the help of the advances in computing power mentioned above, but that alone is not enough. Microcontrollers are still severely resource-constrained and no match for a large, state-of-the-art neural network. Algorithmic optimizations and data processing improvements are also needed to produce models that are accurate yet capable of running inferences on tiny edge computing platforms. As you might have guessed, this level of compression is by no means easy to achieve. The scarcest resource on an edge computing device is memory, so using it wisely is of paramount importance. Recently published work from a small group of engineers at General Motors may make this a little bit simpler with a technique they call TMM-TinyML.
Calculation of memory utilization for each operation (📷: B. Sudharsan et al.)
Calculating the amount of on-device execution memory that will be consumed by a model is typically an error-prone process. And where memory is very limited, these errors have big consequences like run-time memory overflows. Erring on the side of caution and pruning the model too far to ensure it fits into memory can negatively impact the accuracy of the model. What is needed is an accurate estimate of on-device execution memory utilization to save time iterating on model revisions and to maximize use of the resources that are available. TMM-TinyML promises to give an accurate estimate where other methods have fallen flat.
This new method is compatible with a wide range of model types, including MicroNet, Wav2letter, MobileNet, ResNet, NASNet, and Tiny-YOLO. By treating the structure of a neural network as what they call a tensor-oriented computation graph, the team can traverse the network with mathematical functions that determine the amount of memory used during each stage of execution. This memory utilization is computed from the sizes of the tensors and operators resident in memory at each step. These calculations become considerably more complex for models like Inception, NASNet, or MobileNet, which execute non-linearly and contain branches. TMM-TinyML has, however, been designed to produce accurate memory usage estimates in these cases as well.
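As a rough illustration of the idea, peak execution memory can be estimated by walking the operators of a computation graph in execution order and tracking which tensors are still live, freeing each tensor only after its last consumer has run. That last point is exactly what makes branching graphs tricky: a tensor feeding two parallel branches must stay resident until both have executed. The graph representation, names, and sizes below are illustrative assumptions, not the authors' actual formulation.

```python
# Minimal sketch of peak-memory estimation over a tensor-oriented
# computation graph. Graph format and all sizes are hypothetical.
from collections import defaultdict

def peak_memory(ops, size):
    """Estimate peak resident bytes for a computation graph.

    ops:  list of (name, inputs, outputs) tuples in execution order,
          where inputs/outputs are lists of tensor names.
    size: dict mapping tensor name -> size in bytes.
    """
    # Count future consumers of each tensor so we know when to free it.
    remaining = defaultdict(int)
    for _, inputs, _ in ops:
        for t in inputs:
            remaining[t] += 1

    live, current, peak = set(), 0, 0
    for _, inputs, outputs in ops:
        # An op's inputs and outputs must all be resident while it runs.
        for t in inputs + outputs:
            if t not in live:
                live.add(t)
                current += size[t]
        peak = max(peak, current)
        # Branch-aware freeing: a tensor feeding several branches stays
        # live until the last branch that consumes it has executed.
        for t in inputs:
            remaining[t] -= 1
            if remaining[t] == 0 and t in live:
                live.remove(t)
                current -= size[t]
    return peak

# An Inception-style split: "x" feeds two parallel convolutions whose
# results are then concatenated. All names and sizes are made up.
ops = [
    ("conv_a", ["x"], ["a"]),
    ("conv_b", ["x"], ["b"]),
    ("concat", ["a", "b"], ["y"]),
]
size = {"x": 100, "a": 40, "b": 40, "y": 80}
```

In this example the peak occurs during `conv_b`, when `x`, `a`, and `b` are all resident at once; a purely linear per-layer estimate would miss that, which is the kind of case a branch-aware traversal has to handle.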
As validation, nine popular model types were tested, covering use cases ranging from image classification and pose estimation to text detection. Highly accurate memory utilization calculations were observed across these tests, demonstrating the utility of TMM-TinyML for those looking to optimize machine learning models for resource-constrained hardware platforms.