When to Use OpenCV and TensorFlow Locally Versus in the Cloud


The IoT device system architecture is changing. In the past, most IoT systems involved capturing data such as temperature, vibration, light, image, or sound and pushing that data to the cloud for further processing.

As MCU power grows, more processing can be done on IoT devices themselves. Although the per-device cost of an ARM-based platform is higher than that of something like an ESP32, the additional processing power can be applied to advanced data processing directly on the device.

To illustrate this trend, I’ll show examples from a contest I was recently involved in where developers built projects using the RICOH THETA, a consumer camera based on the Qualcomm Snapdragon 625 platform.

The Benefit of Running Android on Local Devices

Although most people think of mobile phones when they hear "Android," this Linux-based OS runs on a wide range of embedded devices that have no touchscreen or phone capability. By running Android on an embedded device, you open up your platform development to millions of Android developers.

The examples in this article do not focus on IoT. They show the direction that ARM-based IoT devices can take in the future and provide creative ideas on how to push the limits of data processing on embedded devices.

A quick look at the contest winners shows innovation in image processing, AR/VR, voice recognition, facial expression recognition, telepresence, and AI.

Development Contest Project Winners

An analysis of the technology used by the developers shows a strong trend toward artificial intelligence (AI), machine learning (ML), and computer vision (CV).

[Chart: technology used in projects submitted]

In the chart below, you can see that most of the developers focused on local processing using OpenCV or TensorFlow. Note that FastCV is a variant of OpenCV that utilizes the GPU hardware acceleration of the Qualcomm Snapdragon platform.

Details on Technology Used

Learning Models

To illustrate the process of using neural networks with embedded devices, let’s look at a project that takes a picture when a person smiles or says, “yes.” The project was built by Amine Amri.

Amine Amri’s project is based on the TensorFlow examples for Android that are included with the base TensorFlow project.

It uses two neural networks:

YOLO (You Only Look Once) is a network for object detection. The object detection task consists of determining where in the image certain objects are present, as well as classifying those objects. Previous methods, like R-CNN (Regional Convolutional Neural Network) and its variations, used a pipeline to perform this task in multiple steps. This can be slow to run and hard to optimize because each individual component must be trained separately. YOLO does it all with a single neural network.

With YOLO, you take an image as input, pass it through a neural network that looks similar to a normal CNN, and you get a vector of bounding boxes and class predictions in the output.
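To make that output vector concrete, here is a minimal sketch of decoding a YOLO-style prediction grid into bounding boxes. It is an illustrative simplification (one box per cell, no anchor boxes, made-up tensor values), not the actual network used in the project.

```python
import numpy as np

def decode_yolo_grid(pred, conf_threshold=0.5):
    """Decode a simplified YOLO-style output grid into detections.

    pred has shape (S, S, 5 + C): each grid cell predicts one box as
    (x, y, w, h, objectness, class scores...), with coordinates
    relative to the whole image. Real YOLO adds anchor boxes and
    multiple boxes per cell; this sketch keeps only the core idea.
    """
    S = pred.shape[0]
    detections = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            objectness = cell[4]
            if objectness < conf_threshold:
                continue  # cell is not confident an object is here
            x, y, w, h = cell[:4]
            class_id = int(np.argmax(cell[5:]))
            score = float(objectness * cell[5 + class_id])
            detections.append((x, y, w, h, class_id, score))
    return detections

# Toy 2x2 grid with two classes; one cell confidently predicts class 0.
pred = np.zeros((2, 2, 7))
pred[0, 1] = [0.6, 0.3, 0.2, 0.25, 0.9, 0.8, 0.1]
detections = decode_yolo_grid(pred)  # one detection: class 0, score ~0.72
```

A real deployment would add non-maximum suppression to merge overlapping boxes, but the single-pass nature of YOLO is visible even in this toy version: one tensor in, all boxes out.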

How Amine Amri Trained YOLO

For training YOLO, he used the darknet implementation. Darknet is an open-source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.

FDDB-360 Dataset for Training on 360° Images

Amine used the FDDB-360 Dataset, which is derived from Face Detection Dataset and Benchmark (FDDB, http://viswww.cs.umass.edu/fddb/). It contains fisheye-looking images created from FDDB images and is intended to help train models for face detection in 360° fisheye images. FDDB-360 contains 17,052 fisheye-looking images and a total of 26,640 annotated faces. The dataset is available from http://www.sfu.ca/~ibajic/#data (J. Fu, S. R. Alvar, I. V. Bajić, and R. G. Vaughan, “FDDB-360: Face detection in 360-degree fisheye images,” Proc. IEEE MIPR’19, San Jose, CA, Mar. 2019).

The training was performed on a GPU and lasted about 50 hours.

Another common use of IoT devices is to collect light information. To illustrate how this might be accomplished in the future, let’s look at Authydra by Kasper Oerlemans. This application takes 34 pictures and combines them into a single EXR file that contains accurate light information for use in visual special effects projects.

The light information is used to light 3D scenes, such as this test by Alexandre Dizeux using Authydra.

Authydra Test

To build his light tool, Kasper used OpenCV. A link to the GitHub repo for this project is available in Kasper’s post in a developer community he’s active in.

The first image is used to configure the camera settings. Each additional set of three images is combined for denoising. The next processing step takes the 11 datasets built from those 33 images and combines them into a single EXR file.
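The denoise step can be sketched in a few lines: averaging a burst of identically exposed frames suppresses random sensor noise by roughly the square root of the number of frames. This is a minimal illustration of that idea, not Authydra’s actual code.

```python
import numpy as np

def denoise_by_stacking(frames):
    """Average a burst of identically exposed frames.

    Random sensor noise shrinks by roughly sqrt(N) when N frames are
    averaged, which is why each set of three captures can be collapsed
    into one cleaner image before the final EXR merge.
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return stack.mean(axis=0)

# Three noisy captures of the same flat gray patch
rng = np.random.default_rng(seed=0)
truth = np.full((4, 4), 100.0)
burst = [truth + rng.normal(0, 10, truth.shape) for _ in range(3)]
merged = denoise_by_stacking(burst)
```

The real pipeline then merges the 11 cleaned exposures into one high-dynamic-range EXR; OpenCV ships merge algorithms (for example, its HDR module) that perform that step.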

When to Process Data Locally

In most cases, basic data such as temperature, humidity, vibration, pH, and dewpoint can be processed in the cloud. However, if you’re training your device with image data or making decisions on how to capture more data locally, you may benefit from local processing.
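One concrete pattern for benefiting from local processing: run inference on the device and upload only a compact summary instead of raw frames. The payload shape below is hypothetical, purely to illustrate the bandwidth saving.

```python
import json

def summarize_detections(detections, frame_id):
    """Pack on-device inference results into a small JSON payload.

    Uploading this metadata instead of a raw image turns megabytes per
    frame into a few hundred bytes. The field names here are
    hypothetical, not taken from any of the contest projects.
    """
    return json.dumps({
        "frame": frame_id,
        "detections": [
            {"label": label, "score": round(score, 3)}
            for label, score in detections
        ],
    })

payload = summarize_detections([("face", 0.91)], frame_id=42)
```

The device can still fall back to uploading the full image when the local model is unsure, keeping the cloud in the loop only for the hard cases.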

The examples used in this article are based on a fairly expensive ARM platform. It’s likely that the cost would only be justified in specialized workflows right now. However, the processing power of ARM platforms will continue to increase and the cost will decrease.

When you prototype your next project, think about what existing libraries like OpenCV and TensorFlow already exist that can run on your device locally. It may create a much better experience for your customer or increase the quality of the data you collect.
