Contributed Commentary by Hamza Ghadyali, AI Specialist at SAS
Human beings are not suited to performing repetitive and tedious tasks, no matter how important, skillful, or critical those tasks might be. Imagine watching hours of surveillance footage in which anomalous activity is extremely rare. Or imagine being a doctor who must manually review volumes of medical images to identify tumorous cells. Similarly, in manufacturing, inspecting products for defects is exceedingly mundane when only one in a million needs to be pulled. As human beings, we are curious at our core, and our work should play to that strength.
Fortunately, today’s analytics and AI capabilities are perfect for such tasks.
AI has had a resurgence in the last five years because new machine learning techniques now solve three particular problems in ways that resemble human performance and with comparable accuracy: playing games (reinforcement learning), reading and understanding natural language (text analytics), and analyzing images and videos (computer vision with convolutional neural networks, or CNNs).
A leader from an organization’s advanced manufacturing division reached out to my team because he had heard about computer vision and had a hunch that some of the problems he encounters in his plant could be solved easily with modern deep learning techniques. We found that we could solve the problem: not easily, perhaps, but with sustained creative effort.
The organization’s plant is an intricate system of conveyor belts that weaves and snakes throughout a large manufacturing floor, transforming raw materials through dozens of stations into a final product ready for shipping. As the partially formed product is handed off from one station to the next, it can get stuck. Like a car slamming on the brakes in heavy highway traffic, this can lead to a pile-up. And just as traffic behind an accident slows down or comes to a grinding halt, the production line upstream may have to be shut down if the jam cannot be cleared quickly. While such events are relatively rare, they are costly. Products involved in the collision will likely have to be scrapped, and manual intervention is required to remove the damaged products and clear the jam. Shutting down the production line causes a large loss in daily yield.
My team was tasked with identifying and developing the right analytical solutions. The company sent us a few minutes’ worth of sample footage showing normal operation and a collision event. With the problem defined and some sample data in hand, we could begin the modeling process. I’ll describe three approaches we took, in increasing order of complexity. The more complex models are more robust, at the cost of taking longer to build and running slightly slower in real time.
Our sample videos were centered on a station where a product comes in, has a sticker applied, and then moves on. When the line is running smoothly, this is a regular, repetitive process. The first and simplest solution was a model that treats every image frame as a vector and computes its distance to a reference frame. Applied to the video footage, this model produces a periodic signal much like a sinusoidal wave; a collision event breaks the wave and turns it into a constant flat line. This pixel-level inter-frame distance method is ultra-light, simple, and fast, but not robust. Its value is limited because it only works in circumstances that closely resemble the sample footage. For example, the periodicity could also break because of inactivity rather than a collision. To build a more robust model, we turned to deep learning.
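A minimal sketch of the pixel-level inter-frame distance idea, in Python with NumPy, might look like the following. It assumes frames arrive as equally sized arrays and uses Euclidean distance; the function names, the window length, and the flatness tolerance are hypothetical illustration choices, not the parameters used in the project. A flat signal only says the periodicity has stopped, which could mean a collision or simply inactivity.

```python
import numpy as np

def frame_distances(frames, reference):
    """Euclidean distance between each frame (flattened to a vector) and a reference frame."""
    # Cast to float64 so that subtracting uint8 pixel values cannot wrap around.
    ref = np.asarray(reference, dtype=np.float64).ravel()
    return np.array(
        [np.linalg.norm(np.asarray(f, dtype=np.float64).ravel() - ref) for f in frames]
    )

def signal_is_flat(distances, window, tol):
    """True if the last `window` distances vary by less than `tol`.

    While the station runs normally the signal oscillates; a flat tail means the
    periodicity has broken (a possible collision, or simply inactivity).
    `window` and `tol` are hypothetical tuning parameters.
    """
    if len(distances) < window:
        return False
    recent = np.asarray(distances[-window:])
    return bool(recent.max() - recent.min() < tol)
```

In use, the distance for each new frame would be appended to a rolling buffer and `signal_is_flat` checked on every frame; the window should span at least one full station cycle so that a normal oscillation is never mistaken for a flat line.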
Looking more closely at the sample footage, we observed that collisions occur in a specific area and are characterized by two products coming together and staying together. This simple but powerful observation allowed us to reduce the problem to a binary image classification task. However, our two classes are not collision versus normal, but together versus apart. Using a CNN to perform this classification task proved to be straightforward. The CNN’s output is periodic, alternating rhythmically between together and apart, and a simple rule can raise a collision alert if the system stays in the “together” state for too long. This creative model needs some additional data preparation to label the two classes, but it is still quick to build, performs smoothly in real time, and is robust to far more environmental conditions than the ultra-simple first model. Its drawbacks are that it can only detect collisions in one specific area, and that it adds no explainability: the binary classifier is a black box that produces no extra useful outputs. To build an even more robust and generalizable model, we used more advanced deep learning techniques and made some innovations of our own along the way.
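The “stuck in the together state for too long” rule could be sketched as follows. This assumes the CNN’s per-frame output has already been reduced to the labels `"together"` and `"apart"`, and it measures “too long” in frames; the function name and the `max_together` threshold are hypothetical and would in practice be tuned to the station’s cycle time and the camera’s frame rate.

```python
def collision_alerts(states, max_together):
    """Return the frame indices at which a run of "together" predictions
    first exceeds `max_together` consecutive frames.

    `states` is a sequence of per-frame labels, "together" or "apart"
    (an assumed format for the classifier output).
    """
    alerts = []
    run = 0  # length of the current unbroken run of "together" frames
    for i, state in enumerate(states):
        if state == "together":
            run += 1
            # Alert once per run, at the first frame that exceeds the limit.
            if run == max_together + 1:
                alerts.append(i)
        else:
            run = 0  # products separated again: normal rhythm resumed
    return alerts
```

Because the rule only looks at run lengths, the normal together/apart alternation never triggers it, while a pair of products that stays together past the threshold raises exactly one alert.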
Covering all the details of the last model would take a paper of its own. In simplified terms, we built a robust multiple object tracking pipeline that applies two CNNs in succession, combined with a time-series model that incorporates information across consecutive image frames. What matters is that this model outputs detailed information about the position and velocity of every product on the production line, along with their positions relative to each other, and it does so on every image frame in real time. Such detailed metrics not only allow us to detect collision events, send out alerts, and take reactive action, but also let us collect the data needed to build models that predict collisions in advance, so that we can act proactively to prevent catastrophe. While we could be satisfied with this result, the approach has clear drawbacks. Deploying such large models requires heavy computational power to operate in real time, and labeling the data for training these supervised learning models was far more time-consuming.
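The tracking pipeline itself is not reproduced here. As an illustration only, the sketch below assumes a tracker already emits, for every frame, a hypothetical mapping from track ID to a one-dimensional position along the belt, and shows how velocity and relative-position metrics of the kind described above could be derived and combined into a simple collision flag. The function names, the `min_gap` and `min_speed` thresholds, and the flagging rule are all hypothetical stand-ins, not the logic of the actual system.

```python
from typing import Dict, List, Tuple

# Hypothetical tracker output: track id -> position along the belt (e.g. metres).
Positions = Dict[int, float]

def velocities(prev: Positions, curr: Positions, dt: float) -> Dict[int, float]:
    """Finite-difference velocity for every track present in both frames."""
    return {tid: (curr[tid] - prev[tid]) / dt for tid in curr if tid in prev}

def close_pairs(curr: Positions, min_gap: float) -> List[Tuple[int, int, float]]:
    """Neighbouring products, ordered along the belt, whose gap is below min_gap."""
    ordered = sorted(curr.items(), key=lambda item: item[1])
    pairs = []
    for (id_a, x_a), (id_b, x_b) in zip(ordered, ordered[1:]):
        gap = x_b - x_a
        if gap < min_gap:
            pairs.append((id_a, id_b, gap))
    return pairs

def collision_suspects(prev: Positions, curr: Positions, dt: float,
                       min_gap: float, min_speed: float) -> List[Tuple[int, int]]:
    """Pairs that are too close AND where at least one product has (nearly) stopped.

    Tracks without a velocity estimate (e.g. newly appeared) are skipped.
    """
    vel = velocities(prev, curr, dt)
    suspects = []
    for id_a, id_b, _gap in close_pairs(curr, min_gap):
        if id_a in vel and id_b in vel:
            if min(abs(vel[id_a]), abs(vel[id_b])) < min_speed:
                suspects.append((id_a, id_b))
    return suspects
```

Logged over time, the same per-frame velocities and gaps are the kind of data a predictive model could be trained on, for example to warn when gaps start shrinking before products actually touch.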
We hope to address some of these issues with new methods emerging in model compression and active learning, but that’s a discussion for the future. It’s an exciting time to be working in AI, but AI will only have the positive impact we desire if it is practical. An AI model is practical only when we know three things: we can build it, we can deploy it, and, most importantly, it solves the right problem, one that delivers value and effects positive change by augmenting human effort.
Hamza Ghadyali, AI Specialist at SAS, works alongside the world’s largest companies in manufacturing, utilities, and retail to deliver business value by designing and deploying cutting-edge AI models. Most recently, his focus has been on real-time AI for computer vision, transforming live video streams into actionable insights that are integrated into the continuous analytics lifecycle. He believes in developing models that not only make businesses more efficient but also improve the lives of workers and customers. Hamza Ghadyali earned his Ph.D. in Mathematics from Duke University, where he invented novel geometric and topological methods for data mining and machine learning. He can be reached at Hamza.Ghadyali@sas.com.