The edge is beginning to take shape as a way of limiting the amount of data that needs to be pushed up to the cloud for processing, setting the stage for a massive shift in compute architectures and a race among chipmakers for a stake in a new and highly lucrative market.
So far, it’s not clear which architectures will win, or how and where data will be partitioned between what needs to be processed and what can be trashed. Nevertheless, there is widespread agreement that moving massive amounts of data to the cloud for processing in bulk is unworkable from a power, bandwidth and latency perspective. So while the cloud still has an important role to play, another layer of processing is required between the end devices and the giant hyperscale data centers, particularly for images and streaming data.
“The Internet, at present, is built backwards,” said Rich Wolsky, professor of computer science at the University of California, Santa Barbara. “We need to start thinking less about IoT and start thinking more about infrastructure. Right now, it doesn’t scale. We need to design a set of abstractions that include power and cost limitations.”
This is where the edge fits in. Initially, the idea was that 5G communications would alleviate the need for another layer, allowing inexpensive IoT devices to send data directly to the cloud for instantaneous processing. But the limitations of millimeter wave technology, coupled with enormous amounts of data — and the time and energy necessary to move that data — are pointing the compute world in a different direction.
“The data from 1 trillion IoT devices would break the Internet,” said Eric Van Hensbergen, a fellow in Arm’s Research Division. “Edge computing is inevitable, but it also creates a really interesting challenge about how to do end-to-end distributed systems development. There are all sorts of different form factors, various different accelerators, and it raises questions about where is the best place to run a particular function. So do you run a function near or far? If you are aggregating several sensors, none of the infrastructure is built for this.”
And that’s just where the edge concept begins to take shape. “You’ve got to deal with whoever owns the data, which may be equipment on a factory floor, or cellular data going to a cell tower,” said Van Hensbergen. “Then you need to architect end-to-end security and figure out how you build in privacy and/or how to supply mechanisms for anonymity.”
What is an edge device?
Ask two people what constitutes the edge and you are likely to get two different answers. In general, an edge device does some amount of processing of information, as opposed to an IoT device, or endpoint, where data is generated by a sensor and shipped to the cloud for processing. While that approach works for some applications, sending that data, processing it, and sending back whatever is relevant can take seconds or minutes.
It also doesn’t address a key issue, which is that not all data is of equal value. Consider an autonomous or assisted-driving vehicle, for example. If camera image sensors detect an object in the road, they need to process that data immediately. But until they detect that object, the road may be completely clear of objects, so the data streaming into a sensor is useless to the people in that vehicle. Still, the fact that there is nothing on the road may be useful for other vehicles approaching that area at some point in the future, so processing at least some of that data in the cloud may be important for routing vehicles.
The big problem is how to partition this data. Autonomous vehicles, robots, drones, or any other systems in motion are generally considered edge devices because this is where the partitioning and a sizeable portion of the processing needs to happen. The definition includes a home gateway, but it also can include a server that is closer to the source of data than the cloud, as well as private clouds and various types of on-premises computers.
“None of the IoT deployments today is portable,” said UC Santa Barbara’s Wolsky. “You can communicate and move data, and that data may have some error in it, or you can compute. If you’re missing data, you can fill it in with an interpolation of some sort, but it’s often not clear which you should do. That’s going to be driven by a power budget or a deadline. It’s also not clear what you should do in terms of accuracy. There’s an overall penalty for interpolation. So you wind up making a decision on what analytics to run based on the quality of your data, which can be very time-dependent. And then the quality is dependent on the deployment — how far apart are things, how lossy is your network, how much power do you have?”
Processing data at the edge also helps to safeguard it, both in terms of privacy and security and in terms of the integrity of the data itself.
Rethinking compute models
For all of these reasons, processing data locally or regionally begins to look increasingly attractive, and it is prompting companies to come up with innovative ways of dealing with this problem.
Fraunhofer, for example, has developed a sensor that strips out the important data from an image sensor rather than streaming the whole image.
“Fifteen years ago, the idea was that everything would go into the cloud and that you would need a lot of bandwidth,” said Andreas Brüning, head of the department for efficient electronics at Fraunhofer’s Engineering of Adaptive Systems Division. “The edge was defined as the router, and data was sent from the sensor to the router and to the cloud. But for efficient electronics, you need analog filtering. So you capture just the image that you want, and you do that with the next picture. You can do this multiple times per second for multiple thousands of pictures, and you calculate only on the pictures.”
Rather than take the full frame of that image, the sensor collects only what is essential to save. If there are two cars on the road, for example, and they constitute a small fraction of the sensor’s field of vision, the only data saved are those two cars in multiple frames. That dramatically reduces the amount of data that needs to be processed beyond the sensor.
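The data reduction described above can be illustrated with a small software sketch. This is not Fraunhofer's method, which filters in the analog domain inside the sensor. It is only a rough illustration of the arithmetic, assuming a hypothetical 1,920 x 1,080 frame and two hypothetical bounding boxes for the two cars. The function names and box format are invented for this example.

```python
def crop_regions(frame, boxes):
    """Keep only the pixels inside each (top, left, height, width) box."""
    return [
        [row[left:left + width] for row in frame[top:top + height]]
        for top, left, height, width in boxes
    ]


def kept_fraction(frame, crops):
    """Fraction of the full frame's pixels that survive cropping."""
    full_pixels = len(frame) * len(frame[0])
    kept_pixels = sum(len(crop) * len(crop[0]) for crop in crops)
    return kept_pixels / full_pixels


# Hypothetical frame: 1,080 rows of 1,920 pixels, all zero for simplicity.
frame = [[0] * 1920 for _ in range(1080)]

# Hypothetical regions of interest: two cars, each 100 rows by 200 columns.
boxes = [(500, 300, 100, 200), (520, 1200, 100, 200)]

crops = crop_regions(frame, boxes)
fraction = kept_fraction(frame, crops)
print(f"Pixels kept: {fraction:.2%}")
```

Under these assumed sizes, only about 2% of the pixels in each frame would need to move beyond the sensor.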
“A lot of people are looking at this as a problem of more data,” Brüning said. “We are coming at it from another angle. It’s about knowledge of function.”
One of the key elements in the edge is deciding where to process data, and that can vary greatly. For example, Achronix teamed up with BittWare to create a PCIe accelerator card that does some of that processing on a smart card rather than in a CPU or GPU.
“What’s different is that we weren’t just trying to build a new card,” said Steve Mensor, vice president of marketing at Achronix. “The goal was to add new capabilities. This isn’t just a board with a dumb bucket of bits. What we saw was a great diversity of workloads in places where there was no room for more Xeon processors, because they already had maxed out what a CPU can do.”
Whether the card is used in the cloud or at the edge, the goal is the same, which is to offload processing from the CPU. The design includes multiple accelerators, an internal network-on-chip, and GDDR6 memories.
“Instead of taking data to the algorithm, you’re taking the algorithm to the data,” said Craig Petrie, vice president of marketing at BittWare. “There also are expansion ports so you can tap directly into the port in the card and not move the data very far. So we could be clear about how to architect the chip or the card, but we’re also finding this is a dynamic market and it needs to adapt to new applications. We’re seeing new uses of this card that we didn’t see at first.”
Fig. 1: Diverse workloads at the edge. Source: Achronix
But while benchmarks help establish a structure, the challenge is that all of this needs to be set in the context of how these chips are actually used. That becomes particularly evident when AI and machine learning are layered on top of many of these edge architectures, because the way to really eke out performance and power benefits from these chips is to achieve efficiencies through much tighter integration of hardware and software.
In the past, most of this development has been confined to the cloud because that is where the algorithms are trained. In the case of training, the more data the better. But inferencing can operate on a much smaller data set, which makes it suitable for both the cloud and the edge.
That has spawned a whole new wave of competition in a market that until six months ago barely existed. It includes everything from automotive to industrial and medical applications, and chip architectures are a key piece of the puzzle. But one of the challenges, because this market is so new, is how to compare the performance of these chips in a standardized way.
“A single benchmark isn’t a good indication of an inference chip’s performance,” said Geoff Tate, CEO of Flex Logix. “If you look at ResNet 50 and YOLOv3, there’s a big difference not only in the computation side, but also in how they use memory.”
For ResNet 50, each image takes 2 billion multiply-accumulate operations (MACs), but for YOLOv3 it’s about 200 billion, roughly 100 times the computational workload. So an effective comparison requires understanding the workloads, the weights, and how the memory subsystems are architected. This is particularly important in automotive, where response time is tied to safety.
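A back-of-the-envelope calculation shows why that gap matters for response time. The sketch below uses the per-image MAC counts cited above. The peak throughput of 4 trillion MACs per second and the 50% utilization figure are hypothetical values chosen only for illustration, not measurements of any real chip, and the calculation ignores memory behavior, which Tate identifies as the other major difference between the two models.

```python
# Per-image workloads as cited in the article.
RESNET50_MACS_PER_IMAGE = 2e9
YOLOV3_MACS_PER_IMAGE = 200e9

# Hypothetical accelerator: 4 trillion MACs per second at peak (assumed).
HYPOTHETICAL_PEAK_MACS_PER_SECOND = 4e12


def ideal_frames_per_second(peak_macs_per_second, macs_per_image, utilization=1.0):
    """Upper bound on throughput if compute were the only limit."""
    return peak_macs_per_second * utilization / macs_per_image


for name, macs in [("ResNet 50", RESNET50_MACS_PER_IMAGE),
                   ("YOLOv3", YOLOV3_MACS_PER_IMAGE)]:
    at_peak = ideal_frames_per_second(HYPOTHETICAL_PEAK_MACS_PER_SECOND, macs)
    at_half = ideal_frames_per_second(HYPOTHETICAL_PEAK_MACS_PER_SECOND, macs, 0.5)
    latency_ms = 1000.0 / at_half
    print(f"{name}: {at_peak:.0f} fps at peak, {at_half:.0f} fps "
          f"({latency_ms:.0f} ms per image) at 50% utilization")
```

On this assumed hardware the same chip drops from thousands of images per second on ResNet 50 to tens on YOLOv3, which is why a single benchmark number says little about how a chip will behave on a different workload.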
“With edge applications, you almost certainly have to respond instantly,” said Tate. “In a car you certainly want to know where people are so you have a chance to avoid them. Finding out later what you hit doesn’t help you very much.”
Understanding how one chip compares to another in this area has set off a scramble to standardize measurements, much as has been done for decades in the CPU world. MLPerf was started last year by companies such as Alibaba, Nvidia, Dell EMC, Google, Habana Labs and Qualcomm, in conjunction with Stanford University, UC Berkeley and Harvard University, among others. This week it rolled out 500 inferencing benchmark results from 14 different companies, along with a white paper on the methodology used.
“The goal here is to be able to make apples-to-apples comparisons,” said Itay Hubara, a researcher at Habana. “This will be followed by power benchmarks in the next release.”
Fig. 2: Four scenarios for inferencing. Source: MLPerf
All of this assumes that the sensor input is correct, that sensors are behaving consistently and working under ideal conditions, and that sensors will be recalibrated over time to account for drift. In the real world, none of that is guaranteed.
“The biggest change is in the software,” said Jeff Phillips, head of automotive marketing at National Instruments. “You’re validating the input itself — so not only does the sensor see it, but it also classifies it, which then implies how it predicts the next movement. With most of the car crashes involving autonomous vehicles, the sensor saw an object but didn’t choose the right behavior. Does it see the object, classify it correctly, and accurately predict its next movement? And then, how does that apply across all of the things that the sensor sees, in addition to the LiDAR, other front radar, another ultrasonic sensor? How is all of that tied together? What changes is the number of variables in the scenario.”
There are other changes associated with the edge, as well, including how and where data is stored. The movement toward in-memory and near-memory computing is a recognition of how much is changing on the edge. Adesto Technologies, for example, is researching how to store analog data directly in memory without converting it to a digital format. In addition, there are efforts to store strings of data rather than individual bits. And as algorithms begin to mature, there will be a huge effort to optimize the software to take advantage of new hardware.
While the concept of the edge isn’t entirely new, until recently it was considered a way of getting data to and from the cloud more quickly. That has changed significantly over the past year, with momentum building for completely new compute architectures.
The big question for this part of the market is whether it will consolidate around standardized chip and compute architectures or splinter into more customized solutions for individual applications and market segments. At this point in the edge’s development, it’s difficult to say how this will evolve, or what the ultimate impact will be on both the cloud and future endpoint designs.