The aggravation, the unexpected delays, the lost time, the high costs: people worldwide regularly rank commuting as the worst part of the day, and it is one of the big drivers for work-from-home policies.
Computers feel the same way. Computational storage is part of an emerging trend to make datacenters, edge servers, IoT devices, cars and other digitally enhanced things more productive and more efficient by moving data less. In computational storage, a full-fledged computing system (complete with DRAM, I/O, application processors, dedicated storage and system software) gets squeezed into the confines of an SSD to manage repetitive, preliminary and/or data-intensive tasks locally.
Why? Because moving data can soak up inordinate amounts of money, time, energy, and compute resources. “For some applications like compression in the drive, hardware engines consuming less than a watt can achieve the same throughput as over 140 traditional server cores,” said JB Baker, VP of marketing and product management at ScaleFlux. “That’s 1,500 watts and we can do the same work with a watt.”
Unnecessary data circulation is also not good for the environment. A Google-sponsored study from 2018 found that 62.7% of computing energy is consumed by shuttling data between memory, storage and the CPU across a wide range of applications. Computational storage, thus, could cut emissions while improving performance.
And then there’s the looming capacity problem. Cloud workloads and internet traffic grew 10x and 16x, respectively, in the past decade and will likely grow at that rate or faster in the coming years as AI-enhanced medical imaging, autonomous robots and other data-heavy applications move from concept to commercial deployment.
Unfortunately, servers, rack space and operating budgets struggle to grow at that same exponential rate. For example, Amsterdam and other cities have applied strict limits on datacenter size, forcing cloud providers and their customers to figure out how to do more within the same footprint.
Consider a traditional two-socket server setup with 16 drives. An ordinary server might contain 64 computing cores (two processors with 32 cores each). With computational storage, the same server could potentially have 136 compute units: 64 server cores and 72 application accelerators tucked into its drives for preliminary tasks. Multiplied over the number of servers per rack, racks per datacenter, and datacenters per cloud empire, computational drives have the power to boost the potential ROI of millions of square feet of real estate.
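To make the arithmetic above concrete, here is a minimal sketch in Python. The per-server figures (two sockets, 32 cores each, 72 in-drive accelerators) are the ones quoted in the text; the fleet shape (`SERVERS_PER_RACK`, `RACKS_PER_DATACENTER`) is purely hypothetical and chosen only to show how the numbers multiply. It also counts a server core and a drive accelerator as one unit each, which is a simplification, since the two are not equivalent in performance.

```python
# Per-server figures quoted in the article.
SOCKETS = 2
CORES_PER_SOCKET = 32
DRIVE_ACCELERATORS = 72  # in-drive application accelerators per server

# Hypothetical fleet shape, used only to illustrate the multiplication.
SERVERS_PER_RACK = 20
RACKS_PER_DATACENTER = 100


def host_cores(sockets: int, cores_per_socket: int) -> int:
    """Cores provided by the server's own processors."""
    return sockets * cores_per_socket


def total_compute_units(sockets: int, cores_per_socket: int,
                        drive_accelerators: int) -> int:
    """Host cores plus accelerators living inside the drives."""
    return host_cores(sockets, cores_per_socket) + drive_accelerators


def added_units_per_datacenter(drive_accelerators: int, servers_per_rack: int,
                               racks: int) -> int:
    """Extra compute units gained across one datacenter, same footprint."""
    return drive_accelerators * servers_per_rack * racks


host = host_cores(SOCKETS, CORES_PER_SOCKET)
total = total_compute_units(SOCKETS, CORES_PER_SOCKET, DRIVE_ACCELERATORS)
added = added_units_per_datacenter(DRIVE_ACCELERATORS, SERVERS_PER_RACK,
                                   RACKS_PER_DATACENTER)

print(f"host cores per server:      {host}")          # 64
print(f"compute units per server:   {total}")         # 136
print(f"ratio vs. ordinary server:  {total / host}")  # 2.125
print(f"added units per datacenter: {added}")         # 144000 (hypothetical fleet)
```

With these assumed fleet numbers, the same floor space would hold 144,000 additional compute units; a real deployment would plug in its own rack and datacenter counts.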
The fine print
So if computational storage is so advantageous, how come it’s not pervasive already? The reason is simple: a confluence of advancements, from hardware to software to standards, must come together to make a paradigm shift in processing commercially viable. These factors are all aligning now.
For example, computational storage drives have to fit within the same power and space constraints as regular SSDs and servers. That means the computational element can only consume 2 to 3 watts of the 8 watts allotted to a drive in a server.
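The power budget described above can be worked through in a few lines. This is a rough sketch, not a vendor power model: the 8-watt drive budget and the 2-to-3-watt compute allowance come from the text, the 16-drive count is borrowed from the two-socket server example earlier in the article, and the function names are illustrative.

```python
DRIVE_POWER_BUDGET_W = 8.0          # watts allotted to one drive (from the text)
COMPUTE_POWER_RANGE_W = (2.0, 3.0)  # watts available to the compute element
DRIVES_PER_SERVER = 16              # assumption: the article's example server


def compute_share(compute_w: float, budget_w: float) -> float:
    """Fraction of the drive's power budget used by the compute element."""
    return compute_w / budget_w


def remaining_for_storage(compute_w: float, budget_w: float) -> float:
    """Watts left for flash, controller and DRAM after compute takes its cut."""
    return budget_w - compute_w


def per_server_compute_power(compute_w: float, drives: int) -> float:
    """Total in-drive compute power across all drives in one server."""
    return compute_w * drives


for watts in COMPUTE_POWER_RANGE_W:
    share = compute_share(watts, DRIVE_POWER_BUDGET_W)
    left = remaining_for_storage(watts, DRIVE_POWER_BUDGET_W)
    server = per_server_compute_power(watts, DRIVES_PER_SERVER)
    print(f"{watts:.0f} W compute: {share:.1%} of budget, "
          f"{left:.0f} W left for storage, {server:.0f} W per server")
# 2 W compute: 25.0% of budget, 6 W left for storage, 32 W per server
# 3 W compute: 37.5% of budget, 5 W left for storage, 48 W per server
```

Under these assumptions, the compute element takes between a quarter and three-eighths of each drive's budget, or roughly 32 to 48 watts across a 16-drive server.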
While some early computational SSDs relied on FPGAs, companies such as NGD Systems and ScaleFlux are adopting systems-on-chip (SoCs) built around Arm processors originally developed for smartphones. (An eight-core computational drive SoC might dedicate four cores to managing the drive and the remainder to applications.) SSDs typically already have quite a bit of DRAM: 1GB for every terabyte in a drive. In some cases, the computational unit can use this as a resource. Manufacturers can also add more DRAM.
Additionally, a computational storage drive can support a standard cloud-native software stack: Linux operating systems and containers managed with Docker or Kubernetes. Databases and machine learning algorithms for image recognition and other applications may also be loaded into the drive.
Standards will also need to be finalized. The Storage Networking Industry Association (SNIA) last year released its 0.8 specification covering a broad range of issues such as security and configuration; a full specification is anticipated later this year.
Other innovations you should expect to see: more ML acceleration and specialized SoCs, faster interconnects, enhanced on-chip security, better software for analyzing data in real time, and tools for merging data from distributed networks of drives.
Over time, we could also see the emergence of computational capabilities added to traditional rotating hard drives, still the workhorse of storage in the cloud.
A double-edged edge
Some early use cases will occur at the edge, with the computational drive acting in an edge-for-the-edge manner. Microsoft Research and NGD Systems, for instance, found that computational storage drives (CSDs) could dramatically increase the number of image queries that can be performed, one of the most discussed use cases, by processing the data directly on the drives, and that throughput grows linearly with more drives.
Bandwidth-constrained devices that often have low-latency requirements, such as airplanes or autonomous vehicles, are another prime target. Over 8,000 aircraft carrying over 1.2 million people are in the air at any given time. With computational storage, machine learning for predictive maintenance can be performed efficiently during the flight to increase safety and reduce turnaround time.
Cloud providers are also experimenting with computational cloud drives and will soon start to shift to commercial deployment. Besides helping offload tasks from more powerful application processors, computational drives could enhance security by running scans for malware and other threats locally.
Some might argue that the solution is obvious: reduce computing workloads! Companies collect far more data than they use anyway.
That approach, however, ignores one of the unfortunate truths about the digital world. We don’t know what data we need until we already have it. The only realistic choice is devising ways to efficiently process the massive data onslaught coming our way. Computational drives will be a linchpin in letting us filter through the data without getting bogged down by the details. Insights generated from this data can unlock capabilities and use cases that can transform entire industries.
Mohamed Awad is vice president of IoT and embedded at Arm.