In this special guest feature, Lewis Carr, Senior Director of Marketing at Actian, expands on his definition of data fabric and why this technology is quickly becoming a necessity for modern enterprises. In his role, Lewis leads product management, marketing and solutions strategies and execution. Lewis has extensive experience in Cloud, Big Data Analytics, IoT, Mobility and Security, as well as a background in original content development and diverse team management. He is an individual contributor and manager in engineering, pre-sales, business development, and most areas of marketing targeted at Enterprise, Government, OEM, and embedded marketplaces.
Data fabrics got their start back in the mid-2000s when computing started to spread from data centers into the cloud. They became more popular as organizations embraced hybrid cloud, and today data fabrics are helping to reduce complexities involving data streams moving to and from the network’s edge. Data use is now exploding across multiple platforms, and organizations desperately need a framework to manage it – to move, secure, prepare, govern, and integrate data into IT systems.
Data fabrics provide that framework, serving as both the translator and the plumbing for data in all its forms, wherever it sits and wherever it needs to go, regardless of whether the data consumer is a human or machine.
However, the impacts of IoT and edge devices on data fabric technology can’t be overstated. The devices themselves that operate at the edge are getting more varied and more complex. At any time, an organization could be gathering, processing, and gleaning insights from thousands of IoT devices, smart sensors, and edge routers located all over the world. The devices could be controlling process flows in a chemical plant, collecting video feeds from a security booth, or identifying the precise location of a shipping container. So much information in so many formats from so many places requires compute power, multiple stages of translation, bandwidth, and an understanding of where, when, and how to process and analyze the data within the edge.
Because of all the complexity at the edge, organizations must determine which pieces of the processing are done at which level. There’s an application for each level, and for each application and the process state it’s applied to, there’s a manipulation. Each manipulation involves data processing and memory management with different combinations of data, algorithms, and desired output. Oftentimes, there are multiple stages of data processing, with pre-processing on-device and further processing at the gateway for governance and optimization of the data that’s sent back to the cloud. The point of a data fabric is to handle all this complexity. As more data gets created and analyzed at the edge, data fabrics will evolve further into what could be referred to as a more specialized edge data fabric.
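The staged pipeline described above — pre-processing on the device, aggregation at the gateway, and a reduced payload sent to the cloud — can be sketched in a few lines. This is an illustrative example only; the function names and the simple numeric sensor payload are assumptions, not any specific product API.

```python
# Hypothetical sketch of multi-stage edge processing.
# Stage names and the 0-100 valid range are illustrative assumptions.

def preprocess_on_device(readings):
    """On-device pre-processing: drop obviously invalid samples."""
    return [r for r in readings if 0.0 <= r <= 100.0]

def process_at_gateway(readings):
    """Gateway stage: aggregate for governance and bandwidth optimization."""
    if not readings:
        return None
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "max": max(readings),
    }

def send_to_cloud(summary):
    """Cloud stage: only the reduced summary crosses the WAN."""
    return {"uploaded": summary is not None, "payload": summary}

raw = [12.5, -3.0, 47.1, 250.0, 33.3]   # -3.0 and 250.0 fail validation
result = send_to_cloud(process_at_gateway(preprocess_on_device(raw)))
```

Note that only a three-field summary, not the raw stream, travels upstream — the bandwidth-saving pattern the data fabric is meant to orchestrate at scale.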
Edge data fabric’s common elements
The edge is quickly becoming a new cloud, leveraging the same cloud technologies and standards in combination with new, edge-specific networks such as 5G and Wi-Fi 6. As in the core cloud, richer, more intelligent applications are running on each device and on gateways. To handle the growing data requirements edge devices pose, an edge data fabric has to perform several important and necessary functions, including:
- Access many different interfaces – This includes HTTP, MQTT, radio networks, and manufacturing networks. Different sets of interfaces predominate in different areas of the edge: HTTP is the standard for enterprise IT, MQTT is quickly becoming a de facto standard for IoT, radio networks dominate among service providers, and manufacturing networks such as CAN bus are favored by various sub-verticals in manufacturing. The most expedient workaround is ETL between devices on disparate networks.
- Run on multiple operating environments – Most importantly, be POSIX compliant. Virtually all embedded operating environments are now Linux-based, even those from Microsoft. But even with this underlying common OS, there are still differences in file formats, ports used, APIs, and more. Data management and processing must create an OS abstraction layer for data processing and analytics.
- Work with key protocols and APIs – This includes recent APIs such as REST with JSON data payloads. Several programming and interpretive languages are used in the embedded space, often mapping to generations of developers and designers. The ability to use common APIs across programming and integration environments creates a developer-centric abstraction layer for data processing and analytics. JSON over REST is the latest example; the benefit of this combination is that it extends the abstraction layer to semi-structured data and large BLOBs (Binary Large Objects – for example, video data and its associated metadata).
- Provide JDBC/ODBC database connectivity – Supporting legacy applications and quick, seamless connections between databases requires not just adoption of new protocols and APIs but steadfast adherence to commonly used connectivity standards. For example, I may need to leverage JSON and REST to pull certain windows of time-series data that may be semi-structured, but I may also need to communicate with disparate applications and data repositories in the cloud to learn which window of time-series data is of interest and what to do with it locally.
- Handle streaming data – Through standards such as Spark and Kafka, because in many cases data processing and analytics must be performed in real time on a data stream, and continue for some period of time. Spark, for example, serves as a key element of a data fabric in the cloud, supporting streaming data between various cloud platforms from different vendors.
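The interface and protocol translation the list above describes can be illustrated with a minimal bridge: a device message arriving on an MQTT-style topic is parsed as JSON and forwarded onto a Kafka-style stream. No real broker is involved here — the topic name, payload fields, and in-memory queue are all assumptions standing in for actual middleware.

```python
import json
from queue import Queue

# Illustrative sketch (no real broker): bridging an MQTT-style topic
# to a Kafka-style stream. Topic and payload fields are invented.

edge_stream = Queue()  # stands in for a Kafka topic partition

def on_mqtt_message(topic, payload_bytes):
    """Translate an incoming device message into the fabric's common form."""
    record = json.loads(payload_bytes)            # JSON payload, per the REST/JSON convention
    record["source_topic"] = topic                # preserve provenance for governance
    edge_stream.put(json.dumps(record).encode())  # forward to the streaming layer

on_mqtt_message("plant/line1/temp", b'{"sensor": "t-17", "c": 81.4}')
forwarded = json.loads(edge_stream.get())
```

The same shape — parse at the boundary, tag with provenance, hand off to a common stream — is what lets one fabric span HTTP, MQTT, and manufacturing networks without point-to-point ETL everywhere.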
A tipping point for edge data fabric
While edge computing’s origins date back to content delivery networks (CDNs) in the 1990s, the market is now approaching a tipping point for an edge data fabric. The key drivers for edge computing have changed, and to truly harness all the intelligence and processing being done at the edge, we will have to shed the client-server mentality. The days of single-location data centralization are gone, and the majority of data is going to stay at the edge.
As you get more intelligence at the edge, automated routines will correspondingly increase. The business will direct data scientists, engineers, and developers to directly embed policy around automation, along with directions for what the exception handling routines should do; these routines will be iterated, continually diminishing the set of manual processes. Increasingly, this will be done through machine learning (ML) tied to whatever policy and business or operational rules automate the exception handling process. That ML has to run in an unsupervised fashion at the edge, and the edge data fabric will be central to making this possible.
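One way to picture unsupervised exception handling at the edge is a simple statistical outlier check: readings close to the local distribution are handled automatically, while outliers are routed to an exception path governed by policy. This is a hedged sketch — the z-score method, the threshold, and the function name are illustrative choices, not a reference to any particular product.

```python
import statistics

# Hypothetical sketch of policy-driven exception routing at the edge.
# The z-score threshold is an assumed policy parameter.

def route_by_policy(values, z_threshold=2.0):
    """Handle in-distribution readings automatically; flag outliers as exceptions."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1.0   # avoid division by zero
    auto, exceptions = [], []
    for v in values:
        target = exceptions if abs(v - mean) / stdev > z_threshold else auto
        target.append(v)
    return auto, exceptions

auto, exceptions = route_by_policy([10, 11, 9, 10, 12, 95])
```

As the exception path is iterated on, the threshold (or a learned model replacing it) tightens, and the share of readings needing manual handling shrinks — the diminishing-exceptions loop described above.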
Potential use cases for edge data fabric
An edge data fabric will provide support for open communities to build application functionality into what were previously closed networks and systems. These could include equipping 5G wireless networks with Multi-Access Edge Computing (MEC) platforms to open the network to third-party developers and integrators to build CDNs. An edge data fabric also could unlock opportunities for a multi-layer IoT grid – with PLCs on one layer, machine vision on another, and robotics on yet another – to share data between these layers. Third-party vendors will need to design and productize such a grid; an edge data fabric will be necessary to distribute and manipulate data across these device layers and to and from the gateway/network edge and the cloud. The market will drive when and where this happens. The roll-out of 5G – the integration of new infrastructure from cell towers and mobile switching centers to the software that glues it all together, even the phase-out of 3G – will require a massive investment of upfront capital. There must be more of a business case than selling more 5G-capable smartphones.
For example, think about streaming Netflix or HBO Max on your mobile phone. You may think that 5G would make this easier, and on the surface, this is true. However, when you take into consideration that everyone will be watching streaming video, you quickly run into latency and bandwidth management problems – especially if the old model of all the content residing in a central location somewhere in the cloud is at play. Instead, you will need to predict which content will be seen by which demographic within your subscriber base, and when.
In these scenarios, it’s not as much about the content itself as it is about the metadata associated with the content, and the subscribers that must be analyzed to determine where to cache that content locally and what compute, storage, and network load it represents. This metadata resides in customer data records, social media streams, and IT operations monitoring and management systems that estimate capacity, use, and quality of service.
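The metadata-driven placement decision described here can be reduced to a small planning step: given per-site view metadata, pick the titles worth caching at each edge site. The data, site names, and slot limit below are invented purely for illustration.

```python
# Illustrative sketch: planning which titles to cache at which edge site
# from view metadata alone. Sites, titles, and counts are invented.

view_metadata = {
    "us-east": {"title-a": 9000, "title-b": 1200, "title-c": 400},
    "eu-west": {"title-a": 300, "title-b": 8000, "title-c": 7500},
}

def plan_edge_cache(metadata, slots_per_site=2):
    """For each edge site, cache the most-watched titles up to capacity."""
    return {
        site: sorted(counts, key=counts.get, reverse=True)[:slots_per_site]
        for site, counts in metadata.items()
    }

plan = plan_edge_cache(view_metadata)
```

The key point the example makes concrete: the heavy asset (the video itself) never moves until the lightweight metadata says where it will be watched.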
New applications and microservices will need to be developed to stitch together new service delivery that supports not just this business case, but similar ones currently in back-of-the-envelope discussions – namely, AR/VR for richer retail store experiences or even fully autonomous cars and trucks. Inter-platform interoperability – the ability to easily expose and share data between cloud platforms and on-premises environments – along with compute resources, network bandwidth, and latency are factors in the cloud, but they are far more challenging at the edge. Part of the reason public cloud platforms have rapidly adopted data fabrics and open standards has been to quickly and thoroughly address these challenges. MEC and the right data platform embedded in it can do the same at the edge. Further, an edge data fabric that is simply an extension of the cloud data fabric will accelerate its adoption, bringing the entire development and IT community along from the cloud to the edge.