Data Fabric: A Technical and Business Exploration
With the rise of technologies such as artificial intelligence (AI), edge computing, machine learning (ML), and the internet of things (IoT), today’s technological landscape resembles a science fiction novel from only a decade ago. And just as technology continues to reshape our daily lives, it is reshaping today’s business landscape.
This widespread adoption of technology to improve business processes, known as digital transformation, has led today’s businesses to rely heavily on gaining meaningful insight from business data. Integrating technologies that manage data and metadata and foster business intelligence has become the name of the game. And although each business’s technology stack may differ, a common theme emerges: these tech stacks are made up of point solutions, each operating independently within a single use case.
Here the challenge arises. While these point solutions deliver incredible business value in their particular domains, without an intelligent, connected management layer each solution becomes siloed. This siloing eventually stifles communication, flexibility, scale, and innovation. This is where data fabric comes in.
Data Fabric Definition
Data fabric is a data architecture that unifies disparate operating endpoints wherever they may reside, whether in the cloud, on-prem, at the edge, or in a hybrid environment. These disparate endpoints can span a wide variety of technologies and support tooling for the Internet of Things (IoT), artificial intelligence (AI), machine learning (ML), and others. Data fabric is the flexible, secure, and intelligent unifying management layer that binds them all together. Furthermore, with the help of successful metadata management and a combination of human and machine learning, data fabric can continuously identify, connect, and transform real-time data from different applications.
This architectural abstraction comes as a response to the growing sophistication and complexity of today’s business ecosystem. Until recent years, the disparate siloed enterprise environment hasn’t only been acceptable, it’s been the status quo. With data fabric, organizations can implement new point solutions in a more meaningful way that maximizes connectivity.
The Underlying Tech Components of Data Fabric
With data fabric’s goal of creating a centralized, unified network, the underlying tech of data fabric must also be centralized. Popular landing places for data fabrics include both data warehouses and data lakes. Some data fabric vendors promote their ability to surface insights from data fabrics, but the data must exist somewhere before it can be analyzed. Data fabric infrastructure, whatever the choice, must be fast, scalable, and available across an entire organization.
Data Fabric Examples
A typical data fabric example involves creating an organization-wide system of record that holds key information on all data assets for the purposes of data compliance. This key information could include classifications like country of record, opt-in preferences, and risk profiles from every system in use, all of which could be used to comply with ever-changing data regulations and avoid costly fines. Retrieving and maintaining all of this information requires data pipelines with near-real-time streaming capability. Data fabric does not create itself.
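To make the compliance example more concrete, here is a minimal Python sketch of such a system of record. Everything in it is an illustrative assumption rather than part of any product: the `AssetRecord` fields simply mirror the classifications named above, and the review rule (flag assets in consent-sensitive countries that lack an opt-in) is a hypothetical policy, not a statement of what any regulation requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """One compliance record per data asset (hypothetical structure)."""
    asset_id: str
    source_system: str
    country_of_record: str
    opt_in: bool
    risk_profile: str  # e.g. "low", "medium", "high"


def build_registry(records):
    """Index records by (source_system, asset_id); a later record replaces an earlier one."""
    registry = {}
    for record in records:
        registry[(record.source_system, record.asset_id)] = record
    return registry


def flag_for_review(registry, consent_required_countries):
    """Return sorted keys of assets in the given countries that have no opt-in on file."""
    return sorted(
        key
        for key, record in registry.items()
        if record.country_of_record in consent_required_countries and not record.opt_in
    )
```

In practice the records would arrive continuously from pipelines rather than from an in-memory list, but the shape of the lookup stays the same: one keyed record per asset per system, queried by classification.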
Another example might be one in which a data science team creates a data fabric to knit together various data sources for the purpose of ML/AI in an effort to discover the ideal customer profile. Data sources for such ML/AI projects might include internal proprietary systems and SaaS solutions for marketing, sales, support, and finally revenue recognition, which together create an interrelated data story of a customer from start to finish. To be successful, this effort requires a huge amount of data to train models, and the best way to ensure both volume and timeliness is to create robust data pipelines that scale. In this example, and in all others, data fabric’s most important resource is data itself.
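As a rough illustration of knitting sources into a single customer story, the sketch below merges rows from several systems into one record per customer. It assumes, purely for illustration, that every system shares a `customer_id` field; the system and field names are hypothetical, and real sources rarely line up this neatly.

```python
from collections import defaultdict


def build_customer_story(sources):
    """Merge per-system rows into one record per customer_id.

    `sources` maps a system name to a list of row dicts. Each field is
    prefixed with its system name so its origin stays visible.
    """
    story = defaultdict(dict)
    for system, rows in sources.items():
        for row in rows:
            customer_id = row["customer_id"]
            for field, value in row.items():
                if field != "customer_id":
                    story[customer_id][f"{system}.{field}"] = value
    return dict(story)
```

Prefixing each field with its source system is one simple way to keep lineage visible once the records are combined, which matters when a model's training features need to be traced back to where they came from.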
How Data Fabric Makes its Business Case
Although a data fabric’s seamlessly connected, unified data management layer may seem like a ‘nice to have,’ in reality, data fabric is one of the most important data architectures for tomorrow’s organizations. Data fabric delivers on two major value propositions for tomorrow’s business—data governance and actionable data.
Data governance makes use of processes, standards, and user roles to track and govern data across an organization’s entire business landscape. For today’s modern business this is critical for two reasons:
- As organizations grow, more and more sensitive data is shared between applications, services, and business processes. Without proper data governance practices in place, data can become unaccounted for, resulting in inefficient business practices, lost opportunity, and data bloat.
- Possibly even more impactful is data compliance. As regulations such as CCPA and GDPR ramp up, sensitive data that is mishandled, lost, or otherwise unaccounted for can mean substantial fines for organizations.
The product of a successful data fabric is actionable data. At its heart, actionable data is integrated data. It is ready and available for analytics, machine learning, visualization, or any other purpose an enterprise envisions now or in the future. Organizations without a data fabric are at risk of data siloing, poor data integrity, limited collaboration, and even security gaps.
In other words, without a connective integration layer between applications and services, organizations are likely to miss the incredible opportunity of managing an array of applications under a unified platform.
Achieving a Modern Data Fabric
The concept of creating a unified business ecosystem isn’t new by any stretch of the imagination. In fact, businesses have continually aimed to innovate and use technology to gain more meaningful business insight. What sets data fabric apart from earlier initiatives to access the right data at the right time for meaningful business decisions is its reliance on metadata.
A data fabric initiative should be thought of as an effort to contextualize information spanning the entire enterprise ecosystem. To achieve this, the magic is in the metadata.
Metadata is simply data about data. The more sophisticated the metadata, the more meaningful the inferences we can make about key business operations. To this end, the foundation of any data fabric initiative should be a well-managed pool of metadata, configured so that data fabric can identify, connect, and analyze an array of data characteristics about any given endpoint. This identification lays the groundwork for a meaningful data fabric layer.
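A small sketch may help show what "identify and connect" can mean at the metadata level. Here the metadata pool is assumed to be nothing more than a mapping from endpoint name to field schema, and two endpoints are considered connectable when they share a field name. Both the pool contents and the matching rule are hypothetical simplifications; real metadata management also has to reconcile differing names, types, and semantics.

```python
from collections import defaultdict

# Hypothetical metadata pool: endpoint name -> {field name: declared type}
metadata_pool = {
    "crm": {"customer_id": "string", "email": "string", "region": "string"},
    "billing": {"customer_id": "string", "invoice_total": "decimal"},
    "support": {"ticket_id": "string", "email": "string"},
}


def connect_endpoints(pool):
    """Return fields that appear in more than one endpoint, with the endpoints that share them."""
    endpoints_by_field = defaultdict(set)
    for endpoint, schema in pool.items():
        for field in schema:
            endpoints_by_field[field].add(endpoint)
    return {
        field: sorted(endpoints)
        for field, endpoints in endpoints_by_field.items()
        if len(endpoints) > 1
    }


links = connect_endpoints(metadata_pool)
# links == {"customer_id": ["billing", "crm"], "email": ["crm", "support"]}
```

Even this naive matching surfaces candidate join keys without reading a single row of business data, which is the sense in which metadata does the contextualizing.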
At the end of the day, it’s this high-level business modeling that organizations are after. For executive leadership looking to gain key stakeholder buy-in for a data fabric project, this is where to start.
Data Fabric Creation
Identifying the importance of metadata is the first step, but the unquestionably more difficult second step is to collect metadata from systems across an entire organization and begin stitching everything together. This process is data integration. There are many methods to achieve this goal. Some of the most straightforward are low-code solutions that make data fabric creation as streamlined as possible.
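For readers who prefer to see the collection step in code, the sketch below harvests schema metadata from a single source system. An in-memory SQLite database stands in for that system, and the table names are made up for the example; the approach (ask the system to describe itself, then record the answer in a catalog) is an assumption about how one might do this by hand, not a description of any particular tool.

```python
import sqlite3


def harvest_schema(conn):
    """Collect {table: {column: declared type}} from a SQLite connection."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    catalog = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = {column[1]: column[2] for column in columns}
    return catalog


# Stand-in source system with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id TEXT, total REAL)"
)
catalog = harvest_schema(conn)
```

Repeating this for every system in an organization, each with its own way of describing itself, is what makes the second step hard, and it is the part that low-code integration tooling aims to take off your hands.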
Data Mesh vs. Data Fabric: Different Approaches, Similar Problem
Data mesh is an enterprise data management strategy that partitions large business data into subdomains. While data fabric aspires to a unified data network across an organization, data mesh is a decentralized strategy that relies on domain experts to draw meaningful insights from the data of their particular subdomain. A subdomain can be thought of as a standalone ecosystem of microservices, point solutions, code, or workflows that together deliver a particular business solution.
By creating these partitions, organizations can incentivize domain experts to deliver insights on what they know best: the data from their domain. Taken together, the subdomains should deliver a more holistic picture of the business, from which executive leadership can make more informed business-level decisions. Data integration once again becomes the vehicle for success, as this thousand-foot view is achieved by weaving together these disparate sources into consumable views.
StreamSets and Data Fabric
Gartner noted in its Top 10 Data and Analytics Trends for 2021 that data fabric is the foundation of the modern data management platform. At StreamSets, we feel the same way and have dedicated our development efforts to bringing innovative data management solutions to the market that transform how organizations access and use data.
To support a successful data fabric strategy, StreamSets prioritizes making managed data available through an open metadata sharing model. Metadata produced through StreamSets smart data pipelines is made available for data fabric initiatives so that you can collate platform metrics, data schemas, and transform logic automatically. This metadata can then be analyzed in-house or sent to tools like Collibra and Apache Atlas.