Faster data access and easier collaboration among data teams are two key factors that help drive productivity for most data-driven organizations. However, achieving this becomes more complex with the exponential growth of data as business needs grow. One way to combat this is with architectural patterns that support effective data management.
- The Concepts of Data Mesh and Data Fabric
- Data Mesh Vs. Data Fabric
- How Data Mesh Complements Data Fabric and Vice-Versa
- StreamSets, Data Mesh, and Data Fabric
Data mesh and data fabric are two approaches to building a data architecture. Although the methods differ in operation and storage, both aim to address common challenges like data silos, lack of easy access to organizational data, and data management.
This piece compares the data mesh and data fabric approaches to data architecture: how they are similar, how they differ, and how organizations can leverage both for a more robust data architecture.
The Concepts of Data Mesh and Data Fabric
What is Data Mesh?
A data mesh takes a human- and product-centric approach to the challenges created by today's heterogeneous data sources. Traditional data storage methods like data lakes and warehouses centralize and consolidate all data sources into one location, which can become an issue when teams need quick insights for decision-making. To use data, workers face the time-consuming hassle of digging through a centralized data store, where much of the data may be irrelevant to their needs. Additionally, data cleaning and processing may require extensive technical knowledge, which line-of-business users often lack, and this causes friction in the process. Enter data mesh.
Data mesh refers to a decentralized, domain-specific approach to data architecture that promotes product agility and gives data producers and consumers more power to access and harness the total value of organizational data. Data mesh follows these four core principles:
- Domain-specific orientation: Instead of using a single virtual layer to manage disparate sources, a data mesh uses multiple domains, each built around the needs and purpose of one business function. A domain is an independent collection of deployable clusters containing multiple microservices that interact with users or other domains through interfaces. For example, instead of a single domain holding unified sales, finance, human resources, and logistics data, a data mesh creates four separate domains, one each for sales, finance, HR, and logistics. In this way, data sits closer to those who need it, which shortens the time from access to use.
- Data as a product: From a data mesh perspective, data is a product, with producers and consumers along the workflow. Hence, through every stage of the data workflow, every tool/practice is geared towards making data easily accessible and usable by end users or other domains. The resulting data product is usually a controlled dataset, which acts like an API and is accessible by different domains and the public.
- Self-service infrastructure: The data engineering team, which has the required technical skills, creates and maintains this infrastructure so that domain members can operate it easily. A self-service platform speeds up operations without requiring knowledge of complex tools, so domain owners can put data to use faster.
- Federated governance: A data mesh takes a bottom-up approach to data governance. Members of each domain come together to construct a guide that defines the rules for policies, access control, and data movement in the data mesh. This approach weaves governance checks throughout data workflows to ensure data quality and adherence to industry standards and compliance requirements.
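The four principles can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `DataProduct` and `Mesh` classes, the field names, and the rule that every row must carry `id` and `updated_at` are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A controlled dataset published by one domain (hypothetical model)."""
    name: str
    owner_domain: str
    rows: list
    allowed_domains: set = field(default_factory=set)

    def read(self, requesting_domain):
        # The owning domain always has access; others need an explicit grant.
        if (requesting_domain != self.owner_domain
                and requesting_domain not in self.allowed_domains):
            raise PermissionError(f"{requesting_domain} may not read {self.name}")
        return list(self.rows)


class Mesh:
    """Registry of data products plus the rules all domains agreed on."""

    def __init__(self, required_fields):
        # Federated governance (assumed rule): every row must carry these fields.
        self.required_fields = set(required_fields)
        self.products = {}

    def publish(self, product):
        for row in product.rows:
            missing = self.required_fields - set(row)
            if missing:
                raise ValueError(f"{product.name} is missing fields: {sorted(missing)}")
        self.products[product.name] = product


mesh = Mesh(required_fields={"id", "updated_at"})
orders = DataProduct(
    name="sales.orders",
    owner_domain="sales",
    rows=[{"id": 1, "updated_at": "2023-01-05", "amount": 120.0}],
    allowed_domains={"finance"},
)
mesh.publish(orders)
finance_view = mesh.products["sales.orders"].read("finance")
```

Here the sales domain owns its product, the finance domain reads it through a narrow interface, and the shared publishing check stands in for the jointly agreed governance guide.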
What is Data Fabric?
A data fabric acts as the glue for organizations with multiple disparate systems that want to unify their data and get more value from it. As an organization's data grows exponentially, so does the need for a centralized, single source of truth that manages multiple data systems under one repository. A data fabric takes a cohesive, metadata-driven approach to data architecture, connecting disparate data sources under a single virtual layer that eases governance and promotes access and integration.
Metadata is the foundation of every well-laid data fabric architecture. It provides information about the data present in an organization and helps establish how that data flows. Hence, organizations that have well-defined metadata and practice proper metadata management can identify and connect various data endpoints, giving rise to meaningful insights.
In addition, data fabrics are cloud and platform agnostic, which makes integrations across multiple cloud platforms like Azure, Google Cloud, and AWS seamless.
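To illustrate how metadata can reveal connections between data endpoints, here is a minimal Python sketch. The catalog entries, dataset names, and the idea of linking datasets by shared column names are assumptions made for illustration; real data fabric tools use much richer metadata.

```python
from collections import defaultdict

# Hypothetical metadata records: one entry per dataset, wherever it lives.
catalog = [
    {"dataset": "crm.customers", "platform": "aws",
     "columns": ["customer_id", "email"]},
    {"dataset": "billing.invoices", "platform": "azure",
     "columns": ["invoice_id", "customer_id"]},
    {"dataset": "web.sessions", "platform": "gcp",
     "columns": ["session_id", "email"]},
]


def find_links(records):
    """Map each shared column name to the datasets that contain it."""
    by_column = defaultdict(list)
    for record in records:
        for column in record["columns"]:
            by_column[column].append(record["dataset"])
    # Keep only columns that appear in more than one dataset.
    return {col: sorted(names) for col, names in by_column.items() if len(names) > 1}


links = find_links(catalog)
```

In this sketch the metadata alone, without touching the underlying rows, shows that customer records, invoices, and web sessions on three different platforms can be joined.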
Data Mesh Vs. Data Fabric
Data mesh and data fabric are data architecture approaches that organizations adopt to build a scalable, easily accessible, and better-managed data system. It is important to note that no single vendor can provide a data fabric or data mesh. Both approaches aim to address the following data management challenges:
- Fast and easy data access for those with and without technical expertise
- The exponential growth rate of data
- Heterogeneous nature of data sources
- Effective data management and governance across data workflow
The Differences Between Data Mesh and Data Fabric
Data mesh and data fabric differ in their approach to handling data, storage mechanism, and data governance.
People- and Product-Centric vs. Technology-Centric Approach
A data mesh views data as a product with consumers who access this data for use in other domains, or for the business to create value, at the end of the process. Hence, at every step of a data mesh approach, the goal is to reduce friction to data access and make access possible no matter the technical expertise. On the other hand, a data fabric approach to data architecture uses an automated approach with multiple tools and technologies, aiming to connect data across various locations and draw insights from the connections.
Centralized vs. Decentralized
While a data fabric governs and manages multiple data sources from a single, virtual centralized system, a data mesh follows the opposite approach. A data fabric consists of a single source of truth containing high-speed clusters that grant users access via network endpoints. A data mesh instead creates multiple domain-specific systems, each specialized according to its functions and uses, thus bringing data closer to consumers.
For data mesh, data is accessible via a controlled dataset. This practice is unlike that of data fabrics, whereby data is made available through objective-based APIs or Software Development Kits (SDKs).
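The difference in access paths can be sketched in a few lines of Python. This is a simplified illustration, not a real implementation: the `VirtualLayer` class, the `domains` dictionary, and the sample rows are hypothetical.

```python
class VirtualLayer:
    """Single entry point that routes requests to the registered sources."""

    def __init__(self):
        self._sources = {}

    def register(self, dataset, fetch):
        # `fetch` is any zero-argument callable that returns rows from the source.
        self._sources[dataset] = fetch

    def query(self, dataset):
        if dataset not in self._sources:
            raise KeyError(f"unknown dataset: {dataset}")
        return self._sources[dataset]()


# Fabric style: every consumer goes through the same centralized layer.
fabric = VirtualLayer()
fabric.register("sales", lambda: [{"order": 1}])
fabric.register("hr", lambda: [{"employee": "a"}])

# Mesh style: each domain exposes its own datasets, and consumers call them directly.
domains = {
    "sales": {"orders": lambda: [{"order": 1}]},
    "hr": {"staff": lambda: [{"employee": "a"}]},
}

fabric_rows = fabric.query("sales")
mesh_rows = domains["sales"]["orders"]()
```

Both calls return the same rows. What changes is who owns the entry point: one shared layer in the fabric case, the sales domain itself in the mesh case.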
Data fabric employs a technology-driven metadata approach that leverages tools and technology stacks to make connections between data sources and make them available to end users via delivery systems. Data mesh involves an organizational and distributed pattern that creates multiple smaller, specialized domains within an organization.
For a data mesh, data governance involves input from every domain, promoting a democratic approach that considers the policy rules and guidelines of each domain and enforces these policies along the workflow. For a data fabric, however, data governance typically follows a top-down approach in which the highest authority sets and enforces the data policy guidelines.
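As a rough sketch of the two governance styles, the Python below contrasts a policy handed down by one authority with a policy assembled from domain proposals. The rule names and the "strictest value wins" merge are assumptions chosen for illustration; real federated governance involves negotiation that code like this does not capture.

```python
def top_down_policy(central_rules):
    """Fabric style: one authority defines the rules for everyone."""
    return dict(central_rules)


def federated_policy(domain_proposals):
    """Mesh style: combine proposals, keeping the strictest value per rule.

    Assumes every rule is a numeric limit where lower means stricter.
    """
    merged = {}
    for proposal in domain_proposals.values():
        for rule, limit in proposal.items():
            merged[rule] = min(limit, merged.get(rule, limit))
    return merged


central = top_down_policy({"retention_days": 365, "max_export_rows": 10_000})
federated = federated_policy({
    "sales": {"retention_days": 730, "max_export_rows": 5_000},
    "hr": {"retention_days": 180},
})
```

In the federated case, the HR domain's shorter retention period and the sales domain's export limit both end up in the shared policy, because each domain contributed to it.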
How Data Mesh Complements Data Fabric and Vice-Versa
Organizations can leverage the automated capabilities provided by data fabric and implement them in various stages of the data mesh.
For instance, organizations can bring their machine learning applications closer to their end users by automating the data preparation stages of the ML process, thereby improving the speed and accuracy of models and making models available for consumption via the controlled datasets.
Thus, organizations get the best of both approaches. Domain users, who know the use case best, decide how to use the final models, instead of that decision resting solely with data engineers, who may not have in-depth knowledge of how the data is used.
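A minimal Python sketch of an automated preparation step is shown below. The `prepare` function, the field names, and the choice of mean-centering are hypothetical and stand in for whatever preparation a real ML workflow needs; the output is the kind of cleaned dataset a domain could then publish for consumption.

```python
from statistics import mean


def prepare(rows, numeric_field):
    """Drop rows with a missing value, then center the field on its mean."""
    clean = [r for r in rows if r.get(numeric_field) is not None]
    avg = mean(r[numeric_field] for r in clean)
    return [{**r, f"{numeric_field}_centered": r[numeric_field] - avg} for r in clean]


raw = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 30.0},
]
features = prepare(raw, "amount")
```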
StreamSets, Data Mesh, and Data Fabric
Organizations can implement either or both architectural approaches when building their data architecture. StreamSets provides a platform that has a fully open metadata framework and is extensible through APIs. This is critical for orchestrating the consistency, visibility, and level of automation that data fabrics and data meshes require.
Hence, organizations can build intelligent data pipelines based on their preferred transformations and put these pipelines to work in their businesses. Learn more about how StreamSets can help you implement data fabric and data mesh architectural patterns.