Principles of Data Governance: Concepts, Frameworks, and Best Practices
Data governance includes the policies and procedures that dictate how data is created, processed, and distributed. And with the massive amount of data being generated by new technologies like IoT devices, AI, and AR/VR, data governance is no small job.
So far, research indicates that organizations have plenty of room for improvement. In an IDC report covering North America, Europe, and China, respondents estimated that:
- Their companies collected only 56 percent of the data available through their operations.
- Of the data their companies collected, 43 percent went unleveraged.
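Taken together, the two estimates hint at how little of the available data is actually put to use. A rough calculation, assuming the 43 percent applies only to the collected portion and that the two figures can simply be multiplied:

```latex
\[
\underbrace{0.56}_{\text{share collected}} \times \underbrace{(1 - 0.43)}_{\text{share of collected data leveraged}}
= 0.56 \times 0.57 \approx 0.32
\]
```

Under that assumption, roughly 32 percent of the data available to these companies was being put to work.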
But let’s return to the challenges of data governance after first reviewing the fundamentals.
What Is Data Governance?
Data governance is the set of overarching policies, processes, standards, and metrics used to ensure data is created, processed, distributed, and used efficiently and effectively according to an organization’s goals. Gartner puts it another way: data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption, and control of data and analytics.
Why Data Governance Matters
You’ve probably heard that “data is the new oil.” On closer inspection, the statement is a bit misleading. Data is valuable, but until it’s transported, processed, and distributed, it is just raw material. Most of its value emerges when it’s delivered to the right people, at the right time, in the right form, through a transparent and legal process. “Unrefined” data can be incredibly costly because it can lead to poor decision-making, legal problems, and unnecessary work.
Data governance helps organizations extract more value from their data at a lower cost. For example, one common and fundamental goal of data governance is to establish uniformity among different datasets. By establishing uniformity, businesses can 1) avoid making decisions based on unreliable data and 2) cut down the time it takes to make good, data-driven decisions. In short, data governance matters because it helps organizations create more value from their data.
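To make the idea of uniformity concrete, here is a minimal Python sketch that maps records from two sources onto one shared schema. The source names (`crm`, `billing`), the field names, and the accepted date formats are all hypothetical, chosen only for illustration; a real governance program would define its own canonical schema.

```python
from datetime import date, datetime

# Hypothetical canonical schema: every record ends up with exactly these keys.
CANONICAL_FIELDS = ("customer_id", "email", "signup_date")

# Hypothetical per-source mappings from local field names to canonical names.
FIELD_MAPS = {
    "crm": {"CustID": "customer_id", "Email": "email", "Created": "signup_date"},
    "billing": {"customer": "customer_id", "email_addr": "email", "since": "signup_date"},
}

# Date formats the sources are assumed to use.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value: str) -> date:
    """Try each known format in turn; fail loudly on anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def to_canonical(source: str, record: dict) -> dict:
    """Rename fields and normalize values so records from any source compare equal."""
    mapping = FIELD_MAPS[source]
    renamed = {mapping[key]: value for key, value in record.items() if key in mapping}
    missing = [field for field in CANONICAL_FIELDS if field not in renamed]
    if missing:
        raise ValueError(f"{source} record is missing fields: {missing}")
    return {
        "customer_id": str(renamed["customer_id"]).strip(),
        "email": renamed["email"].strip().lower(),
        "signup_date": parse_date(renamed["signup_date"]),
    }
```

With a mapping like this in place, the same customer arriving from either system produces an identical record, so downstream reports do not have to reconcile naming or formatting differences themselves.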
How Data Governance Works
CIO Senior Writer Thor Olavsrud explains that data governance is “a function that supports an organization’s overarching data management strategy.” He cites the Data Management Association’s (DAMA) wheel model of data management, which places data governance at the hub. The spokes emanating from that hub are the other data management disciplines. Though many treat these “spokes” as individual initiatives, they are likely to fail if developed in silos. Data governance works best when approached as a whole and tied closely to your data integration strategy.
The Challenges With Data Governance
Data governance is as much a human problem as it is a technical one. A primary driver of the need for data governance is the proliferation of individuals, teams, and departments using many different tools in many different ways. This compounds many of the technical data engineering problems: many types of data are stored in many places, and that data has to be integrated so the business can use it for its current needs. But those “current needs” rarely stay current, so data governance must leave room for newly discovered use cases. It’s up to data governance teams to standardize the way tools are used to create, process, store, analyze, and share data.
Overcoming challenges with data governance best practices
This is a big task, so it helps to lean on best practices that make your efforts more manageable and effective. The first step in creating a data governance framework is to organize your people. To see how you might structure roles and responsibilities for data governance, McKinsey offers a useful best-practice data governance organizational model.
A Note on Data Governance and Data Management
If data governance is architecture, then data management is construction. Despite this distinction, many people confuse the two. The difference is worth understanding, because when governance and management combine effectively, the result is more valuable data at a lower cost.
Choosing the Tools That Empower Data Governance
Data quality is a fundamental impetus of data governance. Without quality, data cannot be available, usable, or secure. One of the most difficult places to ensure data quality is at the point of ingestion. When you draw on metadata from multiple solutions, your data governance picture becomes more complete. With StreamSets, you can avoid the data corrosion and data loss that so often occur when data is ingested. Among other things, StreamSets gives you the ability to inspect data in motion and automatically detect and respond to schema changes.
StreamSets smart data pipelines produce vital metadata that can be used by governance solutions to monitor the use of data from inception to analytics.
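As a generic illustration of detecting and responding to schema changes at ingestion, the following Python sketch compares each incoming record against an expected schema. It is not the StreamSets API; the function names, the type-name schema representation, and the quarantine policy are all assumptions made for this example.

```python
def infer_schema(record: dict) -> dict:
    """Map each field name to the name of its Python type."""
    return {name: type(value).__name__ for name, value in record.items()}


def detect_drift(expected: dict, record: dict) -> dict:
    """Report fields that were added, removed, or changed type."""
    observed = infer_schema(record)
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(name for name in shared if expected[name] != observed[name]),
    }


def ingest(records, expected):
    """Assumed policy: tolerate new fields, quarantine breaking changes."""
    accepted, quarantined = [], []
    for record in records:
        drift = detect_drift(expected, record)
        if drift["removed"] or drift["retyped"]:
            # Breaking change: hold the record back along with the reason.
            quarantined.append((record, drift))
        else:
            # Unchanged or purely additive: let the record through.
            accepted.append(record)
    return accepted, quarantined


expected_schema = {"id": "int", "name": "str"}
incoming = [
    {"id": 1, "name": "a"},                  # matches the schema
    {"id": 2, "name": "b", "tier": "gold"},  # additive change
    {"id": "3", "name": "c"},                # id changed type
    {"id": 4},                               # name went missing
]
accepted, quarantined = ingest(incoming, expected_schema)
```

Keeping the drift report next to each quarantined record is one way to produce the kind of metadata a governance team can later review; whether additive changes should pass silently is a policy choice, not a given.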