If you’d had a crystal ball this time last year, what would you have done differently? In the area of data integration, many of the changes that have taken place are accelerations of trends that were already underway. Even ahead of the crystal-clear proof of the changes needed, Gartner’s advice on data integration has been evolving to lead data and analytics technical professionals into the future.
In this blog, I look back on key themes in Gartner research, my own conversations with customers, and industry observations to identify 3 big trends in data integration. Overall, data and analytics leaders are called upon to deliver more, faster than ever, and that has a real impact on the work of data engineers.
“In the face of unprecedented market shifts, data and analytics leaders require an ever-increasing velocity and scale of analysis in terms of processing and access to accelerate innovation and forge new paths to a post-COVID-19 world,” Rita Sallam, Distinguished VP Analyst, shared during her presentation at virtual Gartner IT Symposium/Xpo™ 2020.
3 important trends in data integration are:
1. Modernizing Data Architectures for Real-time Data
Relatively recent innovations like automation, machine learning (ML), and predictive analytics are becoming imperative to success in today’s fast-moving world. They require the ability to seamlessly and continuously harness data, from anywhere, in any format, and deliver it to decisioning systems used by data scientists and business analysts. Legacy data architectures, with their tendency to be on-premises, siloed, and pre-integrated (i.e., monolithic), are not built for this type of flexibility and agility.
Modern data architectures are cloud-based, use modular components that seamlessly integrate, and are purpose-built to solve core business problems. Since data is the foundation of today’s modern business, modern data architectures include robust and scalable streaming data pipeline solutions to deliver real-time, continuous data to the business. Data integration has to support continuous integration, continuous development, and continuous data as well as continuous innovation.
Examples of Modern Data Integration
As we all found out the hard way, it’s not just businesses that need real-time data. Getting real-time, reliable information from the government in a time of crisis can provide a sense of stability along with the foundation to make urgent decisions. But if enterprises are known for legacy systems, governments are downright notorious for them.
When the pandemic hit, the State of Ohio needed to get a COVID dashboard together with 88 counties and thousands of data sources. Luckily, their data platform team, in partnership with Avaap, had already enabled the various State agencies with a modern data pipeline platform. When they got their ‘at bat,’ they were able to pull together the State of Ohio COVID dashboard overnight.
More Resources on Modernizing Data Architectures
For more on modernizing data architectures and real-time data, this Gartner Report may be of interest: An Introduction to and Evaluation of Apache Spark for Modern Data Architectures [requires login]
2. Emergence and Adoption of DevOps Practices: DataOps, AIOps, MLOps, XOps…
Remember when every business was a software business? Out of that world came DevOps, a widely adopted and well-understood practice that accelerates software development by leveraging automation and monitoring to enable agile collaboration across application designers and operations staff.
Over the last decade, every business has become a data business. And as data ‘ate the world,’ organizations realized they needed to bring the agility and flexibility of their DevOps practices to data. As with software, the newly born practice of DataOps faced the challenge of rapidly and repeatedly delivering quality. To do this, data teams also had to work with a moving target: data sources, destinations, structure, and semantics that are constantly changing. These unending and often unexpected changes (data drift) are what make data integration so tricky and a big part of why the adoption of DataOps has grown.
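To make data drift a little more concrete, here is a minimal sketch in Python of one way a pipeline might flag structural drift in an incoming record. It is an illustration only, not how any particular product works: the function name `detect_drift`, the schema, and the field names are all hypothetical, and it covers only drift in structure (missing, added, or retyped fields), not drift in semantics.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one incoming record against the schema the pipeline expects.

    Hypothetical helper for illustration; real pipelines usually track
    drift across many records and over time.
    """
    # Fields the pipeline expects but the record no longer carries.
    missing = sorted(f for f in expected if f not in record)
    # Fields the source started sending that the pipeline does not know about.
    added = sorted(f for f in record if f not in expected)
    # Fields that are still present but arrive with a different type.
    retyped = sorted(
        f for f, expected_type in expected.items()
        if f in record and not isinstance(record[f], expected_type)
    )
    return {"missing": missing, "added": added, "retyped": retyped}


# Assumed schema and sample record, invented for this example.
expected_schema = {"customer_id": str, "order_total": int, "ordered_on": str}
incoming_record = {"customer_id": "c-100", "order_total": "42", "order_date": "2020-12-01"}

drift = detect_drift(expected_schema, incoming_record)
print(drift)
```

In this sample, the source renamed `ordered_on` to `order_date` and started sending `order_total` as text, so the sketch reports one missing field, one added field, and one retyped field. A pipeline that surfaces changes like these, rather than failing silently or breaking, is the kind of resilience the DataOps discussion here is pointing at.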
In a December 2020 Gartner blog post, How DataOps Amplifies Data & Analytics Business Value, Gartner shared that “The pandemic has accelerated the need for data and analytics leaders to deliver data and analytics insight faster, with higher quality and resiliency in the face of constant change…As a result, data and analytics leaders are increasingly applying DataOps techniques that provide a more agile and collaborative approach to building and managing data pipelines.” [Bolding mine.]
Using DataOps as a Foundation for MLOps and AI
Other groups within organizations, for example data scientists, are applying DevOps principles to machine learning with a practice of MLOps. A strong DataOps practice can expedite operations at some of the most crucial stages of the machine learning lifecycle and MLOps. In machine learning, data scientists want to create and rapidly iterate on machine learning models, which requires the continuous delivery and continuous integration of changing data that’s provided by a strong foundation of DataOps.
In essence, the traditional way of doing data integration—in batches, from silos when possible—is giving way to this more operational continuous delivery of constantly changing data. For a perfect example of DataOps in motion, check out this case study on how Shell uses DataOps to deliver AI at enterprise scale. Shell created a DataOps Center of Excellence (COE) so their data scientists would be able to innovate and model with all of their data. They enable self-service at scale. The data scientists who want to do exploratory data science and machine learning can access data without the traditional data integration change management work.
More Resources on DevOps and AI
For more on this topic, check out the following Gartner Report: Assessing DevOps in Artificial Intelligence Initiatives [requires login]
3. Cloud, Cloud, and (Multi-) Cloud
In this year of acceleration, even the most traditional cloud holdouts—companies concerned with compliance, privacy, and legacy investments—have realized that modernizing with cloud technologies is essential for survival. Gartner research shows that organizations are increasingly using cloud services for new initiatives or to replace existing systems, reallocating spending from traditional IT solutions to cloud. They call this cloud shift. “Gartner’s cloud shift data reveals that enterprises are demonstrating continued preference for public cloud services compared with traditional non-cloud alternatives,” according to Ed Anderson, Distinguished VP Analyst, Gartner. “The proportion of IT spending that is being allocated to cloud will accelerate even further in the aftermath of the COVID-19 crisis, as companies look to improve operational efficiencies.”
As this “cloud shift” occurs, organizations will need a data management strategy that works across legacy systems and multiple cloud environments. Whether migrating data to the cloud from legacy systems or ensuring continuous data integration between clouds, portability of data pipelines will be critical. In a recent Gartner survey of public cloud users, 81% of respondents said they are working with two or more cloud services providers.* In addition, cloud innovation is ever-present, with new technologies entering the scene regularly. Resilience in data integration is a must.
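One way to think about pipeline portability is to keep the pipeline logic separate from the place the data lands. The Python sketch below is a simplified illustration under that assumption: the `Destination` interface, the `InMemoryDestination` stand-in, and the `run_pipeline` function are hypothetical names invented here, and a real deployment would swap in connectors for actual cloud services.

```python
from typing import Iterable, Protocol


class Destination(Protocol):
    """Anything the pipeline can write rows to (hypothetical interface)."""

    def write(self, rows: list[dict]) -> None: ...


class InMemoryDestination:
    """Stand-in for a cloud target so the example runs without credentials."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


def run_pipeline(source: Iterable[dict], destination: Destination) -> int:
    """Normalize field names to lowercase, then hand the rows to the destination."""
    cleaned = [{key.lower(): value for key, value in row.items()} for row in source]
    destination.write(cleaned)
    return len(cleaned)


# Sample legacy rows, invented for this example.
legacy_rows = [{"ID": 1, "Region": "east"}, {"ID": 2, "Region": "west"}]

# The same pipeline definition runs unchanged against two different targets.
cloud_a = InMemoryDestination("cloud_a")
cloud_b = InMemoryDestination("cloud_b")
count_a = run_pipeline(legacy_rows, cloud_a)
count_b = run_pipeline(legacy_rows, cloud_b)
print(count_a, count_b, cloud_a.rows == cloud_b.rows)
```

Because `run_pipeline` only depends on the `write` method, moving from one cloud to another, or adding a second one, means adding a destination rather than rewriting the pipeline.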
And if you need more proof of why cloud is a must, just check out Trend #6 in the October 2020 ‘Gartner Top Trends in Data and Analytics for 2020’: Cloud is a Given. According to this article, “By 2022, public cloud services will be essential for 90% of data and analytics innovation.”
More Resources on Cloud Data Integration and Data Management
For more on the cloud in data integration, check out this Gartner Report: *Understanding Cloud Data Management Architectures: Hybrid Cloud, Multicloud and Intercloud
The Reason It’s ‘The Year of the Data Engineer’
Companies have talked about digital transformation for a long time. The majority have at least started the process. And like last December’s ‘Christmas Star’ planet alignment, these 3 trends in data integration have aligned so perfectly and accelerated so quickly that organizations have the option to move boldly into the now or risk being left behind.
Fit-for-purpose solutions are no longer relegated to Gartner’s ‘cool vendors’ list; they’re recognized right alongside traditional enterprises with monolithic all-in-one tools offering everything from ETL to lineage to MDM. That’s because DataOps practices using modern data architectures to deliver continuous, resilient data across an organization are essential. It’s the year of the data engineer for a reason!