As cloud adoption grows, so do the complexity of the data architectures that underpin modern enterprise applications and the need for data integration across hybrid cloud. This complexity, if not planned for, can cripple any cloud initiative. According to research firm Gartner (subscription required), by 2021, at least 75% of large and global organizations will implement a multi-cloud-capable hybrid integration platform, up from less than 25% in 2018. Taking a DataOps approach to data integration can help streamline how data moves around the business and ensure that integration initiatives support an organization's cloud-oriented goals.
Data Integration for Hybrid Cloud with DataOps
Simply put, DataOps streamlines data integration practices to interconnect modern data systems with agility, enabling the free movement of data in support of pervasive intelligence. As it turns out, a disciplined approach to data integration can drive the success of any cloud initiative, because it helps alleviate a number of challenges: keeping up with frequent changes to cloud data platforms, managing data between on-premises and cloud environments, and gaining the operational visibility needed to ensure dataflow performance and regulatory compliance for sensitive data.
With this announcement, we support data integration for hybrid cloud with the following capabilities:
- A full-featured dataflow designer that includes “easy button” connectors for Amazon S3, Elastic MapReduce (EMR) and Redshift; Azure Data Lake Storage, HDInsight and Azure Databricks; Google Dataproc; and Snowflake
- Elastic scaling of cloud, multi-cloud and reverse hybrid cloud dataflows via Kubernetes
- New data drift handling, which automatically reflects updates to source schema in Amazon Athena, Azure SQL and Google BigQuery cloud data services
- A new CI/CD framework for automating frequent changes to dataflows through iterative design, test, validate and deployment steps
- New central governance of StreamSets Data Protector policies that detect and deal with sensitive data such as PII and PHI
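To make the data drift item in the list above more concrete, here is a minimal, generic sketch of the idea: compare each incoming record against the schema the target already knows about, and generate the schema update for any new fields. This is an illustration only, not StreamSets code or its API. The names `SQL_TYPES`, `detect_drift` and `apply_drift` are hypothetical, and the DDL string is a generic form that would need adapting to the syntax of the actual target service.

```python
from typing import Any, Dict, List

# Hypothetical mapping from Python value types to column types (illustrative only).
SQL_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "STRING"}


def detect_drift(known_schema: Dict[str, str], record: Dict[str, Any]) -> Dict[str, str]:
    """Return the fields present in the record but missing from the known schema."""
    new_columns: Dict[str, str] = {}
    for name, value in record.items():
        if name not in known_schema:
            # Fall back to STRING for any type not covered by the mapping above.
            new_columns[name] = SQL_TYPES.get(type(value), "STRING")
    return new_columns


def apply_drift(table: str, known_schema: Dict[str, str], record: Dict[str, Any]) -> List[str]:
    """Update the known schema in place and return generic DDL for the new columns."""
    statements: List[str] = []
    for name, sql_type in detect_drift(known_schema, record).items():
        known_schema[name] = sql_type
        statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
    return statements


if __name__ == "__main__":
    schema = {"id": "BIGINT"}
    # A record arrives with two fields the target table has not seen before.
    for ddl in apply_drift("events", schema, {"id": 1, "region": "eu", "score": 0.5}):
        print(ddl)
```

The sketch only handles added fields. Renamed, removed or retyped fields are also forms of drift and would need their own handling.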
Roughly half of our customers already use StreamSets in the cloud, and we know there’s only one direction that number will go. In a recent RightScale survey, 81% of respondents indicated they plan to take a multi-cloud approach to their cloud adoption strategy. There are a few factors to consider given this fact. First, there’s a strong chance that a multi-cloud strategy implies multiple providers. Second, while cloud-first organizations do exist and an all-cloud strategy may be something to aspire to, most organizations will likely land on a hybrid architecture for the foreseeable future, with some applications remaining on-premises. Finally, the types of workloads executed on-prem or in the cloud will vary, so the flexibility to mix and match (that is, optionality) is key. In short, the ability to leverage multiple cloud providers and run a mix of workloads either on-prem or on one of several clouds can make or break any cloud initiative.
The StreamSets Platform simplifies building, executing, operating and protecting enterprise data movement architectures. Built on an open source core product called StreamSets Data Collector™, which has well over 2 million downloads, the platform allows developers to design pipelines with a minimum of code, and operators to aggregate numerous dataflows into dataflow topologies and manage them centrally with live end-to-end visibility, SLA-based performance and in-stream data protection.