The big data world was shocked last week when Apache Hadoop™ data vendors Cloudera and Hortonworks announced they would be merging. Anyone familiar with this space knows that these two vendors have hardly been friends, so suffice it to say this announcement caught everyone by surprise. Much has already been written about this merger, and many are speculating about the ultimate motivations for it.
For some, this merger makes total sense and eliminates competition where it’s not needed in the data vendor space. For others, it’s a win for open source, as the combined entity removes the friction that arose because the two companies have historically worked on competing as well as complementary open source projects. And for yet others, it’s entirely financially motivated and helps put the combined entity on a path to profitability. Whatever the motivations are, one thing is a given: expect a lot of uncertainty and change in the Hadoop space over the next several quarters.
Don’t Fear the Change When Data Vendors Merge
StreamSets is often used to simplify the process of modernizing data architectures on platforms like Hadoop. Our focus on bringing DevOps principles to data integration, which we call DataOps, is helping companies worldwide solve their most challenging data management problems regardless of which data vendor they use. In fact, for data lake replatforming, we have customers running the StreamSets DataOps Platform for ingestion with both Cloudera and Hortonworks, as well as with MapR. Our commitment to breadth of connectivity makes this possible: we offer native support for leading Hadoop components such as HDFS, Kudu, Cloudera Navigator, and Apache Atlas. So regardless of your choice of data vendors now and in the future, you can be certain that StreamSets will support your use cases and workloads.
But ingestion is just one part of what we do. StreamSets enables you to build any-to-any data integration pipelines for moving data between leading systems like Teradata, Exadata and more. These same pipelines benefit from built-in dataflow sensors, making them uniquely intelligent. Most notably, StreamSets pipelines are insulated from infrastructure drift: as the underlying technologies that support your data architecture change, your pipelines continue to operate as before. You also get all the additional capabilities of the StreamSets DataOps Platform for your architecture beyond just Hadoop, including automation, data drift awareness, operational visibility and in-motion data protection.
Cloudera? Hortonworks? Your Data Vendor Choice Doesn’t Matter.
So what does this all mean in the context of the Cloudera + Hortonworks merger? There’s no doubt there will be a lot of uncertainty over the next few months as to what the joint product offering of the merged company will look like. Using StreamSets, however, means the work you do with the data vendors you use today won’t really be impacted by what you may be required to use tomorrow. It’s difficult to say what the new merged platform will look like, but fortunately, StreamSets pipelines can be easily updated to the newest components without having to go through long, drawn-out development cycles. As a result, you can confidently make technology decisions today without fear of making the wrong choice.
The future of data infrastructure is marked by innovation and constant flux, so one thing is certain: the Cloudera and Hortonworks merger won’t be the last change you can expect. The data management landscape has historically been rife with mergers and acquisitions, so data architectures are likely to see even more drift. Taking a DataOps approach now allows you to operate with the same agility in data infrastructure that DevOps has brought to the world of applications. The StreamSets DataOps Platform is ideal for helping protect your data vendor investment today and into the future, regardless of what happens.