Girish is StreamSets' CEO. He is an industry veteran who has spent his career developing successful, innovative products that address the challenge of providing integrated information as a mission-critical, enterprise-wide solution.
Today we hear a lot about streaming data, fast data, and data in motion. But the truth is that we have always needed ways to move our data. Historically, the industry has been pretty inventive about getting this done. From the early days of data warehousing and extract, transform, and load (ETL) to now, we have continued to adapt and create new data movement methods, even as the characteristics of the data and data processing architectures have dramatically changed.
Exerting firm control over data in motion is a critical competency that has become core to modern data operations. Based on more than 20 years in enterprise data, here is my take on the past, present, and future of data in motion.
Girish Pancha | Data in Motion Evolution: Where We’ve Been…Where We Need to Go
Today I am delighted to announce our new product, StreamSets Dataflow Performance Manager, or DPM, the industry’s first solution for managing operations of a company’s end-to-end dataflows within a single pane of glass. The result of a year’s worth of innovative engineering and collaboration with key customers, DPM will be generally available on or before September 27, in time for Strata. We invite you to come by our booth (#451) for a live demonstration.
DPM is a natural follow-on to our first product, StreamSets Data Collector, which is open source software for building and deploying any-to-any dataflow pipelines. That product has enjoyed a great deal of success in its first year in market, with an accelerating number of weekly downloads, which now total in the tens of thousands across hundreds of enterprises, and numerous production use cases in Fortune 500 companies across a variety of industries.
Girish Pancha | Introducing StreamSets DPM – Operational Control of Your Data in Motion
Forward-looking, data-driven enterprises increasingly leverage Big Data platforms, such as Hadoop, Elasticsearch and Amazon Web Services, to derive insights from non-transactional, machine-generated data. Many tools have emerged to power next-generation data pipelines and provide specialized analytic capabilities. To get value from these technologies, data must reside in intermediate data stores in a consumable form. However, existing data integration tools do not offer the means to continuously extract data from the exploding variety of machine data sources and load it into Big Data platforms in a consumption-ready manner.
Today, after a year of working in stealth mode with a number of enterprise charter customers, we are excited to launch StreamSets. Arvind and I started StreamSets in June 2014 because, as they say in French, “plus ça change, plus c’est la même chose.” Or in other words, the more things change, the more they stay the same.
Arvind had come to realize over his four-year career at Cloudera that the best practice for most customers ingesting data into Hadoop was manually coding data processing logic and orchestrating it using open source frameworks. I was flabbergasted! As Chief Product Officer at Informatica, I had spent more than a dozen years delivering various technologies that automated the processing and moving of data into data warehouses. So why were people doing this manually for big data stores?