Introducing StreamSets DPM – Operational Control of Your Data in Motion

Posted in Industry, September 12, 2016

Friends of StreamSets,

Today I am delighted to announce our new product, StreamSets Dataflow Performance Manager, or DPM, the industry’s first solution for managing operations of a company’s end-to-end dataflows within a single pane of glass. The result of a year’s worth of innovative engineering and collaboration with key customers, DPM will be generally available on or before September 27, in time for Strata. We invite you to come by our booth (#451) for a live demonstration.

DPM is a natural follow-on to our first product, StreamSets Data Collector, which is open source software for building and deploying any-to-any dataflow pipelines. That product has enjoyed a great deal of success in its first year on the market: weekly downloads are accelerating and now total in the tens of thousands across hundreds of enterprises, with numerous production use cases at Fortune 500 companies in a variety of industries.

While StreamSets Data Collector is a best-in-class tool for data engineers designing complex pipelines in the face of data drift, that is only half the battle. Our customers struggle not just with building pipelines, but also with managing the day-to-day operations of their dataflows, so that they can be confident that data-driven applications and business processes are getting timely and trustworthy data.

This is where DPM comes in. You can think of it as a control panel for managing all of your dataflow topologies from a single point. A topology is a series of interconnected dataflows, sometimes dozens or hundreds of individual pipelines, that work together to continuously serve data in support of business and IT imperatives, such as Customer 360, Cybersecurity, IoT and Data Lakes.

We call the operational discipline that DPM enables Data Performance Management. We think of it as a high level of process maturity akin to that delivered by Network Performance Management and Application Performance Management. Data has been neglected in this regard. While data stores (data at rest) are well managed, end-to-end dataflows (data in motion) are not. If you don’t professionally manage your data in motion, you risk having applications malfunction because of incomplete or corrupt data.

DPM lets you map, measure and master your dataflow operations.  First, DPM maps dataflow pipelines into broader topologies, not just as a snapshot but as a living, breathing and interactive data architecture.  Real-time visualization of dataflow topologies is truly empowering, replacing manual mapping exercises that become outdated as soon as they are published.  

[Screenshot: Map] StreamSets Dataflow Performance Manager™ maps the dataflows for a Customer 360 topology, which feeds batch and streaming data from multiple sources to multiple destinations. It also shows record throughput across the topology.

But that’s only the first step. Next you can measure dataflow performance across each topology, from end-to-end or point-to-point.  You can establish baselines for what is normal throughput, travel time or error rates, and then monitor these metrics to ensure operational stability.  You can also assess the performance impact of topology changes, such as new or updated infrastructure, applications and dataflows.

[Screenshot: Measure] The StreamSets DPM dashboard shows metrics for all dataflow topologies on a single screen. Operators can drill into each topology or the dataflow pipelines they use.
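
To make the idea of a baseline concrete, here is a minimal sketch in Python of establishing what "normal" throughput looks like and then flagging a reading that strays from it. This is not DPM's API or its actual method: the function names, the sample values, and the three-standard-deviation tolerance are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical throughput samples (records/sec) as (mean, stdev).

    Hypothetical helper: a real baseline might account for time of day,
    seasonality, or topology changes.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    return mean(samples), stdev(samples)


def deviates_from_baseline(value, baseline, tolerance=3.0):
    """Return True when value lies more than `tolerance` standard deviations
    from the baseline mean. The default of 3.0 is an assumed threshold."""
    mu, sigma = baseline
    return abs(value - mu) > tolerance * sigma


# Illustrative numbers only, not measurements from any real topology.
history = [1000, 1020, 980, 1010, 990]
baseline = build_baseline(history)
print(deviates_from_baseline(1030, baseline))  # within the normal range
print(deviates_from_baseline(400, baseline))   # far below normal throughput
```

The same pattern would apply to travel time or error rates; only the metric being sampled changes.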

Still, the ultimate goal toward which enterprises should strive is to master their dataflow operations by implementing Data SLAs that ensure incoming data meets business requirements for availability and accuracy. DPM lets you set Data SLAs and then warns or alerts you when there is a violation, so you can proactively address dataflow operational problems before they become business problems.

The power in these Data SLAs is that they are not limited to system-specific rules, like “is there data backpressure in Kafka,” but rather reflect consumption-specific business goals, such as “is more than 95% of the application log data feeding my personalization algorithm arriving within 1 hour of being produced?” or “is the data in my BI dashboard complete; can I trust the results?” Of course, you can also set SLAs for path segments or systems that create risk within a given topology, but the true innovation is the end-to-end visibility and control you gain.

[Screenshot: Master] StreamSets DPM allows you to set SLAs for Data Availability and Data Accuracy, from end-to-end or for a segment of the dataflow. SLAs can be programmed to trigger alerts when violated.
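
The availability question quoted above ("is more than 95% of the data arriving within 1 hour of being produced?") can be expressed as a simple check. The Python sketch below is only an illustration of that rule, not how DPM implements or configures SLAs; the `Record` type, its field names, and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """Hypothetical record carrying the two timestamps the SLA needs."""
    produced_at: datetime
    arrived_at: datetime


def on_time_fraction(records, max_latency):
    """Fraction of records whose travel time is at most max_latency."""
    if not records:
        raise ValueError("cannot evaluate an SLA over zero records")
    on_time = sum(
        1 for r in records if r.arrived_at - r.produced_at <= max_latency
    )
    return on_time / len(records)


def availability_sla_met(records,
                         max_latency=timedelta(hours=1),
                         min_fraction=0.95):
    """True when strictly more than min_fraction of records arrived in time,
    mirroring the "more than 95% within 1 hour" wording."""
    return on_time_fraction(records, max_latency) > min_fraction
```

An operator-facing system would evaluate a rule like this over a sliding window of recent records and raise a warning or alert when it returns False.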

With DPM, we feel we have made great progress in our mission to empower organizations to harness their data in motion.  And there is much more to come.  

Girish Pancha, CEO and Co-Founder

StreamSets Inc.
