Dataflow Performance Blog

Introducing StreamSets Data Protector

Detect, Secure and Govern Sensitive Data Upon Ingestion

StreamSets is excited to announce a new product for protecting data in motion. StreamSets Data Protector, as the name implies, extends the StreamSets DataOps Platform so you can detect, secure and govern sensitive data as it flows around your business. Using StreamSets' unique “Dataflow Sensors”, Data Protector can spot and act on personally identifiable information (PII) at the point of ingest, strengthening your ability to meet new and changing regulatory requirements such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA).

Data protection is not a new problem per se. Countless companies and products exist today that help protect sensitive information, and each is valuable in its own right. However, as the threat of a security breach continues to grow and regulations become more restrictive, so too does the desire to extend data protection in new and innovative ways. From our discussions with global customers headquartered both here in the US and in Europe, it is clear that one such desire is to push data protection upstream, to the point of ingest or where data is created, before it is stored. As a leading provider of batch and streaming data ingestion technology, StreamSets was a natural fit to add data protection capabilities to its platform. StreamSets Data Protector, a new product we have launched today, does exactly that.

StreamSets Data Protector helps organizations address a number of growing challenges:

Compliance with varying and conflicting regulatory requirements

While GDPR is arguably today’s “It” regulation, many others preceded it and no doubt more will follow. As requirements grow in number, organizations will inevitably have to satisfy conflicting mandates. The GDPR's right-to-be-forgotten article, for example, conflicts directly with data retention regulations, such as those tied to anti-money laundering (AML) laws or the requirement to retain data (not destroy evidence) when a company is sued.

To get around this, teams have turned to custom solutions or have restricted data access to a handpicked set of people to avoid harmful consequences. In so doing, they lose valuable insights that could help the business compete effectively. To overcome this tension, Data Protector enables teams to define security zones, so multiple teams can interact with data in different ways without locking all of it up. At the same time, Data Protector builds on StreamSets' extensibility capabilities, so teams can add custom-built detection and protection algorithms that they may have created to meet a specific regulation's requirements.
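To make the idea of a custom-built detection algorithm concrete, here is a minimal Python sketch of pattern-based detection with a pluggable registry. It is not Data Protector's API: the `DETECTORS` table, `register_detector`, `detect`, and the patterns themselves are all hypothetical names and assumptions chosen for illustration.

```python
import re

# Hypothetical detector registry. Names and patterns are illustrative
# assumptions, not part of any StreamSets product.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def register_detector(name: str, pattern: str) -> None:
    """Add a custom detector, e.g. for an identifier specific to one regulation."""
    DETECTORS[name] = re.compile(pattern)


def detect(record: dict) -> dict:
    """Return {field: [detector names]} for every string field that matches."""
    findings = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue  # this sketch only scans string fields
        hits = sorted(name for name, rx in DETECTORS.items() if rx.search(value))
        if hits:
            findings[field] = hits
    return findings
```

A team with its own identifier format could call `register_detector("employee_id", r"\bEMP-\d{6}\b")` once and have `detect` flag it alongside the built-in patterns. Real detectors would need validation beyond regular expressions (checksums, context, sampling) to keep false positives down.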

Reducing business risk, but not at the expense of valuable insight

In most businesses, conflicting forces are at work when it comes to data protection. On one hand, security and risk teams are keen to see everything locked down, implementing robust measures to ensure sensitive business data does not get into the wrong hands. On the other, analysts and data scientists worry that if key business data becomes obfuscated or restricted, they will be unable to deliver the valuable insights the business expects from them.

By pushing the detection and protection of sensitive data upstream, StreamSets Data Protector helps you meet both needs. Teams can implement multiple data protection policies against the same incoming data using a wide range of operators, including reversible and irreversible algorithms, pseudonymization, masking, generalization and much more. Different levels of obfuscation can be combined with routing rules to give different users different levels of visibility.
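As a rough illustration of applying several protection policies to the same incoming record, the Python sketch below combines masking, keyed-hash pseudonymization and generalization, then produces one protected copy per destination. Everything here is an assumption made for the example: the function names, the `POLICIES` table, the "analytics" and "support" destinations, and the hard-coded key are hypothetical and do not reflect Data Protector's actual operators or configuration.

```python
import hashlib
import hmac

# Hypothetical key for the example only; a real deployment would load
# it from a secrets store rather than hard-coding it.
SECRET_KEY = b"example-key-not-for-production"


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def pseudonymize(value: str) -> str:
    """Replace a value with a stable keyed hash: same input, same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical policies: each destination maps field names to an operator.
POLICIES = {
    "analytics": {"ssn": pseudonymize, "age": generalize_age},
    "support": {"ssn": mask},
}


def protect(record: dict, policy_name: str) -> dict:
    """Apply one policy; fields the policy does not mention pass through."""
    policy = POLICIES[policy_name]
    return {k: policy[k](v) if k in policy else v for k, v in record.items()}


def route(record: dict) -> dict:
    """Produce one protected copy of the record per destination."""
    return {name: protect(record, name) for name in POLICIES}
```

Calling `route({"name": "Jane", "ssn": "123-45-6789", "age": 34})` yields an "analytics" copy with a tokenized SSN and an age band, and a "support" copy where only the last four SSN digits remain visible. The keyed hash in this sketch is one-way; a reversible operator would instead use encryption or a token vault so that authorized users can recover the original value.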

Figure: Using StreamSets Data Protector to detect and obfuscate sensitive data at development time in StreamSets Control Hub.

Robust reporting for when the regulation czar comes knocking

While StreamSets Data Protector itself is not a compliance tool, it can certainly help you meet compliance mandates. Data Protector collects detailed metadata that informs comprehensive audit reports, showing where data has come from, who has interacted with it, how it has been changed, and where it has been stored. Whatever the requirements, these reports make it easier for data and security teams to collaborate and show they are taking the right steps to meet them.

StreamSets Data Protector is an exciting new addition to the StreamSets DataOps Platform. You can learn more about Data Protector by visiting our website. We’re also holding a webinar to discuss data protection and walk through the new functionality on March 29th. Finally, if you’re at the Strata Data Conference in San Jose this week, or Big Data Paris next week, be sure to stop by our booth for a demo of this exciting new functionality!
