The DataOps Blog

Where Change Is Welcome

StreamSets Transformer Extensibility: Spark and Machine Learning

Engineering

Apache Spark has been on the rise for the past few years, and it continues to dominate the landscape for in-memory and distributed computing, real-time analysis, and machine learning use cases. With the recent release of StreamSets Transformer, a powerful tool for creating highly instrumented Apache Spark applications for modern ETL, you can quickly start leveraging…

September 12, 2019

Automating Kerberos KeyTab Generation for Kubernetes-based Deployments

Engineering

A major challenge when deploying dataflow pipelines on Kubernetes is handling the Kerberos principals and keytabs that pipelines need in order to write to secure Hadoop. One approach, using Kerberos keytabs for principals of the form user@REALM (without a host field), incurs security risks, as a keytab for such a principal could be used on any host in the…

August 21, 2019

Field Mapper Processor: The Swiss Army Knife of Bulk Field Manipulation

Engineering

Guest post by Jeff Evans, Senior Software Engineer, StreamSets. The Field Mapper processor, introduced in Data Collector version 3.8.0, provides a flexible and powerful way to manipulate fields en masse in your records. It operates in one of three modes: Field paths: The location of a field within the context of the entire record, which is useful for moving fields around the record,…

May 7, 2019