
The DataOps Blog

Where Change Is Welcome

Apache Spark: Where Nothing Can Be Something

November 19, 2020

"Nothing is more real than nothing." - Samuel Beckett. This post is written by Jeff Evans, Senior Software Engineer, StreamSets. While working on an issue for a prospect’s proof of concept pipeline using StreamSets Transformer, a modern Spark ETL engine,…

13 Data Engineering Best Practices At DNB

November 17, 2020

DNB is Norway's largest financial services group and has a reputation as a trusted financial institution throughout the region. In this guest post, the DNB Data Engineering Centre of Practice team (Saleem Pothiwala, Operations Lead - Customer Insights, Jones Mabea Agwata,…

Demystifying Kerberos Authentication on Hadoop Clusters

September 29, 2020

Guest post by Rishi Jain, Technical Support Engineer III, StreamSets. In this blog post, you'll learn the recommended way to enable and use Kerberos authentication when running StreamSets Transformer, a modern transformation engine, on Hadoop clusters. Generally speaking, the --proxy-user argument…

What are Grok Patterns?

August 10, 2020

Grok builds on the regular expression language, letting you name existing patterns and/or combine them into more complex patterns. Because Grok is based on regular expressions, any valid regular expression (regexp) is also valid in Grok. In StreamSets Data Collector,…
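To make the idea concrete, here is a minimal sketch of how Grok-style named patterns can expand into an ordinary regular expression. It is not the StreamSets Data Collector implementation. The pattern names (`INT`, `WORD`, `IPV4`, `CLIENT`), their simplified definitions, and the helper functions `expand` and `grok` are all hypothetical. The sketch shows a reference like `%{NAME:field}` becoming a named capture group, a composite pattern built from other patterns, and plain regex passing through unchanged.

```python
import re
from typing import Dict, Optional

# Hypothetical, simplified pattern library for illustration only; real Grok
# pattern sets (including the one shipped with Data Collector) are far more complete.
PATTERNS: Dict[str, str] = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    # A composite pattern built only from other named patterns.
    "CLIENT": r"%{IPV4:client_ip} %{WORD:method}",
}

# Matches a reference such as %{NAME} or %{NAME:field}.
_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str, depth: int = 0) -> str:
    """Recursively replace %{NAME[:field]} references with plain regex."""
    if depth > 10:
        raise ValueError("pattern nesting too deep (possible cycle)")

    def substitute(match) -> str:
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name], depth + 1)
        # %{NAME:field} becomes a named capture group; bare %{NAME} does not capture.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    # Text outside %{...} is left untouched, so plain regex passes through.
    return _REF.sub(substitute, pattern)


def grok(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Search text with a grok-style pattern and return the captured fields."""
    m = re.search(expand(pattern), text)
    return m.groupdict() if m else None


print(grok("%{CLIENT} %{INT:status}", "10.0.0.1 GET 200"))
# {'client_ip': '10.0.0.1', 'method': 'GET', 'status': '200'}
print(grok(r"%{WORD:user} logged in at \d+:\d+", "alice logged in at 10:42"))
# {'user': 'alice'}
```

Because every Grok reference is rewritten into standard regex before matching, the regex engine does all the real work. That is why, in this sketch, raw regex such as `\d+:\d+` can be mixed freely with named patterns.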
