
The DataOps Blog

Where Change Is Welcome

Alexa, Start My Data Pipeline

April 29, 2021

Imagine asking Amazon Alexa or Google Home to run your ETL, data processing, and machine learning data pipelines. For example, "Start my data pipeline on Amazon EMR", "How many active jobs do I have running on Databricks?", or "Stop my…

13 Data Engineering Best Practices At DNB

November 17, 2020

DNB is Norway's largest financial services group and has a reputation as a trusted financial institution throughout the region. In this guest post, the DNB Data Engineering Centre of Practice team: Saleem Pothiwala, Operations Lead - Customer Insights, Jones Mabea Agwata,…

Demystifying Kerberos Authentication on Hadoop Clusters

September 29, 2020

Guest post by Rishi Jain, Technical Support Engineer III, StreamSets. In this blog post, you'll learn the recommended way to enable and use Kerberos authentication when running StreamSets Transformer, a modern transformation engine, on Hadoop clusters. Generally speaking, the --proxy-user argument…

What are Grok Patterns?

August 10, 2020

Grok builds on the regular expression language, allowing you to name existing patterns and/or combine them into more complex patterns. Because Grok is based on regular expressions, any valid regular expression (regexp) is also valid in Grok. In StreamSets Data Collector,…
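To make the idea of naming and combining patterns concrete, here is a minimal Python sketch of how a Grok-style expander can work. It is an illustration only, not how StreamSets Data Collector implements Grok. The pattern names (INT, WORD, IP, CLIENT) and the helpers expand and grok are hypothetical. Each %{NAME:field} reference is replaced with the regex it names and wrapped in a named capture group. References can nest, and because the output is an ordinary regex, a plain regex with no Grok references passes through unchanged.

```python
import re

# Hypothetical, simplified pattern library; real Grok libraries ship many more.
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}
# A composite pattern built by combining existing named patterns.
PATTERNS["CLIENT"] = r"%{IP:client} %{WORD:method}"

# Matches %{NAME} or %{NAME:field}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str) -> str:
    """Recursively replace Grok references with plain regex text."""
    def repl(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = expand(PATTERNS[name])
        # A field name becomes a named capture group; otherwise just group it.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return TOKEN.sub(repl, pattern)


def grok(pattern: str, text: str):
    """Return the captured fields as a dict, or None if the text doesn't match."""
    m = re.fullmatch(expand(pattern), text)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(grok("%{CLIENT} %{INT:bytes}", "10.0.0.1 GET 512"))
    # A plain regex is also a valid pattern; it matches but captures no fields.
    print(grok(r"\d+", "42"))
```

Running the script prints {'client': '10.0.0.1', 'method': 'GET', 'bytes': '512'} and then {}. The composite CLIENT pattern contributes two fields without anyone rewriting its underlying regex, which is the main convenience Grok offers over hand-written expressions.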

