Skip to content

StreamSets Data Integration Blog

Where change is welcome.

Create a Custom Expression Language Function for StreamSets Data Collector

By April 28, 2017

Custom Expression Language SnapshotOne of the most powerful features in StreamSets Data Collector Engine is support for Expression Language, or ‘EL’ for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing Java code from JSP. The Expression Evaluator and Stream Selector stages rely heavily on EL, but you can use Expression Language in configuring almost every SDC stage. In this blog entry I’ll explain a little about EL and show you how to write your own EL functions.

Creating a Custom Multithreaded Origin for StreamSets Data Collector

By April 25, 2017

Multithreaded PipelineMultithreaded Pipelines, introduced a couple of releases back, in StreamSets Data Collector (SDC) 2.3.0.0, enable a single pipeline instance to process high volumes of data, taking full advantage of all available CPUs on the machine. In this blog entry I’ll explain a little about how multithreaded pipelines work, and how you can implement your own multithreaded pipeline origin thanks to a new tutorial by Guglielmo Iozzia, Big Data Analytics Manager at Optum, part of UnitedHealth Group.

Making Sense of Stream Processing

By April 19, 2017

StreamThere has been an explosion of innovation in open source stream processing over the past few years. Frameworks such as Apache Spark and Apache Storm give developers stream abstractions on which they can develop applications; Apache Beam provides an API abstraction, enabling developers to write code independent of the underlying framework, while tools such as Apache NiFi and StreamSets Data Collector provide a user interface abstraction, allowing data engineers to define data flows from high-level building blocks with little or no coding.

In this article, I’ll propose a framework for organizing stream processing projects, and briefly describe each area. I’ll be focusing on organizing the projects into a conceptual model; there are many articles that compare the streaming frameworks for real-world applications – I list a few at the end.

Transform Data in StreamSets DataOps Platform

By April 11, 2017

Do you need to transform data? Good new is that I’ve written quite a bit over the past few months about the more advanced aspects of data manipulation in StreamSets Data Collector Engine (SDC) – writing custom processors, calling Java libraries from JavaScript, Groovy & Python, and even using Java and Scala with the Spark Evaluator. As a developer, it’s always great fun to break out the editor and get to work, but we should be careful not to jump the gun. Just because you can solve a problem with code, doesn’t mean you should. Using SDC’s built-in processor stages is not only easier than writing code, it typically results in better performance. In this blog entry, I’ll look at some of these stages, and the problems you can solve with them.

Back To Top