Dataflow Performance Blog

Create a Custom Expression Language Function for StreamSets Data Collector

One of the most powerful features in StreamSets Data Collector (SDC) is support for Expression Language, or 'EL' for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing Java code from JSP. The Expression Evaluator and Stream Selector stages rely heavily on EL, but you can use EL when configuring almost every SDC stage. In this blog entry I'll explain a little about EL and show you how to write your own EL functions.

EL Basics

As its name implies, EL allows you to do more than just access Java code; you can write expressions such as ${str:length(record:value('/id')) > 10}.

This will evaluate to true if the /id field's value is more than 10 characters long, otherwise it will be false.

SDC includes a wide variety of EL functions for purposes such as accessing a record's fields and attributes, detecting drift in record structure, and performing standard math and string operations. You can get a long way with these 'off-the-shelf' functions and, when you want to go further, it's really straightforward to create custom EL functions.

Custom EL Functions

Let's say you're processing web server log data and you want to filter out any requests from clients in the local 'private' network address ranges; for example, the 10.0.0.0 to 10.255.255.255 range. Java helpfully provides an isSiteLocalAddress() method on the InetAddress class, and Google's Guava library allows us to create an InetAddress object from a string containing an IP address without hitting the network, so we can easily sketch out a class with an isPrivate() method:
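A minimal sketch of such a class follows. So that it runs with only the JDK, it swaps Guava's InetAddresses.forString() for a small hand-written IPv4 parser (parseIpv4 is a hypothetical helper, not part of any library); with Guava on the classpath the whole method body reduces to the one-liner shown in the comment. The class name IpUtils is also an assumption.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpUtils {
    // Hypothetical stand-in for Guava's InetAddresses.forString():
    // parses a dotted-quad IPv4 literal without any DNS lookup and
    // throws IllegalArgumentException on malformed input.
    static InetAddress parseIpv4(String ip) {
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 literal: " + ip);
        }
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(parts[i]); // NumberFormatException on junk
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            bytes[i] = (byte) octet;
        }
        try {
            return InetAddress.getByAddress(bytes); // no network access
        } catch (UnknownHostException e) {
            // Only thrown for arrays of the wrong length, which cannot happen here.
            throw new IllegalArgumentException(e);
        }
    }

    // With Guava: return InetAddresses.forString(ip).isSiteLocalAddress();
    public static boolean isPrivate(String ip) {
        return parseIpv4(ip).isSiteLocalAddress();
    }
}
```

Note that this first version still throws on null or malformed input.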

What if we pass something that isn't an IP address at all? InetAddresses.forString() will throw an exception, which we don't want to happen while we're running our pipeline, so let's catch that, and any other possible exceptions such as null pointers:
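One way to add that guard is sketched below, under the same assumptions as before: the class name IpUtils is illustrative, and parseIpv4 is a hypothetical JDK-only stand-in for Guava's InetAddresses.forString(). Any exception, including a NullPointerException from a null argument, is treated as "not private".

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpUtils {
    // Hypothetical stand-in for Guava's InetAddresses.forString();
    // throws IllegalArgumentException on anything that is not a dotted quad.
    static InetAddress parseIpv4(String ip) {
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 literal: " + ip);
        }
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(parts[i]);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            bytes[i] = (byte) octet;
        }
        try {
            return InetAddress.getByAddress(bytes);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static boolean isPrivate(String ip) {
        try {
            return parseIpv4(ip).isSiteLocalAddress();
        } catch (Exception e) {
            // Deliberately broad: null, empty, or malformed input must never
            // abort the pipeline, so it simply counts as "not private".
            return false;
        }
    }
}
```

Catching Exception this broadly is usually discouraged, but here it matches the stated goal of never throwing while the pipeline is running.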

Note – when you use EL in a processor stage, SDC evaluates it using a dummy record as part of pipeline validation, so you should take care that your EL functions don't throw an exception on empty or unexpected input.

To make this available as an EL function we just need to add some annotations:
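The sketch below shows roughly what the annotated class could look like. It assumes the SDC API's ElFunction and ElParam annotations (in com.streamsets.pipeline.api) take the attributes shown; check them against the SDC version you build with. So that the example compiles without the SDC JAR, it declares local stand-ins for both annotations, which you would delete in a real project in favor of the imports. The "ip" prefix, the description text, and the parseIpv4 helper are illustrative choices.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Stand-ins so the sketch compiles alone. In an SDC project, remove these and
// import com.streamsets.pipeline.api.ElFunction and ElParam instead.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ElFunction {
    String prefix() default "";
    String name();
    String description() default "";
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ElParam {
    String value();
}

// In a real project this would be a public class in its own IpUtils.java file.
class IpUtils {
    // Hypothetical JDK-only stand-in for Guava's InetAddresses.forString().
    static InetAddress parseIpv4(String ip) {
        String[] parts = ip.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 literal: " + ip);
        }
        byte[] bytes = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(parts[i]);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            bytes[i] = (byte) octet;
        }
        try {
            return InetAddress.getByAddress(bytes);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The prefix and name together give the EL call its form: ip:isPrivate(...)
    @ElFunction(
        prefix = "ip",
        name = "isPrivate",
        description = "Returns true if the argument is a private IP address")
    public static boolean isPrivate(@ElParam("ip") String ip) {
        try {
            return parseIpv4(ip).isSiteLocalAddress();
        } catch (Exception e) {
            return false; // never throw during pipeline validation or execution
        }
    }
}
```

With those names, a Stream Selector condition might read ${ip:isPrivate(record:value('/clientip'))}, where /clientip is a hypothetical field name.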

And that's it! We can use Maven to build this into a JAR using a very standard pom.xml file, copy the JAR to $SDC_DIST/libs-common-lib, and use it in a pipeline:

[Image: Custom EL Pipeline]

Let's preview the pipeline on some test data:

[Image: Custom EL Preview]

Success! Note that this is a deliberately simple, but still useful, example. EL functions can accept any number of arguments and be arbitrarily complex. What functionality are you going to build into a custom EL function? Let us know in the comments!

Pat Patterson