Dataflow Performance Blog

Creating a Custom Origin for StreamSets Data Collector

Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I've been looking for a good use case for a custom origin tutorial. It's been trickier than I expected, partly because the list of out-of-the-box origins is so extensive, and partly because the HTTP Client origin can access most web service APIs, which makes a custom origin redundant for them. Then, last week, StreamSets software engineer Jeff Evans suggested Git. Creating a custom origin to read the Git commit log turned out to be the perfect tutorial.

“Why?” I hear you ask. Well, there are many reasons:

  • Git is familiar to most developers
  • The Git commit log is an ordered sequence of entries, each with a unique identifier – the commit hash
  • JGit offers an easy way to read the commit log, either in its entirety, or across a range of entries
  • It's easy to create a repository, and add commits, to test the origin
  • Git is free – and who doesn't love free?
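
To make the "ordered sequence with a unique identifier" point concrete, here is a minimal sketch of reading a log either in its entirety or from a given entry onward. It deliberately uses a plain in-memory list rather than JGit, so the class and method names (CommitLogSketch, Entry, readAll, readAfter) and the sample hashes are hypothetical stand-ins, not JGit's API, and the log is assumed to be ordered oldest first for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a commit log; this is NOT JGit's API.
final class CommitLogSketch {

    // One log entry: a unique hash plus a commit message.
    static final class Entry {
        final String hash;
        final String message;

        Entry(String hash, String message) {
            this.hash = hash;
            this.message = message;
        }
    }

    // Read the whole log (assumed here to be ordered oldest first).
    static List<Entry> readAll(List<Entry> log) {
        return new ArrayList<>(log);
    }

    // Read only the entries that come after the one identified by lastSeenHash.
    // A null or unknown hash falls back to reading the whole log.
    static List<Entry> readAfter(List<Entry> log, String lastSeenHash) {
        for (int i = 0; i < log.size(); i++) {
            if (log.get(i).hash.equals(lastSeenHash)) {
                return new ArrayList<>(log.subList(i + 1, log.size()));
            }
        }
        return readAll(log);
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Entry> log = List.of(
            new Entry("a1b2c3", "Initial commit"),
            new Entry("d4e5f6", "Add README"),
            new Entry("0a9b8c", "Fix typo"));

        // Print every entry recorded after the first commit.
        for (Entry e : readAfter(log, "a1b2c3")) {
            System.out.println(e.hash + " " + e.message);
        }
    }
}
```

Because each entry has a unique hash, remembering the last hash read is enough to resume from that point, which is what makes a log like this a convenient data source to read incrementally.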

If you've been wondering how to get started writing a custom origin, wonder no more: head on over to the article, Creating a Custom StreamSets Origin, and get started today!

Pat Patterson