Creating a Custom Origin for StreamSets Data Collector

Posted in Engineering, December 12, 2016

Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I’ve been looking for a good use case for a custom origin tutorial. It’s been trickier than I expected, partly because the list of out-of-the-box origins is so extensive, and partly because the HTTP Client origin can access most web service APIs, rendering a custom origin redundant. Then, last week, StreamSets software engineer Jeff Evans suggested Git. Creating a custom origin to read the Git commit log turned out to be the perfect tutorial.

“Why?” I hear you ask. Well, there are many reasons:

  • Git is familiar to most developers
  • The Git commit log is an ordered sequence of entries, each with a unique identifier – the commit hash
  • JGit offers an easy way to read the commit log, either in its entirety, or across a range of entries
  • It’s easy to create a repository, and add commits, to test the origin
  • Git is free – and who doesn’t love free?

If you’ve been wondering how to get started writing a custom origin, wonder no more: head on over to the article, Creating a Custom StreamSets Origin, and get started today!
