
Creating a Custom Origin for StreamSets Data Collector

Posted in Data Integration, December 12, 2016

Since writing tutorials for creating custom destinations and processors for StreamSets Data Collector (SDC), I’ve been looking for a good use case for a custom origin tutorial. It’s been trickier than I expected, partly because the list of out-of-the-box origins is so extensive, and partly because the HTTP Client origin can access most web service APIs, rendering a custom origin redundant. Then, last week, StreamSets software engineer Jeff Evans suggested Git. Creating a custom origin for StreamSets to read the Git commit log turned out to be the perfect tutorial.

“Why?” I hear you ask. Well, there are many reasons:

  • Git is familiar to most developers
  • The Git commit log is an ordered sequence of entries, each with a unique identifier – the commit hash
  • JGit offers an easy way to read the commit log, either in its entirety, or across a range of entries
  • It’s easy to create a repository, and add commits, to test the origin
  • Git is free – and who doesn’t love free?
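To make the second point concrete: because the log is ordered and every entry has a unique hash, the hash of the last entry read can serve as a resume point for the next read. The following is only a minimal sketch of that idea, assuming an oldest-first, in-memory list of commits. The `Commit`, `Batch`, and `CommitLogBatcher` names are hypothetical and are not part of JGit or the SDC API; a real origin would fetch the entries through JGit instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one commit log entry (not a JGit type).
final class Commit {
    final String hash;
    final String message;

    Commit(String hash, String message) {
        this.hash = hash;
        this.message = message;
    }
}

// Hypothetical result of one read: the entries plus the offset to resume from.
final class Batch {
    final List<Commit> commits;
    final String nextOffset;

    Batch(List<Commit> commits, String nextOffset) {
        this.commits = commits;
        this.nextOffset = nextOffset;
    }
}

final class CommitLogBatcher {
    // Returns up to maxBatchSize commits that follow lastHash in an
    // oldest-first log. A null lastHash means "start from the beginning".
    static Batch nextBatch(List<Commit> log, String lastHash, int maxBatchSize) {
        int start = 0;
        if (lastHash != null) {
            int found = -1;
            for (int i = 0; i < log.size(); i++) {
                if (log.get(i).hash.equals(lastHash)) {
                    found = i;
                    break;
                }
            }
            if (found < 0) {
                throw new IllegalArgumentException("Unknown offset: " + lastHash);
            }
            start = found + 1;
        }
        int end = Math.min(log.size(), start + maxBatchSize);
        List<Commit> commits = new ArrayList<>(log.subList(start, end));
        // If nothing new was read, keep the previous offset unchanged.
        String next = commits.isEmpty() ? lastHash : commits.get(commits.size() - 1).hash;
        return new Batch(commits, next);
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Commit> log = Arrays.asList(
            new Commit("a1b2c3", "Initial commit"),
            new Commit("d4e5f6", "Add README"),
            new Commit("0718a9", "Fix typo"));

        String offset = null;
        Batch batch;
        do {
            batch = nextBatch(log, offset, 2);
            for (Commit c : batch.commits) {
                System.out.println(c.hash + " " + c.message);
            }
            offset = batch.nextOffset;
        } while (!batch.commits.isEmpty());
    }
}
```

The linear search for the last hash is there only to keep the sketch short; the point is that the hash alone is enough to pick up where the previous read stopped.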

If you’ve been wondering how to get started writing a custom origin, wonder no more: head on over to the article, Creating a Custom StreamSets Origin, and get started today!
