Dataflow Performance Blog

Contributing to the StreamSets Data Collector Community

As you likely already know, StreamSets Data Collector (SDC) is open source, made available via the Apache 2.0 license. The entire source code for the product is hosted in a GitHub project and the binaries are always available for download.

As well as being part of our engineering culture, open source gives us a number of business advantages. Prospective users can freely download, install, evaluate and even put SDC into production; customers have access to the source code without a costly escrow process; and, perhaps most importantly, our users can contribute fixes and enhancements to improve the product for the benefit of the whole community. In this post, I'd like to acknowledge some of those contributions, and invite you to contribute, too.

Since SDC was released, back in September 2015, we've received a wide variety of code contributions from our community. Some of these have been small: Jurjen Vorhauer, a consultant at JDriven in the Netherlands, contributed a single line of code that fixed an annoying bug in the Cassandra target. Other contributions have improved the general quality of the product: Sudhanshu Bahety, a student at UC San Diego, cleaned up a whole series of exception messages. Alexander Ulyanov, CTO of BeKitzur Consulting & Development in Saint Petersburg, Russia, contributed a complete pipeline stage – the Redis Consumer that's now part of the product. The most recent major contribution, a MySQL binary log 'change data capture' origin from the developers at, is over 4,800 lines of code and adds significant new functionality to SDC.

When developers contribute code back to the project, everyone benefits. The community has access to new features and fixes, while contributors see their code reviewed, incorporated into SDC, and extended by the product team and other developers. Of course, code is not the only contribution. Many community members have reported issues in SDC, whether bugs or feature requests. We've seen some great blog posts, articles, and meetup sessions. You certainly don't need to be a developer to make your mark!

So, I hear you asking yourself, how do I get in on this community awesomeness? If you find a bug, or wish SDC had a particular feature, check out our issues list to see if it's there already. If it is, then vote for it so we can prioritize it accordingly, and/or watch it so you get notified of progress. If it isn't, report it. Does one of your pipelines demonstrate an innovative technique? Contribute it to the SDC tutorials project! Engage with your local big data community (search for big data groups in your area) and present a session on your experiences with SDC. If you're a developer, and there's an itch you want to scratch, fork the GitHub project and get coding. In common with many open source projects, we need you to sign our contributor license agreement before we can incorporate your code, so get that done and you can file a pull request when you're ready.

Even after over 25 years as a developer, I'm still thrilled to see my code running in production; as community champion for StreamSets, one of my greatest pleasures is enabling developers and users around the world to share that excitement as their contributions are accepted and recognized. Step up and make your mark on StreamSets Data Collector!

Pat Patterson