Last October, we publicly announced StreamSets Data Collector version 1.0. Over the last 12 months we have seen an awesome (a word we don’t use lightly) amount of adoption of our first product: from individual developers simplifying their day-to-day work, to small startups building the next big thing, to the very largest companies building global-scale enterprise architectures with StreamSets Data Collector at their core.
Drawing on our co-founders’ experience over the last few decades, and on numerous interviews we’ve had with companies over the last year, we are excited to launch the next version of Data Collector, which integrates deeply with our newest product, StreamSets Dataflow Performance Manager (DPM).
The days of writing individual point-to-point pipelines are behind us. The true value lies in a high-level view of how pipelines work together to deliver the data that powers the larger application. When you see a multitude of pipelines through a single pane of glass, you want to see delivery metrics at that aggregate level, know if and when data delivery is not optimal, and get alerted when you need to take action.
If you are a developer, DPM lets you perform release and configuration management of your pipelines, share pipelines within your team, and execute pipelines on production systems. Finally, it lets you see a multitude of pipelines (yours and those created by other members of your team) come together in a topology.
If you are an architect or Chief Data Officer, DPM lets you monitor data flows for the complete application. Suppose you are responsible for building and maintaining all the data flowing into a larger Customer 360 application within the enterprise, and different groups are building discrete pipelines to feed different pieces of that data. You can use DPM to pull all these pipelines together into a central canvas and visualize the complete data flow. DPM also lets you drive standardization across your enterprise and track metrics and Service Level Agreements (SLAs) for all data in motion.
Version 2.0 has a host of new features:
– Integration with StreamSets DPM Cloud.
– Support for Oracle CDC. If you’d like to get real-time data from an Oracle database, use the Oracle CDC Client origin to get started.
– Support for MapR version 5.2.0.
– Support for cluster mode streaming using MapR Streams.
– A new Field Flattener processor that flattens nested records.
– Enhancements to the GeoIP lookup processor to perform lookups from multiple databases.
– Updates to the FTP/SFTP Client origin to allow transferring whole binary files.
– More than 50 bug fixes.
Check out the release notes for the complete list of new features and updates in StreamSets Data Collector (SDC) 2.0.
Download it now, and let us know what you think.
Here are some frequently asked questions about SDC and DPM:
Q. Will upgrading to StreamSets Data Collector 2.0 break my older pipelines?
A. No. Pipelines developed in older versions will continue to work automatically.
Q. Do I have to use DPM to continue using StreamSets Data Collector?
A. No. You can continue to use StreamSets Data Collector as is, without DPM.
Q. Is the DPM software also open source?
A. No, DPM is proprietary software that runs on the StreamSets Cloud or can be deployed to your private cloud.
Q. Will StreamSets Data Collector continue to be open source?
A. In short, it’s business as usual. StreamSets Data Collector will remain 100% open source. Since DPM relies on the ingest capabilities of StreamSets Data Collector, we will continue to aggressively build new features and integrations.