
StreamSets Data Integration Blog

Where change is welcome.

Triggering Databricks Notebook Jobs from StreamSets Data Collector

By June 21, 2017

Last December, I covered Continuous Data Integration with StreamSets Data Collector and Spark Streaming on Databricks. In StreamSets Data Collector (SDC) version 2.5.0.0, we added the Spark Executor, which allows your pipelines to trigger a Spark application running on Apache Hadoop YARN or Databricks. In this blog post I’m going to cover the latter, showing you how to trigger a notebook job on Databricks from pipeline events to generate analyses and visualizations on demand.
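To show roughly what triggering a notebook job involves, here is a minimal Python sketch that calls the Databricks Jobs API `run-now` endpoint directly. It assumes a job has already been defined in Databricks to run the notebook, that you have a personal access token, and that the workspace exposes Jobs API version 2.0. The host, token, job ID and notebook parameter names are placeholders. This is an illustration of the underlying REST call, not necessarily how the Spark Executor is implemented.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, notebook_params=None):
    """Build a POST request for the Databricks Jobs API 2.0 run-now endpoint."""
    body = {"job_id": job_id}
    if notebook_params:
        # notebook_params are exposed to the notebook as widget values
        body["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/jobs/run-now",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_notebook_job(host, token, job_id, notebook_params=None):
    """Send the request; the JSON response includes the new run's run_id."""
    req = build_run_now_request(host, token, job_id, notebook_params)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In a pipeline, a call like `trigger_notebook_job(host, token, 42, {"input": "s3://bucket/key"})` would fire once per event, for example when a destination closes an output file, so the notebook can analyze exactly the data that was just written.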

Introducing the Data Collector Support Bundle

By June 13, 2017

Hi, my name is Wagner Camarao and I’m a Software Engineer at StreamSets focusing on the user-facing aspects of our products. Today I’m going to talk about a new feature in StreamSets Data Collector that streamlines interactions with our support team.

In version 2.6.0.0 of Data Collector, we’ve added a feature called Support Bundle. It allows you to generate an archive file with the most common information required to troubleshoot various issues with Data Collector, such as precise build information, configuration, thread dump, pipeline definitions and history files, and most recent log files.

Announcing Data Collector ver 2.6.0.0

By June 12, 2017

We are excited to announce version 2.6 of StreamSets Data Collector. This release adds important functionality focused on helping customers modernize their enterprise data warehouses on Hadoop, as well as on cybersecurity, IoT and Spark use cases.

This release has 6 new features, 20 improvements and 72 bug fixes. For a full list, see What’s New. For a list of bug fixes and known issues, see the Release Notes.
