StreamSets News

Straight from Our Customers: The Benefits of Modern Ingestion

Three months into my journey here at StreamSets, I’ve had a chance to talk with many of our customers and prospects to understand how they are using the open source StreamSets Data Collector (SDC) across a number of different use cases. As it turns out, behind solving technical problems in areas such as cybersecurity, IoT or plain old data lake ingestion lies a treasure trove of value that IT teams realize as part of a typical deployment.
While this is not an exhaustive list, let’s take a quick look at some of the more common benefits our customers call out.

By Clarke

Ask StreamSets: Questions and Answers for the StreamSets Community

It's fair to say that most developers are familiar with Stack Overflow and the Stack Exchange network of question and answer sites. Q&A sites such as Stack Overflow serve communities of users focused around a particular topic or discipline – in the case of Stack Overflow, programming. Today, we're launching Ask StreamSets, a Q&A site for the StreamSets community.

By Pat Patterson

Getting Started with StreamSets Data Collector on Docker

Simplicity is the ultimate sophistication.
– Leonardo da Vinci

As a recent hire on the Engineering Productivity team here at StreamSets, my early days at the company were marked by efforts to dive head-first into StreamSets Data Collector (SDC). As it turns out, the Docker images we publish for SDC were the easiest way to explore its vast set of features and capabilities, which is exactly why I am writing this blog post.

Without further ado, let’s get started.

By Kirti Velankar

Announcing Data Collector v2.7.0.0

This release has been superseded by version 2.7.1.0. Please upgrade to v2.7.1.0 as soon as possible.

We are happy to release version 2.7.0.0 of StreamSets Data Collector.

You can download the latest open source release here.

This release has 134 new features and improvements and over 160 bug fixes. For a full list, see What's New. For a list of bug fixes and known issues, see the Release Notes.

By Kirit Basu

Triggering Databricks Notebook Jobs from StreamSets Data Collector

Last December, I covered Continuous Data Integration with StreamSets Data Collector and Spark Streaming on Databricks. In StreamSets Data Collector (SDC) version 2.5.0.0 we added the Spark Executor, which allows your pipelines to trigger a Spark application running on Apache YARN or Databricks. I'm going to cover the latter in this blog post, showing you how to trigger a notebook job on Databricks from events in a pipeline, generating analyses and visualizations on demand.

By Pat Patterson

Introducing the Data Collector Support Bundle

Hi, my name is Wagner Camarao and I'm a Software Engineer at StreamSets focusing on the user-facing aspects of our products. Today I'm going to talk about a new feature in StreamSets Data Collector that streamlines interactions with our support team.

In version 2.6.0.0 of Data Collector, we’ve added a feature called Support Bundle. It allows you to generate an archive file with the most common information required to troubleshoot various issues with Data Collector, such as precise build information, configuration, thread dump, pipeline definitions and history files, and most recent log files.

By Wagner Camarao

Announcing Data Collector ver 2.6.0.0

We are excited to announce version 2.6 of StreamSets Data Collector. This release adds important functionality focused on helping customers modernize their enterprise data warehouses on Hadoop, as well as on cybersecurity, IoT and Spark.

You can download the latest open source release here.

This release has 6 new features, 20 improvements and 72 bug fixes. For a full list, see What's New. For a list of bug fixes and known issues, see the Release Notes.

By Kirit Basu

Create a Custom Expression Language Function for StreamSets Data Collector

One of the most powerful features in StreamSets Data Collector (SDC) is support for Expression Language, or 'EL' for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing Java code from JSP. The Expression Evaluator and Stream Selector stages rely heavily on EL, but you can use EL in configuring almost every SDC stage. In this blog entry I'll explain a little about EL and show you how to write your own EL functions.

By Pat Patterson

Creating a Custom Multithreaded Origin for StreamSets Data Collector

Multithreaded pipelines, introduced a couple of releases back in StreamSets Data Collector (SDC) 2.3.0.0, enable a single pipeline instance to process high volumes of data, taking full advantage of all available CPUs on the machine. In this blog entry I'll explain a little about how multithreaded pipelines work, and how you can implement your own multithreaded pipeline origin thanks to a new tutorial by Guglielmo Iozzia, Big Data Analytics Manager at Optum, part of UnitedHealth Group.

By Pat Patterson