StreamSets is excited to announce the immediate availability of StreamSets Data Collector 3.11.0 and StreamSets Data Collector Edge 3.11.0.
StreamSets Data Collector is a powerful design and execution engine, open source under the Apache License 2.0. It enables moving data between any source and destination, performing transformations and push-down analytics along the way. To download, click here.
StreamSets Data Collector Edge is a lightweight execution agent that runs on edge devices with limited memory, CPU, and/or connectivity resources. It enables reading data from an edge device or receiving data from another dataflow pipeline. It supports messaging protocols including HTTP, MQTT, CoAP, and WebSockets. To download, click here.
This release includes some great new features and enhancements. Let’s review some of the highlights. For a complete list of enhancements, new features, bug fixes, and upgrade instructions, please refer to the Release Notes.
StreamSets Data Collector 3.11.0
- Amazon S3 now generates event records when it starts processing a new object and when it finishes processing an object.
- Azure Data Lake Storage Gen1 and Azure Data Lake Storage Gen2 are no longer considered Technology Preview features and are approved for use in production.
- Google BigQuery, Google Cloud Storage, and Google Pub/Sub Subscriber now support JSON service-account credentials pasted directly into the UI.
- Kafka Consumer and Kafka Multitopic Consumer can now be configured to save the Kafka message key in the record. The origin can save the key in a record header attribute, a record field, or both.
- HTTP Client now supports time functions in the Resource URL property.
- Salesforce origin has a new Mismatched Types Behavior property, which specifies how to handle fields with types that do not match the schema.
- HTTP Client can now return the first matching value, all matching values in a list in a single record, or all matching values in separate records.
- ADLS Gen1 File Metadata and ADLS Gen2 File Metadata are no longer considered Technology Preview features and are approved for use in production.
- JDBC Query can now generate events that you can use in an event stream. You can configure the executor to include the number of rows returned or affected by the query when generating events.
Technology Preview Functionality
The following Technology Preview stages are newly available in this release:
- Cron Scheduler origin generates a record with the current datetime as scheduled by a cron expression.
- Start Pipeline origin starts a Data Collector, Data Collector Edge, or Transformer pipeline.
- Control Hub API processor calls a Control Hub API.
- Start Job processor starts a Control Hub job.
- Start Pipeline processor starts a Data Collector, Data Collector Edge, or Transformer pipeline.
Data Collector Configuration
This release includes the following Data Collector configuration enhancement:
The Data Collector configuration file sdc.properties contains a new stage-specific property, stage.conf_com.streamsets.pipeline.stage.hive.impersonate.current.user. Setting this property to true will enable the Hive Metadata processor, the Hive Metastore destination, and the Hive Query executor to impersonate the current user when connecting to Hive.
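As a minimal sketch, the entry in sdc.properties might look like the line below. The property name and the value true come from the description above; the comment is illustrative, and it is an assumption here that Data Collector needs a restart before it picks up the change, as is typical for edits to this file.

```properties
# Let the Hive Metadata processor, Hive Metastore destination, and
# Hive Query executor impersonate the current user when connecting to Hive.
stage.conf_com.streamsets.pipeline.stage.hive.impersonate.current.user=true
```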
Feedback and Contributions
If you’d like to suggest a feature or enhancement, or if you see something that needs to be fixed or improved, feel free to open a ticket at https://issues.streamsets.com.
StreamSets also welcomes contributions from the community. For guidelines on contributing code, visit https://github.com/streamsets/datacollector/blob/master/CONTRIBUTING.md
For more information about StreamSets Data Collector, visit our documentation. For more information about StreamSets Data Collector Edge, visit our documentation.
For any other questions and inquiries, please contact us.