Dataflow Performance Blog

Introducing the Data Collector Support Bundle

Hi, my name is Wagner Camarao and I'm a Software Engineer at StreamSets focusing on the user-facing aspects of our products. Today I'm going to talk about a new feature in StreamSets Data Collector that streamlines interactions with our support team.

In Data Collector version 2.6.0.0, we’ve added a feature called Support Bundle. It lets you generate an archive file containing the information most commonly required to troubleshoot Data Collector issues. This includes precise build information, configuration, a thread dump, pipeline definitions and history files, and the most recent log files.

This feature was conceived by the Three J’s team (Jeff Evans, Ji Sun and Jarcec Cecho) during an internal one-day hackathon at StreamSets. It has now been added to the product after being refactored for production use.

The Support Bundle can be found under the help menu:

Selecting it opens the following modal window:

In this window, you select which generators to include; each generator collects one category of information. Once you have made your selection, you can download the bundle file for review, or upload it directly to StreamSets so that our support team can help you. The bundle contains one top-level directory for each selected generator:

SDC Info

  • Contains runtime information about Data Collector (SDC), divided into subdirectories:
    • Configuration files such as the security policy, LDAP configuration, and Data Collector property files.
    • Directory listings of important files (e.g. JARs), to verify that they are in the right location and have the permissions SDC expects.
    • Shell scripts (e.g. initd) related to the SDC environment.
    • Properties describing the operating system and Java installation, plus build information such as the build timestamp and commit hash.
    • Runtime information such as JVM metrics and a thread dump.

Pipelines

  • Contains an export of all pipelines registered in Data Collector, along with a few metadata files such as the execution history. All sensitive configuration fields are redacted before the pipeline metadata is added to the bundle.

Snapshots

  • Includes all snapshots for all pipelines in Data Collector. Since snapshots contain user data by definition, they’re not included in the bundle unless explicitly requested by checking the box for this generator.

Logs

  • Contains up to the most recent 1 GB of logs, in chronological order. It doesn’t matter whether the logs are in one large file or spread across several smaller files: the bundle always contains the most recent 1 GB. You can change this limit by adding the bundle.log.max_size property to the Data Collector configuration file, sdc.properties.
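To illustrate how "the most recent N bytes, regardless of how many files" can work, here is a minimal Python sketch. It is not Data Collector's actual implementation. The function name `tail_logs` and the rotated file names in the docstring are hypothetical. The sketch assumes the caller passes the log files ordered from oldest to newest. It walks backwards from the newest file until the byte budget is spent, then restores chronological order.

```python
import os
from typing import List


def tail_logs(log_paths: List[str], max_bytes: int) -> bytes:
    """Return up to max_bytes of the most recent log data, oldest bytes first.

    Hypothetical helper: log_paths must be ordered oldest to newest,
    e.g. ["sdc.log.2", "sdc.log.1", "sdc.log"] (illustrative names).
    """
    chunks: List[bytes] = []
    remaining = max_bytes
    # Walk from the newest file backwards until the byte budget is used up.
    for path in reversed(log_paths):
        if remaining <= 0:
            break
        size = os.path.getsize(path)
        take = min(size, remaining)
        with open(path, "rb") as f:
            # Skip the oldest part of this file if only its tail fits.
            f.seek(size - take)
            chunks.append(f.read(take))
        remaining -= take
    # Chunks were collected newest-first; reverse them to restore chronological order.
    return b"".join(reversed(chunks))
```

For a 1 GiB budget you would call `tail_logs(paths, 1024 ** 3)`. The result is the same whether the data sits in one large file or many small ones, which is the behavior described above. The sketch does not handle files that are still being written between the size check and the read.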

🔒 Redacting Sensitive Information

By default, the Support Bundle redacts passwords so that they are not inadvertently shared. Below is an example of redacted passwords in the Data Collector configuration file, sdc.properties:

You can also create custom redaction rules by modifying the support bundle redactor file. Please refer to the documentation section Customizing Generators for more information.
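The general idea of rule-based redaction can be sketched in a few lines of Python. This is only an illustration and does not reproduce the Support Bundle redactor or its file format, which are described in the Customizing Generators documentation. The function `redact_properties`, the `REDACTED` marker, and the "key contains password" rule are all assumptions. The sketch also simplifies the properties syntax: it only handles `key=value` lines and skips `#`/`!` comments.

```python
import re
from typing import Sequence

# Hypothetical default rule: redact any property whose key contains "password".
DEFAULT_RULES: Sequence[re.Pattern] = (re.compile(r"password", re.IGNORECASE),)
REDACTED = "REDACTED"  # hypothetical replacement marker


def redact_properties(text: str, rules: Sequence[re.Pattern] = DEFAULT_RULES) -> str:
    """Replace the value of every key=value line whose key matches any rule."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Leave blank lines, comments, and lines without "=" untouched.
        if not stripped or stripped.startswith(("#", "!")) or "=" not in line:
            out.append(line)
            continue
        key, _, _ = line.partition("=")
        if any(rule.search(key) for rule in rules):
            out.append(f"{key}={REDACTED}")
        else:
            out.append(line)
    return "\n".join(out)
```

In this sketch, a custom rule is just another pattern, for example `redact_properties(text, [*DEFAULT_RULES, re.compile(r"token", re.IGNORECASE)])`. Keeping the rules as data, separate from the code, mirrors the idea of configuring redaction in a file rather than changing the generator itself.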

This is the first version of the Support Bundle. We hope it streamlines troubleshooting by reducing the back-and-forth between gathering information and investigating issues. We look forward to hearing how it helps you, and what is missing or could be improved in future versions.

Wagner Camarao