Start Job

Supported pipeline types:
  • Data Collector

When it receives a record from its parent pipeline, the Start Job processor starts one or more Control Hub jobs in parallel, or starts one or more job instances from a job template.

The Start Job processor is an orchestrator stage. The pipeline that contains the stage is an orchestration pipeline. Orchestrator stages schedule and arrange tasks that complete workflows through the orchestration pipeline. For more information, see Orchestration Pipeline Overview. For example, an orchestration pipeline can use the Cron Scheduler origin to generate a record every weekday at 9 AM and trigger the Start Job processor, which starts a Control Hub job that you run during business hours.

The Start Job processor takes the received record and adds a list of the started jobs and a field that indicates whether the jobs finished successfully. Subsequent stages in the orchestration pipeline can use this field to determine the next task. For example, a Stream Selector processor might use the finished status to determine the tasks to complete next.

When you configure the Start Job processor, you specify the URL of the Control Hub instance that runs the jobs or job template. Then, you specify the IDs of the jobs to start or the ID of the job template to use. If you specify a job template, you also specify runtime parameters for each job instance that you want the processor to start from that template.

You can configure the processor to restart the origins in the jobs when possible. You can also configure the processor to start the jobs in the background. After starting jobs in the background, the processor immediately sends the passed-in record to the next stage rather than waiting for the jobs to completely finish.

You also configure the user name and password to run the job and can optionally configure SSL/TLS properties.

Data Flow

The data flow of the orchestration pipeline that contains the Start Job processor depends on whether the processor runs the started jobs in the background.

When processing a record, the Start Job processor starts the specified jobs. The processor adds to the record a list map of the started jobs and a field that indicates whether those jobs finished successfully. A started job has finished successfully when its state becomes INACTIVE.

The processor updates and passes the received record to the next stage in the orchestration pipeline either immediately after the jobs start or after the jobs finish, depending on whether the jobs run in the background. Choose whether to run the jobs in the background based on the data flow needed in the orchestration pipeline.
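The shape of the updated record can be illustrated with a short sketch. The field names below (`startedJobs`, `success`) are hypothetical placeholders for illustration only; the actual field names that the processor adds may differ.

```python
# Illustrative sketch of the record that the Start Job processor emits.
# Field names here are hypothetical, not the processor's actual output.
record = {
    # ...fields from the received record are preserved...
    "startedJobs": [  # list map of the jobs that the processor started
        {"jobId": "a1b2c3", "jobName": "Web Log Collection Job - 1"},
        {"jobId": "d4e5f6", "jobName": "Web Log Collection Job - 2"},
    ],
    # True only when every started job reached the INACTIVE state;
    # always False when the jobs run in the background.
    "success": True,
}

# A downstream stage, such as a Stream Selector, can branch on this flag.
downstream_ready = record["success"]
```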

Jobs Run in Background

When running the started jobs in the background, the processor updates and passes the processed record to the next stage in the orchestration pipeline immediately after starting the jobs. The processed record contains the list map of started jobs and the finished status of the jobs. The processor and orchestration pipeline do not track whether the started jobs finish successfully. Therefore, the finished status always indicates unsuccessful for jobs run in the background.

In this case, you can run other stages in parallel and complete tasks simultaneously. Because the processor does not generate another record when the started jobs finish, other stages in the orchestration pipeline cannot depend on the completion of or values from the started jobs.

Jobs Not Run in Background

When not running the started jobs in the background, the processor updates and passes the processed record to the next stage in the orchestration pipeline after all the started jobs stop running. The processed record contains the list map of started jobs and the finished status of the jobs.

In this case, a subsequent stage in the orchestration pipeline can depend on the completion of one of the started jobs or a value from one of the started jobs. For example, a Stream Selector processor might use the finished status to determine the tasks to complete next.

Suffix for Job Instance Names

For job instances created or started from a job template, Control Hub appends a suffix to uniquely name each job instance.

The suffix is added to the job template name after a hyphen, as follows:

<job template name> - <suffix>

Select one of the following methods to generate the suffix:
Counter
Control Hub appends a number to the job template name. For example, job instances created from the Web Log Collection Job are named as follows:
  • Web Log Collection Job - 1
  • Web Log Collection Job - 2
Timestamp
Control Hub appends a timestamp indicating when the job instance is started to the job template name. For example, job instances created from the Web Log Collection Job are named as follows:
  • Web Log Collection Job - 2018-10-22
  • Web Log Collection Job - 2018-10-23
Use a timestamp for the suffix when you plan to create and start job instances from the template at different times, such as starting a single job instance every 24 hours. If you start multiple job instances from the template at the same time, each job instance name is appended with the same timestamp.
Parameter Value
Control Hub appends the value of the specified parameter. For example, job instances created from the Web Log Collection Job are named as follows:
  • Web Log Collection Job - /server1/logs
  • Web Log Collection Job - /server2/logs
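The three suffix methods above can be sketched in a few lines. The function below is not part of any StreamSets API; it simply mimics how Control Hub derives instance names under each method, using the method names from this section as labels.

```python
from datetime import date

def instance_name(template_name, method, counter=None, started=None, param=None):
    """Mimic Control Hub's instance-name suffixes (illustrative only)."""
    if method == "COUNTER":
        suffix = str(counter)
    elif method == "TIMESTAMP":
        # Timestamp of when the job instance is started.
        suffix = started.isoformat()
    elif method == "PARAMETER_VALUE":
        suffix = param
    else:
        raise ValueError(f"unknown suffix method: {method}")
    return f"{template_name} - {suffix}"

print(instance_name("Web Log Collection Job", "COUNTER", counter=1))
# Web Log Collection Job - 1
print(instance_name("Web Log Collection Job", "TIMESTAMP", started=date(2018, 10, 22)))
# Web Log Collection Job - 2018-10-22
print(instance_name("Web Log Collection Job", "PARAMETER_VALUE", param="/server1/logs"))
# Web Log Collection Job - /server1/logs
```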

Runtime Parameters for Job Templates

When you configure the Start Job processor to start job instances from templates, you must specify the runtime parameters for each job instance that you want the processor to start. You can use functions from the StreamSets expression language to define parameter values.

For each job instance, enter the runtime parameters as a JSON object, specifying the parameter names and values as key-value pairs. The parameter names must match runtime parameters defined for the pipeline that the job runs. The processor starts a job instance for each object you define.

For example, to configure the processor to start two job instances with different parameter values, you might enter:
[
   {
      "FileDir": "/server1/logs",
      "ErrorDir": "/server1/errors"
   },
   {
      "FileDir": "/server2/logs",
      "ErrorDir": "/server2/errors"
   }
]
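Because the runtime parameters are plain JSON, you can generate or validate the list programmatically before pasting it into the stage. A minimal sketch, assuming the `FileDir` and `ErrorDir` parameters from the example above:

```python
import json

servers = ["server1", "server2"]

# One JSON object per job instance; each key must match a runtime
# parameter defined for the pipeline that the job template runs.
instances = [
    {"FileDir": f"/{s}/logs", "ErrorDir": f"/{s}/errors"} for s in servers
]

text = json.dumps(instances, indent=3)
print(text)

# Round-trip to confirm the text is valid JSON before using it.
assert json.loads(text) == instances
```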

Configuring a Start Job Processor

Configure a Start Job processor to start a Control Hub job. The Start Job processor is an orchestrator stage.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Jobs tab, configure the following properties:
    Jobs Property Description
    Control Hub Base URL URL of Control Hub that runs the jobs:
    • For Control Hub cloud, enter https://cloud.streamsets.com.
    • For Control Hub on-premises, enter the URL provided by your system administrator. For example, https://<hostname>:18631.
    Job Template Starts one or more job instances from a defined job template.
    Job Template ID ID of a job template that the processor starts. Available when starting jobs from a job template.
    Instance Name Suffix Method for generating a suffix to uniquely name each job instance:
    • Counter
    • Timestamp
    • Parameter Value
    Available when starting jobs from a job template.
    Runtime Parameters for Each Instance Runtime parameters and values, specified as a JSON object for each job instance. The processor starts a job instance in parallel for each defined JSON object.

    Available when starting jobs from a job template.

    Jobs List of jobs started in parallel. Available when not starting a job instance from a template.
    For each job, enter:
    • Job ID - ID of the job.

      To find the job ID in Control Hub, expand the job in the Jobs view and click Show Additional Info.

    • Runtime Parameters - Parameters passed to the job. The job passes the runtime parameters to the pipeline at runtime.

    To include another job, click the Add icon.

    You can use simple or bulk edit mode to specify jobs.

    Reset Origin Resets the origin before starting a job, if the origin can be reset. For a list of origins that can be reset, see Resetting the Origin.
    Run in Background Runs the started jobs in the background.

    When running started jobs in the background, the processor passes the record to the next stage immediately after starting the jobs.

    When not running started jobs in the background, the processor holds the record until all started jobs completely finish, and then passes the record to the next stage.

    Delay Between State Checks Milliseconds to wait between checks for the completion status of the started jobs. Available when not running started jobs in the background.
  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Control Hub User Name User that runs the jobs. Enter a Control Hub user name in the following format:
    <ID>@<organization ID>
    Password Password for the user.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
  4. To use SSL/TLS, click the TLS tab and configure the following properties.
    TLS Property Description
    Use TLS Enables the use of TLS.
    Keystore File Path to the keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES.

    For more information about environment variables, see Data Collector Environment Configuration.

    By default, no keystore is used.

    Keystore Type Type of keystore to use. Use one of the following types:
    • Java Keystore File (JKS)
    • PKCS #12 (p12 file)

    Default is Java Keystore File (JKS).

    Keystore Password Password to the keystore file. A password is optional, but recommended.
    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
    Keystore Key Algorithm Algorithm to manage the keystore.

    Default is SunX509.

    Truststore File Path to the truststore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES.

    For more information about environment variables, see Data Collector Environment Configuration.

    By default, no truststore is used.

    Truststore Type Type of truststore to use. Use one of the following types:
    • Java Keystore File (JKS)
    • PKCS #12 (p12 file)

    Default is Java Keystore File (JKS).

    Truststore Password Password to the truststore file. A password is optional, but recommended.
    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
    Truststore Trust Algorithm Algorithm to manage the truststore.

    Default is SunX509.

    Use Default Protocols Uses the default TLSv1.2 transport layer security (TLS) protocol. To use a different protocol, clear this option.
    Transport Protocols TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name. You can use simple or bulk edit mode to add protocols.
    Note: Older protocols are not as secure as TLSv1.2.
    Use Default Cipher Suites Uses a default cipher suite for the SSL/TLS handshake. To use a different cipher suite, clear this option.
    Cipher Suites Cipher suites to use. To use a cipher suite that is not a part of the default set, click the Add icon and enter the name of the cipher suite. You can use simple or bulk edit mode to add cipher suites.

    Enter the Java Secure Socket Extension (JSSE) name for the additional cipher suites that you want to use.
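When jobs do not run in the background, the Delay Between State Checks property described in the steps above governs a simple polling loop. The sketch below shows that pattern with a stubbed status lookup; `get_job_status` is a hypothetical placeholder for illustration, not a Control Hub API call.

```python
import time

def get_job_status(job_id, _calls={"n": 0}):
    """Hypothetical stub: reports ACTIVE twice, then INACTIVE."""
    _calls["n"] += 1
    return "INACTIVE" if _calls["n"] > 2 else "ACTIVE"

def wait_until_inactive(job_id, delay_between_checks_ms=1000):
    """Poll until the job state becomes INACTIVE, pausing between checks."""
    while get_job_status(job_id) != "INACTIVE":
        time.sleep(delay_between_checks_ms / 1000.0)
    return True

print(wait_until_inactive("example-job-id", delay_between_checks_ms=10))
# True
```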