API Reference

The StreamSets SDK for Python is broadly divided into abstractions for interacting with StreamSet Data Collector and StreamSets Control Hub.

StreamSets Data Collector

Main interface

This is the main entry point used by users when interacting with SDC instances.

class streamsets.sdk.DataCollector(server_url, username=None, password=None, authentication_method='form', accounts_authentication_token=None, accounts_server_url=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]

Class to interact with StreamSets Data Collector.

If connecting to an StreamSets Control Hub-registered instance of Data Collector, create an instance of streamsets.sdk.ControlHub instead of instantiating with a username and password.

Parameters:
  • server_url (str) – URL of an existing SDC deployment with which to interact.
  • accounts_authentication_token (str, optional) – StreamSets Accounts server base URL. Default: None
  • accounts_server_url (str, optional) – StreamSets Accounts authentication token. Default: None
  • authentication_method (str, optional) – StreamSets Data Collector authentication method. Default: streamsets.sdk.constants.ENGINE_AUTHENTICATION_METHOD_FORM.
  • username (str, optional) – SDC username. Default: streamsets.sdk.sdc.DEFAULT_SDC_USERNAME.
  • password (str, optional) – SDC password. Default: streamsets.sdk.sdc.DEFAULT_SDC_PASSWORD.
  • control_hub (streamsets.sdk.ControlHub, optional) – A StreamSets Control Hub instance to use for SCH-registered Data Collectors. Default: None.
  • dump_log_on_error (bool) – Whether to output Data Collector logs when exceptions are raised by certain methods. Default: False
add_pipeline(*pipelines, **kwargs)[source]

Add one or more pipelines to the DataCollector instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline.
capture_snapshot(pipeline, snapshot_name=None, start_pipeline=False, runtime_parameters=None, batches=1, batch_size=10, **kwargs)[source]

Capture a snapshot for given pipeline.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • snapshot_name (str, optional) – Name for the generated snapshot. If set to None, an auto-generated UUID (which can be recovered from the returned SnapshotCommand object’s snapshot_name attribute) will be used when calling the REST API. Default: None.
  • start_pipeline (bool, optional) – If set to true, then the pipeline will be started and its first batch will be captured. Otherwise, the pipeline must be running, in which case one of the next batches will be captured. Default: False.
  • runtime_parameters (dict, optional) – Runtime parameters to override Pipeline Parameters value. Default: None.
  • wait (bool, optional) – Wait for capture snapshot to finish. Default: True.
  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED'].
  • timeout_sec (int) – Timeout to wait for snapshot, in seconds. Default: streamsets.sdk.sdc.DEFAULT_SNAPSHOT_TIMEOUT.
Returns:

An instance of streamsets.sdk.sdc_api.SnapshotCommand.

change_password(old_password, new_password)[source]

Change password for the current user.

Parameters:
  • old_password (str) – old password.
  • new_password (str) – new password.
Returns:

An instance of streamsets.sdk.sdc_api.Command.

current_user

Get currently logged-in user and its groups and roles.

Returns:An instance of streamsets.sdk.sdc_models.User.
definitions

Get an SDC instance’s definitions.

Will return a cached instance of the definitions if called more than once.

Returns:An instance of json.
export_pipeline(pipeline, include_library_definitions=False, include_plain_text_credentials=False)[source]

Export single pipeline to json file.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – Pipeline instance.
  • include_library_definitions (boolean) – Set to true to export for Control Hub.
  • include_plain_text_credentials (boolean) – Default False.
Returns:

A dict object containing the contents of pipeline.

export_pipelines(pipelines, include_library_definitions=False, include_plain_text_credentials=False)[source]

Export pipelines.

Parameters:
  • pipelines (list) – A list of streamsets.sdk.sdc_models.Pipeline instances.
  • include_library_definitions (boolean) – Set to true to export for Control Hub. Default False.
  • include_plain_text_credentials (boolean) – Default False.
Returns:

An instance of type bytes indicating the content of zip file with pipeline json files.

get_alerts()[source]

Get pipeline alerts.

Returns:An instance of streamsets.sdk.sdc_models.Alerts.
get_bundle(generators=None)[source]

Generate new support bundle.

Returns:An instance of zipfile.ZipFile.
get_bundle_generators()[source]

Get available support bundle generators.

Returns:An instance of streamsets.sdk.sdc_models.BundleGenerators.
get_logs(ending_offset=-1, extra_message=None, pipeline=None, severity=None)[source]

Get logs.

Parameters:
  • ending_offset (int) – ending_offset, Default: -1.
  • extra_message (str) – extra_message, Default: None.
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance, Default: None.
  • severity (str) – severity, Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Log.

get_pipeline_acl(pipeline)[source]

Get pipeline ACL.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.PipelineAcl.
get_pipeline_builder(**kwargs)[source]

Get a pipeline builder instance with which a pipeline can be created.

Returns:An instance of streamsets.sdk.sdc_models.PipelineBuilder.
get_pipeline_history(pipeline)[source]

Get a pipeline’s history.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.History.
get_pipeline_metrics(pipeline)[source]

Get a pipeline’s metrics.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.Metrics.
get_pipeline_permissions(pipeline)[source]

Return pipeline permissions for a given pipeline.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.PipelinePermissions.
get_pipeline_status(pipeline)[source]

Get status of a pipeline.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
get_snapshots(pipeline=None)[source]

Get information about stored snapshots.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline, optional) – The pipeline instance. Default: None.
Returns:A list of streamsets.sdk.sdc_models.SnapshotInfo instances.
get_stage_errors(pipeline, stage)[source]

Get stage errors.

Parameters:
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sdc_models.StageError instances.

get_stage_library_version(stage)[source]

Get the stage library version.

Parameters:stage (streamsets.sdk.sdc_models.Stage) – stage object
Returns:An instance of str
id

Return id for StreamSets Data Collector.

Returns:A str SDC ID.
import_pipeline(pipeline)[source]

Import pipeline from json file.

Parameters:pipeline (dict) – JSON data loaded from file. Example usage: json.load(open(filename, ‘r’)).
Returns:An instance of streamsets.sdk.sdc_models.Pipeline.
import_pipelines_from_archive(archive)[source]

Import pipelines from archived zip directory.

Parameters:archive (file) – file containing the pipelines.
Returns:A streamsets.sdk.utils.SeekableList of streamsets.sdk.sdc_models.Pipeline.
pipelines

Get all pipelines in the pipeline store.

Returns:
A streamsets.sdk.utils.SeekableList of
streamsets.sdk.sdc_models.Pipeline instances.
remove_pipeline(*pipelines)[source]

Remove one or more pipelines from the DataCollector instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline.
reset_origin(pipeline)[source]

Reset origin offset.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – Pipeline object.
Returns:An instance of streamsets.sdk.sdc_api.Command.
run_dynamic_pipeline_preview(type, parameters={}, batches=1, batch_size=1, skip_targets=True, skip_lifecycle_events=True, end_stage=None, timeout=10000, test_origin=False, stage_outputs_to_override_json_text=None, stage_outputs_to_override_json=[], **kwargs)[source]

Run dynamic pipeline preview.

Parameters:
  • type (str) – Dynamic preview request type. CLASSIFICATION_CATALOG or PROTECTION_POLICY
  • parameters (dict, optional) – Dynamic preview request parameters. Default: {}
  • batches (int, optional) – Number of batches. Default: 1.
  • batch_size (int, optional) – Batch size. Default: 1.
  • skip_targets (bool, optional) – Skip targets. Default: True.
  • skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.
  • end_stage (str, optional) – End stage. Default: None.
  • timeout (int, optional) – Timeout. Default: 10000.
  • test_origin (bool, optional) – Test origin. Default: False.
  • stage_outputs_to_override_json_text (str, optional) – Stage outputs to override text. Default: None.
  • stage_outputs_to_override_json (list, optional) – Stage outputs to override. Default: [].
  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.
Returns:

An instance of streamsets.sdk.sdc_api.PreviewCommand.

run_pipeline_preview(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]

Run pipeline preview.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • rev (int, optional) – Pipeline revision. Default: 0.
  • batches (int, optional) – Number of batches. Default: 1.
  • batch_size (int, optional) – Batch size. Default: 10.
  • skip_targets (bool, optional) – Skip targets. Default: True.
  • end_stage (str, optional) – End stage. Default: None.
  • timeout (int, optional) – Timeout. Default: 2000.
  • test_origin (bool, optional) – Test origin. Default: False
  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.
  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.
Returns:

An instance of streamsets.sdk.sdc_api.PreviewCommand.

sample_pipelines

Get all sample pipelines in the pipeline store.

Returns:
A streamsets.sdk.utils.SeekableList of
streamsets.sdk.sdc_models.Pipeline instances.
sdc_configuration

Return all configurations for StreamSets Data Collector.

Returns:A dict with property names as keys and property values as values.
set_pipeline_acl(pipeline, pipeline_acl)[source]

Update pipeline ACL.

Parameters:
Returns:

An instance of streamsets.sdk.sdc_api.Command.

set_user(username, password=None)[source]

Set the user with which to interact with SDC.

Parameters:
  • username (str) – Username of user.
  • password (str, optional) – Password for user. Default: same as username.
stage_libraries

Get all stage libraries.

Returns:
A streamsets.sdk.utils.SeekableList of
streamsets.sdk.sdc_models.StageLibrary instances.
start_pipeline(pipeline, runtime_parameters=None, **kwargs)[source]

Start a pipeline.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • runtime_parameters (dict, optional) – Collection of runtime parameters. Default: None.
  • wait (bool, optional) – Wait for pipeline to start. Default: True.
  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED'].
  • timeout_sec (int) – Timeout to wait for pipeline statuses, in seconds. Default: streamsets.sdk.sdc.DEFAULT_START_TIMEOUT.
Returns:

An instance of streamsets.sdk.sdc_api.PipelineCommand.

stop_pipeline(pipeline, **kwargs)[source]

Stop a pipeline.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • force (bool, optional) – Force pipeline to stop. Default: False.
  • wait (bool, optional) – Wait for pipeline to stop. Default: True.
  • timeout_sec (int) – Timeout to wait for pipeline stop, in seconds. Default: streamsets.sdk.sdc.DEFAULT_STOP_TIMEOUT.
Returns:

An instance of streamsets.sdk.sdc_api.StopPipelineCommand.

update_pipeline(*pipelines)[source]

Update one or more pipelines in the DataCollector instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline.
validate_pipeline(pipeline)[source]

Validate a pipeline.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
version

Return the version of the Data Collector.

Returns:The version string.
Return type:str
wait_for_pipeline_metric(pipeline, metric, value, timeout_sec=30)[source]

Block until a pipeline metric reaches the desired value.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • metric (str) – The desired metric (e.g. 'output_record_count' or 'data_batch_count').
  • value – The desired value to wait for.
  • timeout_sec (int, optional) – Timeout to wait for metric to reach value, in seconds. Default: streamsets.sdk.sdc.DEFAULT_WAIT_FOR_METRIC_TIMEOUT.
Raises:

TimeoutError – If timeout_sec passes without metric reaching value.

wait_for_pipeline_status(pipeline, status, timeout_sec=30)[source]

Block until a pipeline reaches the desired status.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • status (str) – The desired status to wait for.
  • timeout_sec (int, optional) – Timeout to wait for pipeline to reach status, in seconds. Default: streamsets.sdk.sdc.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.
Raises:

TimeoutError – If timeout_sec passes without pipeline reaching status.

sdc.DEFAULT_SDC_USERNAME = 'admin'
sdc.DEFAULT_SDC_PASSWORD = 'admin'

Models

These models wrap and provide useful functionality for interacting with common SDC abstractions.

Alerts

class streamsets.sdk.sdc_models.Alert(alert)[source]

Pipeline alert.

Parameters:alert (dict) – Python object representation of a pipeline alert.
alert_texts

Get alert’s alert texts.

Returns:The alert’s alert texts as a str.
label

Get alert’s label.

Returns:The alert’s label as a str.
pipeline_id

Get alert’s pipeline ID.

Returns:The pipeline ID as a str.
class streamsets.sdk.sdc_models.Alerts(alerts)[source]

Container for list of alerts with filtering capabilities.

Parameters:alerts (dict) – Python object representation of alerts.
alerts

list – A list of streamsets.sdk.sdc_models.Alert instances.

for_pipeline(pipeline)[source]

Get alerts for the specified pipeline.

Parameters:pipeline (str) – The pipeline for which to get alerts.
Returns:An instance of streamsets.sdk.sdc_models.Alerts.

Data Rules

class streamsets.sdk.sdc_models.DataDriftRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]

Pipeline data drift rule.

Parameters:
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.
  • label (str) – Rule label.
  • condition (str, optional) – Data rule condition. Default: None.
  • sampling_percentage (int, optional) – Default: 5.
  • sampling_records_to_retain (int, optional) – Default: 10.
  • enable_meter (bool, optional) – Default: True.
  • enable_alert (bool, optional) – Default: True.
  • alert_text (str, optional) – Default: '${alert:info()}'.
  • send_email (bool, optional) – Default: False.
  • active (bool, optional) – Enable the data rule. Default: False.
active

The rule is active.

Returns:A bool.
class streamsets.sdk.sdc_models.DataRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]

Pipeline data rule.

Parameters:
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.
  • label (str) – Rule label.
  • condition (str, optional) – Data rule condition. Default: None.
  • sampling_percentage (int, optional) – Default: 5.
  • sampling_records_to_retain (int, optional) – Default: 10.
  • enable_meter (bool, optional) – Default: True.
  • enable_alert (bool, optional) – Default: True.
  • alert_text (str, optional) – Default: None.
  • threshold_type (str, optional) – One of count or percentage. Default: 'count'.
  • threshold_value (int, optional) – Default: 100.
  • min_volume (int, optional) – Only set if threshold_type is percentage. Default: 1000.
  • send_email (bool, optional) – Default: False.
  • active (bool, optional) – Enable the data rule. Default: False.
active

Returns if the rule is active or not.

Returns:A bool.

History

class streamsets.sdk.sdc_models.History(history)[source]

Pipeline history.

Parameters:history (dict) – Python object representation of the pipeline history.
entries

list – A list of streamsets.sdk.sdc_models.HistoryEntry instances.

latest

Get pipeline history’s latest entry.

Returns:The most recent pipeline history entry as an instance of streamsets.sdk.sdc_models.HistoryEntry.
class streamsets.sdk.sdc_models.HistoryEntry(entry)[source]

Pipeline history entry.

Parameters:entry (dict) – Python object representation of the history entry.
metrics

Get pipeline history entry’s metrics.

Returns:The pipeline history entry’s metrics as an instance of streamsets.sdk.sdc_models.Metrics.

Issues

class streamsets.sdk.sdc_models.Issue(issue)[source]

Issue encountered for a pipeline or a stage.

Parameters:issue (dict) – Python object representation of the issue.
class streamsets.sdk.sdc_models.Issues(issues)[source]

Issues encountered for pipelines as well as stages.

Parameters:issues (dict) – Python object representation of the issues.
issues_count

int – The number of issues.

pipeline_issues

list – A list of streamsets.sdk.sdc_models.Issue instances.

stage_issues

dict – A dictionary mapping stage names to instances of streamsets.sdk.sdc_models.Issue.

Logs

class streamsets.sdk.sdc_models.Log(log)[source]

Model for SDC logs.

Parameters:log (list) – A list of dictionaries (JSON representation) of the log.
after_time(timestamp)[source]

Returns log happened after the time specified.

Parameters:timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.
Returns:The formatted log as a str.
before_time(timestamp)[source]

Returns log happened before the time specified.

Parameters:timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.
Returns:The formatted log as a str.

Metrics

class streamsets.sdk.sdc_models.MetricCounter(counter)[source]

Metric counter.

Parameters:counter (dict) – Python object representation of a metric counter.
count

Get the metric counter’s count.

Returns:The metric counter’s count as an int.
class streamsets.sdk.sdc_models.MetricGauge(gauge)[source]

Metric gauge.

Parameters:gauge (dict) – Python object representation of a metric gauge.
value

Get the metric gauge’s value.

Returns:The metric gauge’s value as a str.
class streamsets.sdk.sdc_models.MetricHistogram(histogram)[source]

Metric histogram.

Parameters:histogram (dict) – Python object representation of a metric histogram.
class streamsets.sdk.sdc_models.MetricTimer(timer)[source]

Metric timer.

Parameters:timer (dict) – Python object representation of a metric timer.
count

Get the metric timer’s count.

Returns:The metric timer’s count as an int.
class streamsets.sdk.sdc_models.Metrics(metrics)[source]

Metrics.

Parameters:metrics (dict) – Python object representation of metrics.
counter(name)[source]

Get the metric counter from metrics.

Parameters:name (str) – Counter name.
Returns:The metric counter as an instance of streamsets.sdk.sdc_models.MetricCounter.
gauge(name)[source]

Get the metric gauge from metrics.

Parameters:name (str) – Gauge name.
Returns:The metric gauge as an instance of streamsets.sdk.sdc_models.MetricGauge.
histogram(name)[source]

Get the metric histogram from metrics.

Parameters:name (str) – Histogram name.
Returns:The metric histogram as an instance of streamsets.sdk.sdc_models.MetricHistogram.
pipeline

Get pipeline-level metrics.

Returns:An instance of streamsets.sdk.sdc_models.PipelineMetrics.
timer(name)[source]

Get the metric timer from metrics.

Parameters:name (str) – Timer namer.
Returns:The metric timer as an instance of streamsets.sdk.sdc_models.MetricTimer.

Pipelines

class streamsets.sdk.sdc_models.PipelineBuilder(pipeline, definitions, fragment=False, data_collector=None)[source]

Class with which to build SDC pipelines.

This class allows a user to programmatically generate an SDC pipeline. Instead of instantiating this class directly, most users should use streamsets.sdk.DataCollector.get_pipeline_builder().

Parameters:
  • pipeline (dict) – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in SDC and then exporting it before doing any configuration.
  • definitions (dict) – The output of SDC’s definitions endpoint.
  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.
add_data_drift_rule(*data_drift_rules)[source]

Add one or more data drift rules to the pipeline.

Parameters:*data_drift_rules – One or more instances of streamsets.sdk.sdc_models.DataDriftRule.
add_data_rule(*data_rules)[source]

Add one or more data rules to the pipeline.

Parameters:*data_rules – One or more instances of streamsets.sdk.sdc_models.DataRule.
add_error_stage(label=None, name=None, library=None)[source]

Add an error stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

add_fragment(fragment, parameter_name_prefix=None)[source]

Add a fragment to the pipeline.

Parameters:
  • fragment (py:obj:streamsets.sdk.sch_models.Pipeline) – Fragment to add.
  • parameter_name_prefix (str, optional) – Prefix name for the parameters of fragment. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

add_metric_rule(*metric_rules)[source]

Add one or more metric rules to the pipeline.

Parameters:*data_rules – One or more instances of streamsets.sdk.sdc_models.MetricRule.
add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None.
  • type (str, optional) – SDC stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

add_start_event_stage(label=None, name=None, library=None)[source]

Add start event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

add_stats_aggregator_stage(label=None, name=None, library=None)[source]

Add a stats aggregator stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

add_stop_event_stage(label=None, name=None, library=None)[source]

Add stop event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

add_test_origin_stage(label=None, name=None, library=None)[source]

Add test origin stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sdc_models.Stage.

build(title='Pipeline', **kwargs)[source]

Build the pipeline.

Parameters:title (str, optional) – Pipeline title to use. Default: 'Pipeline'.
Returns:An instance of streamsets.sdk.sdc_models.Pipeline.
import_pipeline(pipeline, **kwargs)[source]

Import a pipeline into the PipelineBuilder.

Parameters:pipeline (dict) – Exported pipeline.
Returns:An instance of streamsets.sdk.sdc_models.PipelineBuilder.
class streamsets.sdk.sdc_models.Pipeline(pipeline, all_stages=None, fragment=False)[source]

SDC pipeline.

This class provides abstractions to make it easier to interact with a pipeline before it’s imported into SDC.

Parameters:
  • pipeline (dict) – A Python object representing the serialized pipeline.
  • all_stages (dict, optional) – A dictionary mapping stage names to streamsets.sdk.sdc_models.Stage instances. Default: None.
  • fragment (boolean, optional) – Specify if a fragment. Default: False.
add_parameters(**parameters)[source]

Add pipeline parameters.

Parameters:**parameters – Keyword arguments to add.
configuration

Get pipeline’s configuration.

Returns:An instance of streamsets.sdk.models.Configuration.
delivery_guarantee

Get the delivery guarantee.

Returns:The delivery guarantee as a str.
id

Get the pipeline id.

Returns:The pipeline id as a str.
metadata

Get the pipeline metadata.

Returns:Pipeline metadata as a Python object.
origin_stage

Get the pipeline’s origin stage.

Returns:An instance of streamsets.sdk.sdc_models.Stage.
parameters

Get the pipeline parameters.

Returns:A dict of parameter key-value pairs.
pprint()[source]

Pretty-print the pipeline’s JSON representation.

rate_limit

Get the rate limit (records/sec).

Returns:The rate limit as a str.
title

Get the pipeline title.

Returns:The pipeline title as a str.
class streamsets.sdk.sdc_models.Stage(stage, label=None)[source]

Pipeline stage.

Parameters:
  • stage – JSON representation of the pipeline stage.
  • label (str, optional) – Human-readable stage label. Default: None.
configuration

streamsets.sdk.models.Configuration – The stage configuration.

services

dict – If supported by the stage, a dictionary mapping a service name to an instance of streamsets.sdk.models.Configuration.

add_output(*other_stages, event_lane=False)[source]

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters:other_stage (streamsets.sdk.sdc_models.Stage) – Stage object.
Returns:This stage as an instance of streamsets.sdk.sdc_models.Stage).
description

str – The stage’s description.

event_lanes

Get the stage’s list of event lanes.

Returns:A list of event lanes.
label

str – The stage’s label.

library

Get the stage’s library.

Returns:The stage library as a str.
output_lanes

Get the stage’s list of output lanes.

Returns:A list of output lanes.
set_attributes(**attributes)[source]

Set one or more stage attributes.

Parameters:**attributes – Attributes to set.
Returns:This stage as an instance of streamsets.sdk.sdc_models.Stage.
stage_on_record_error

The stage’s on record error configuration value.

stage_record_preconditions

The stage’s record preconditions configuration value.

stage_required_fields

The stage’s required fields configuration value.

Pipeline ACLs

class streamsets.sdk.sdc_models.PipelineAcl(pipeline_acl)[source]

Represents a pipeline ACL.

Parameters:pipeline_acl (dict) – JSON representation of a pipeline ACL.
permissions

streamsets.sdk.sdc_models.PipelinePermissions – Pipeline Permissions object.

Pipeline Permissions

class streamsets.sdk.sdc_models.PipelinePermission(pipeline_permission)[source]

A container for a pipeline permission.

Parameters:pipeline_permission (dict) – A Python object representation of a pipeline permission.
class streamsets.sdk.sdc_models.PipelinePermissions(pipeline_permissions)[source]

Container for list of permissions for a pipeline.

Parameters:pipeline_permissions (dict) – A Python object representation of pipeline permissions.
permissions

list – A list of streamsets.sdk.sdc_models.PipelinePermission instances.

Previews

class streamsets.sdk.sdc_models.Preview(pipeline_id, previewer_id, preview)[source]

Preview.

Parameters:
  • pipeline_id (str) – Pipeline ID.
  • previewer_id (str) – Previewer ID.
  • preview (dict) – Python object representation of the preview.
issues

dict – An instance of streamsets.sdk.sdc_models.Issues.

preview_batches

list – A list of streamsets.sdk.sdc_models.Batch instances.

Snapshots

class streamsets.sdk.sdc_models.Batch(batch)[source]

Snapshot batch.

Parameters:batch – Python object representation of the snapshot batch.
class streamsets.sdk.sdc_models.Record(record)[source]

Record.

Parameters:record (dict) – Python object representation of the record.
header

dict – An instance of streamsets.sdk.sdc_models.RecordHeader.

value

dict – Python object representation of the record value.

value2

A typed representation of the record value.

get_field_attributes(path)[source]

Given a field path string (similar to XPath), get streamsets.sdk.sdc_models.Field attributes. .. rubric:: Example

get_field_attributes(path=’[2]/east/HR/employeeName’).

Parameters:path (str) – field path string.
Returns:streamsets.sdk.sdc_models.Field attributes.
get_field_data(path)[source]

Given a field path string (similar to XPath), get streamsets.sdk.sdc_models.Field. .. rubric:: Example

get_field_data(path=’[2]/east/HR/employeeName’).

Parameters:path (str) – field path string.
Returns:An instance of streamsets.sdk.sdc_models.Field.
class streamsets.sdk.sdc_models.RecordHeader(header)[source]

Record Header.

Parameters:header (dict) – Python object representation of the record header.
class streamsets.sdk.sdc_models.Snapshot(pipeline_id, snapshot_name, snapshot)[source]

Snapshot.

Parameters:
  • pipeline_id (str) – The pipeline ID.
  • snapshot_name (str) – The snapshot name.
  • snapshot (dict) – Python object representation of the snapshot.
snapshot_batches

list – A list of streamsets.sdk.sdc_models.Batch instances.

class streamsets.sdk.sdc_models.StageOutput(stage_output)[source]

Snapshot batch’s stage output.

Parameters:stage_output – Python object representation of the stage output.
output

Gets the stage output’s output.

If the stage contains multiple lanes, use streamsets.sdk.sdc_models.StageOutput.output_lanes.

Raises:An instance of Exception if the stage contains multiple lanes.
Returns:An instance of streamsets.sdk.sdc_models.Record.

Users

class streamsets.sdk.sdc_models.User(user)[source]

User.

Parameters:user (dict) – Python object representation of the user.
groups

Get user’s groups.

Returns:User groups as a str.
name

Get user’s name.

Returns:User name as a str.
roles

Get user’s roles.

Returns:User roles as a str.

StreamSets Transformer

Main interface

This is the main entry point used by users when interacting with Transformer instances.

class streamsets.sdk.Transformer(server_url, username=None, password=None, authentication_method='form', accounts_authentication_token=None, accounts_server_url=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]

Class to interact with StreamSets Transformer.

Parameters:
  • server_url (str) – URL of an existing ST deployment with which to interact.
  • username (str, optional) – ST username. Default: streamsets.sdk.st.DEFAULT_ST_USERNAME.
  • password (str, optional) – ST password. Default: streamsets.sdk.st.DEFAULT_ST_PASSWORD.
  • authentication_method (str, optional) – StreamSets Transformer authentication method. Default: streamsets.sdk.constants.ENGINE_AUTHENTICATION_METHOD_FORM.
  • accounts_authentication_token (str, optional) – StreamSets Accounts authentication token. Default: None
  • accounts_server_url (str, optional) – StreamSets Accounts server base URL. Default: None
  • control_hub (streamsets.sdk.ControlHub, optional) – A StreamSets Control Hub instance to use for SCH-registered Transformers. Default: None.
  • dump_log_on_error (bool) – Whether to output Transformer logs when exceptions are raised by certain methods. Default: False
add_pipeline(*pipelines)[source]

Add one or more pipelines to the Transformer instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.st_models.Pipeline.
capture_snapshot(pipeline, snapshot_name=None, start_pipeline=False, runtime_parameters=None, batches=1, batch_size=10, **kwargs)[source]

Capture a snapshot for given pipeline.

Parameters:
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
  • snapshot_name (str, optional) – Name for the generated snapshot. If set to None, an auto-generated UUID (which can be recovered from the returned SnapshotCommand object’s snapshot_name attribute) will be used when calling the REST API. Default: None.
  • start_pipeline (bool, optional) – If set to true, then the pipeline will be started and its first batch will be captured. Otherwise, the pipeline must be running, in which case one of the next batches will be captured. Default: False.
  • runtime_parameters (dict, optional) – Runtime parameters to override Pipeline Parameters value. Default: None.
  • wait (bool, optional) – Wait for capture snapshot to finish. Default: True.
  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED'].
  • timeout_sec (int) – Timeout to wait for snapshot, in seconds. Default: streamsets.sdk.st.DEFAULT_SNAPSHOT_TIMEOUT.
  • time_between_checks (int, optional) – Time to sleep between snapshot status checks. Applicable when `wait` is enabled, in seconds. Default: streamsets.sdk.st.DEFAULT_SNAPSHOT_TIME_BETWEEN_CHECKS.
Returns:

An instance of streamsets.sdk.st_api.SnapshotCommand.

change_password(old_password, new_password)[source]

Change password for the current user.

Parameters:
  • old_password (str) – old password.
  • new_password (str) – new password.
Returns:

An instance of streamsets.sdk.st_api.Command.

current_user

Get currently logged-in user and its groups and roles.

Returns:An instance of streamsets.sdk.st_models.User.
definitions

Get an ST instance’s definitions.

Will return a cached instance of the definitions if called more than once.

Returns:An instance of json.
get_alerts()[source]

Get pipeline alerts.

Returns:An instance of streamsets.sdk.st_models.Alerts.
get_bundle(generators=None)[source]

Generate new support bundle.

Returns:An instance of zipfile.ZipFile.
get_bundle_generators()[source]

Get available support bundle generators.

Returns:An instance of streamsets.sdk.st_models.BundleGenerators.
get_logs(ending_offset=-1, extra_message=None, pipeline=None, severity=None)[source]

Get logs.

Parameters:
  • ending_offset (int) – ending_offset, Default: -1.
  • extra_message (str) – extra_message, Default: None.
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance, Default: None.
  • severity (str) – severity, Default: None.
Returns:

An instance of streamsets.sdk.st_models.Log.

get_pipeline(pipeline_id)[source]

Get a pipeline.

Parameters:pipeline_id (str) – Id of pipeline.
Returns:An instance of streamsets.sdk.st_models.Pipeline.
get_pipeline_acl(pipeline)[source]

Get pipeline ACL.

Parameters:pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.st_models.PipelineAcl.
get_pipeline_builder(**kwargs)[source]

Get a pipeline builder instance with which a pipeline can be created.

Returns:An instance of streamsets.sdk.st_models.PipelineBuilder.
get_pipeline_history(pipeline)[source]

Get a pipeline’s history.

Parameters:pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.st_models.History.
get_pipeline_permissions(pipeline)[source]

Return pipeline permissions for a given pipeline.

Parameters:pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.st_models.PipelinePermissions.
get_pipeline_status(pipeline)[source]

Get status of a pipeline.

Parameters:pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
get_snapshots(pipeline=None)[source]

Get information about stored snapshots.

Parameters:pipeline (streamsets.sdk.st_models.Pipeline, optional) – The pipeline instance. Default: None.
Returns:A list of streamsets.sdk.st_models.SnapshotInfo instances.
id

Return id for StreamSets Transformer.

Returns:A str Transformer ID.
pipelines

Get all pipelines in the pipeline store.

Returns:
A streamsets.sdk.utils.SeekableList of
streamsets.sdk.st_models.Pipeline instances.
remove_pipeline(*pipelines)[source]

Remove one or more pipelines from the Transformer instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.st_models.Pipeline.
reset_origin(pipeline)[source]

Reset origin offset.

Parameters:pipeline (streamsets.sdk.st_models.Pipeline) – Pipeline object.
Returns:An instance of streamsets.sdk.st_api.Command.
run_pipeline_preview(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=120000, stage_outputs_to_override_json=None, **kwargs)[source]

Run pipeline preview.

Parameters:
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
  • rev (int, optional) – Pipeline revision. Default: 0.
  • batches (int, optional) – Number of batches. Default: 1.
  • batch_size (int, optional) – Batch size. Default: 10.
  • skip_targets (bool, optional) – Skip targets. Default: True.
  • end_stage (str, optional) – End stage. Default: None.
  • timeout (int, optional) – Server side preview Timeout in milliseconds. Default: 120000.
  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None.
  • remote (bool, optional) – Remote preview (i.e. run on the cluster). Default: False.
  • timeout_sec (int, optional) – Client side preview timeout, in seconds. Default: streamsets.sdk.st.DEFAULT_PREVIEW_CLIENT_TIMEOUT_SEC.
  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True.
  • time_between_checks (int, optional) – Time to sleep between preview status checks. Applicable when `wait` is enabled, in seconds. Default: streamsets.sdk.st.DEFAULT_PREVIEW_TIME_BETWEEN_CHECKS.
Returns:

An instance of streamsets.sdk.st_api.PreviewCommand.

set_pipeline_acl(pipeline, pipeline_acl)[source]

Update pipeline ACL.

Parameters:
Returns:

An instance of streamsets.sdk.st_api.Command.

set_user(username, password=None)[source]

Set the user with which to interact with ST.

Parameters:
  • username (str) – Username of user.
  • password (str, optional) – Password for user. Default: same as username.
start_pipeline(pipeline, runtime_parameters=None, **kwargs)[source]

Start a pipeline.

Parameters:
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
  • runtime_parameters (dict, optional) – Collection of runtime parameters. Default: None.
  • wait (bool, optional) – Wait for pipeline to start. Default: True.
  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED'].
  • timeout_sec (int) – Timeout to wait for pipeline statuses, in seconds. Default: streamsets.sdk.st.DEFAULT_START_TIMEOUT.
Returns:

An instance of streamsets.sdk.st_api.PipelineCommand.

stop_pipeline(pipeline, **kwargs)[source]

Stop a pipeline.

Parameters:
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
  • force (bool, optional) – Force pipeline to stop. Default: False.
  • wait (bool, optional) – Wait for pipeline to stop. Default: True.
  • timeout_sec (int) – Timeout to wait for pipeline stop, in seconds. Default: streamsets.sdk.st.DEFAULT_STOP_TIMEOUT.
Returns:

An instance of streamsets.sdk.st_api.StopPipelineCommand.

transformer_configuration

Return all configurations for StreamSets Transformer. :returns: A dict with property names as keys and property values as values.

validate_pipeline(pipeline, **kwargs)[source]

Validate a pipeline.

Parameters:
  • pipeline (streamsets.sdk.st_models.Pipeline) – The pipeline instance.
  • timeout (int, optional) – Server side validate Timeout in seconds. Default: streamsets.sdk.st.DEFAULT_VALIDATE_SERVER_TIMEOUT_SEC.
  • timeout_sec (int, optional) – Client side validate timeout, in seconds. Default: streamsets.sdk.st.DEFAULT_VALIDATE_CLIENT_TIMEOUT_SEC.
  • time_between_checks (int, optional) – Time to sleep between validation checks. Default: streamsets.sdk.st.DEFAULT_VALIDATE_TIME_BETWEEN_CHECKS.
  • using_configured_cluster_manager (bool, optional) – Validate pipeline using configured cluster manager. Default: True.
version

Return the version of the Transformer.

Returns:The version string.
Return type:str

Models

These models wrap and provide useful functionality for interacting with common SCH abstractions.

Alerts

class streamsets.sdk.st_models.Alert(alert)[source]

Pipeline alert.

Parameters:alert (dict) – Python object representation of a pipeline alert.
alert_texts

Get alert’s alert texts.

Returns:The alert’s alert texts as a str.
label

Get alert’s label.

Returns:The alert’s label as a str.
pipeline_id

Get alert’s pipeline ID.

Returns:The pipeline ID as a str.
class streamsets.sdk.st_models.Alerts(alerts)[source]

Container for list of alerts with filtering capabilities.

Parameters:alerts (dict) – Python object representation of alerts.
alerts

list – A list of streamsets.sdk.st_models.Alert instances.

for_pipeline(pipeline)[source]

Get alerts for the specified pipeline.

Parameters:pipeline (str) – The pipeline for which to get alerts.
Returns:An instance of streamsets.sdk.st_models.Alerts.

Data Rules

class streamsets.sdk.st_models.DataDriftRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]

Pipeline data drift rule.

Parameters:
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.
  • label (str) – Rule label.
  • condition (str, optional) – Data rule condition. Default: None.
  • sampling_percentage (int, optional) – Default: 5.
  • sampling_records_to_retain (int, optional) – Default: 10.
  • enable_meter (bool, optional) – Default: True.
  • enable_alert (bool, optional) – Default: True.
  • alert_text (str, optional) – Default: '${alert:info()}'.
  • send_email (bool, optional) – Default: False.
  • active (bool, optional) – Enable the data rule. Default: False.
active

The rule is active.

Returns:A bool.
class streamsets.sdk.st_models.DataRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]

Pipeline data rule.

Parameters:
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.
  • label (str) – Rule label.
  • condition (str, optional) – Data rule condition. Default: None.
  • sampling_percentage (int, optional) – Default: 5.
  • sampling_records_to_retain (int, optional) – Default: 10.
  • enable_meter (bool, optional) – Default: True.
  • enable_alert (bool, optional) – Default: True.
  • alert_text (str, optional) – Default: None.
  • threshold_type (str, optional) – One of count or percentage. Default: 'count'.
  • threshold_value (int, optional) – Default: 100.
  • min_volume (int, optional) – Only set if threshold_type is percentage. Default: 1000.
  • send_email (bool, optional) – Default: False.
  • active (bool, optional) – Enable the data rule. Default: False.
active

Returns if the rule is active or not.

Returns:A bool.

History

class streamsets.sdk.st_models.History(history)[source]

Pipeline history.

Parameters:history (dict) – Python object representation of the pipeline history.
entries

list – A list of streamsets.sdk.st_models.HistoryEntry instances.

latest

Get pipeline history’s latest entry.

Returns:The most recent pipeline history entry as an instance of streamsets.sdk.st_models.HistoryEntry.
class streamsets.sdk.st_models.HistoryEntry(entry)[source]

Pipeline history entry.

Parameters:entry (dict) – Python object representation of the history entry.
metrics

Get pipeline history entry’s metrics.

Returns:The pipeline history entry’s metrics as an instance of streamsets.sdk.st_models.Metrics.

Issues

class streamsets.sdk.st_models.Issue(issue)[source]

Issue encountered for a pipeline or a stage.

Parameters:issue (dict) – Python object representation of the issue.
class streamsets.sdk.st_models.Issues(issues)[source]

Issues encountered for pipelines as well as stages.

Parameters:issues (dict) – Python object representation of the issues.
issues_count

int – The number of issues.

pipeline_issues

list – A list of streamsets.sdk.st_models.Issue instances.

stage_issues

dict – A dictionary mapping stage names to instances of streamsets.sdk.st_models.Issue.

Logs

class streamsets.sdk.st_models.Log(log)[source]

Model for ST logs.

Parameters:log (list) – A list of dictionaries (JSON representation) of the log.
after_time(timestamp)[source]

Returns log happened after the time specified.

Parameters:timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.
Returns:The formatted log as a str.
before_time(timestamp)[source]

Returns log happened before the time specified.

Parameters:timestamp (str) – Timestamp in the form ‘2017-04-10 17:53:55,244’.
Returns:The formatted log as a str.

Metrics

class streamsets.sdk.st_models.MetricCounter(counter)[source]

Metric counter.

Parameters:counter (dict) – Python object representation of a metric counter.
count

Get the metric counter’s count.

Returns:The metric counter’s count as an int.
class streamsets.sdk.st_models.MetricGauge(gauge)[source]

Metric gauge.

Parameters:gauge (dict) – Python object representation of a metric gauge.
value

Get the metric gauge’s value.

Returns:The metric gauge’s value as a str.
class streamsets.sdk.st_models.MetricHistogram(histogram)[source]

Metric histogram.

Parameters:histogram (dict) – Python object representation of a metric histogram.
class streamsets.sdk.st_models.MetricTimer(timer)[source]

Metric timer.

Parameters:timer (dict) – Python object representation of a metric timer.
count

Get the metric timer’s count.

Returns:The metric timer’s count as an int.
class streamsets.sdk.st_models.Metrics(metrics)[source]

Metrics.

Parameters:metrics (dict) – Python object representation of metrics.
counter(name)[source]

Get the metric counter from metrics.

Parameters:name (str) – Counter name.
Returns:The metric counter as an instance of streamsets.sdk.st_models.MetricCounter.
gauge(name)[source]

Get the metric gauge from metrics.

Parameters:name (str) – Gauge name.
Returns:The metric gauge as an instance of streamsets.sdk.st_models.MetricGauge.
histogram(name)[source]

Get the metric histogram from metrics.

Parameters:name (str) – Histogram name.
Returns:The metric histogram as an instance of streamsets.sdk.st_models.MetricHistogram.
timer(name)[source]

Get the metric timer from metrics.

Parameters:name (str) – Timer namer.
Returns:The metric timer as an instance of streamsets.sdk.st_models.MetricTimer.

Pipelines

class streamsets.sdk.st_models.PipelineBuilder(pipeline, definitions)[source]

Class with which to build ST pipelines.

This class allows a user to programmatically generate an ST pipeline. Instead of instantiating this class directly, most users should use streamsets.sdk.Transformer.get_pipeline_builder().

Parameters:
  • pipeline (dict) – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in ST and then exporting it before doing any configuration.
  • definitions (dict) – The output of ST’s definitions endpoint.
add_data_drift_rule(*data_drift_rules)[source]

Add one or more data drift rules to the pipeline.

Parameters:*data_drift_rules – One or more instances of streamsets.sdk.st_models.DataDriftRule.
add_data_rule(*data_rules)[source]

Add one or more data rules to the pipeline.

Parameters:*data_rules – One or more instances of streamsets.sdk.st_models.DataRule.
add_error_stage(label=None, name=None, library=None)[source]

Add an error stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.st_models.Stage.

add_metric_rule(*metric_rules)[source]

Add one or more metric rules to the pipeline.

Parameters:*data_rules – One or more instances of streamsets.sdk.st_models.MetricRule.
add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.
  • type (str, optional) – ST stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.st_models.Stage.

add_start_event_stage(label=None, name=None, library=None)[source]

Add start event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.st_models.Stage.

add_stats_aggregator_stage(label=None, name=None, library=None)[source]

Add a stats aggregator stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.st_models.Stage.

add_stop_event_stage(label=None, name=None, library=None)[source]

Add stop event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – ST stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – ST stage name to use when selecting stage from definitions. Default: None.
  • library (str, optional) – ST stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.st_models.Stage.

build(title='Pipeline')[source]

Build the pipeline.

Parameters:title (str, optional) – Pipeline title to use. Default: 'Pipeline'.
Returns:An instance of streamsets.sdk.st_models.Pipeline.
import_pipeline(pipeline, **kwargs)[source]

Import a pipeline into the PipelineBuilder.

Parameters:pipeline (dict) – Exported pipeline.
Returns:An instance of streamsets.sdk.st_models.PipelineBuilder.
class streamsets.sdk.st_models.Pipeline(pipeline, all_stages=None)[source]

ST pipeline.

This class provides abstractions to make it easier to interact with a pipeline before it’s imported into ST.

Parameters:
  • pipeline (dict) – A Python object representing the serialized pipeline.
  • all_stages (dict, optional) – A dictionary mapping stage names to streamsets.sdk.st_models.Stage instances. Default: None.
add_parameters(**parameters)[source]

Add pipeline parameters.

Parameters:**parameters – Keyword arguments to add.
configuration

Get pipeline’s configuration.

Returns:An instance of streamsets.sdk.models.Configuration.
delivery_guarantee

Get the delivery guarantee.

Returns:The delivery guarantee as a str.
id

Get the pipeline id.

Returns:The pipeline id as a str.
metadata

Get the pipeline metadata.

Returns:Pipeline metadata as a Python object.
origin_stage

Get the pipeline’s origin stage.

Returns:An instance of streamsets.sdk.st_models.Stage.
parameters

Get the pipeline parameters.

Returns:A dict of parameter key-value pairs.
pprint()[source]

Pretty-print the pipeline’s JSON representation.

rate_limit

Get the rate limit (records/sec).

Returns:The rate limit as a str.
title

Get the pipeline title.

Returns:The pipeline title as a str.
class streamsets.sdk.st_models.Stage(stage, label=None)[source]

Pipeline stage.

Parameters:
  • stage – JSON representation of the pipeline stage.
  • label (str, optional) – Human-readable stage label. Default: None.
configuration

streamsets.sdk.models.Configuration – The stage configuration.

services

dict – If supported by the stage, a dictionary mapping a service name to an instance of streamsets.sdk.models.Configuration.

add_output(*other_stages, event_lane=False)[source]

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters:other_stage (streamsets.sdk.st_models.Stage) – Stage object.
Returns:This stage as an instance of streamsets.sdk.st_models.Stage).
event_lanes

Get the stage’s list of event lanes.

Returns:A list of event lanes.
label

str – The stage’s label.

library

Get the stage’s library.

Returns:The stage library as a str.
output_lanes

Get the stage’s list of output lanes.

Returns:A list of output lanes.
set_attributes(**attributes)[source]

Set one or more stage attributes.

Parameters:**attributes – Attributes to set.
Returns:This stage as an instance of streamsets.sdk.st_models.Stage.
stage_on_record_error

The stage’s on record error configuration value.

stage_record_preconditions

The stage’s record preconditions configuration value.

stage_required_fields

The stage’s required fields configuration value.

Pipeline ACLs

class streamsets.sdk.st_models.PipelineAcl(pipeline_acl)[source]

Represents a pipeline ACL.

Parameters:pipeline_acl (dict) – JSON representation of a pipeline ACL.
permissions

streamsets.sdk.st_models.PipelinePermissions – Pipeline Permissions object.

Pipeline Permissions

class streamsets.sdk.st_models.PipelinePermission(pipeline_permission)[source]

A container for a pipeline permission.

Parameters:pipeline_permission (dict) – A Python object representation of a pipeline permission.
class streamsets.sdk.st_models.PipelinePermissions(pipeline_permissions)[source]

Container for list of permissions for a pipeline.

Parameters:pipeline_permissions (dict) – A Python object representation of pipeline permissions.
permissions

list – A list of streamsets.sdk.st_models.PipelinePermission instances.

Previews

class streamsets.sdk.st_models.Preview(pipeline_id, previewer_id, preview)[source]

Preview.

Parameters:
  • pipeline_id (str) – Pipeline ID.
  • previewer_id (str) – Previewer ID.
  • preview (dict) – Python object representation of the preview.
issues

dict – An instance of streamsets.sdk.st_models.Issues.

preview_batches

list – A list of streamsets.sdk.st_models.Batch instances.

Snapshots

class streamsets.sdk.st_models.Batch(batch)[source]

Snapshot batch.

Parameters:batch – Python object representation of the snapshot batch.
class streamsets.sdk.st_models.Record(record)[source]

Record.

Parameters:record (dict) – Python object representation of the record.
header

dict – An instance of streamsets.sdk.st_models.RecordHeader.

value

dict – Python object representation of the record value.

value2

A typed representation of the record value.

get_field_attributes(path)[source]

Given a field path string (similar to XPath), get streamsets.sdk.st_models.Field attributes. .. rubric:: Example

get_field_attributes(path=’[2]/east/HR/employeeName’).

Parameters:path (str) – field path string.
Returns:streamsets.sdk.st_models.Field attributes.
get_field_data(path)[source]

Given a field path string (similar to XPath), get streamsets.sdk.st_models.Field. .. rubric:: Example

get_field_data(path=’[2]/east/HR/employeeName’).

Parameters:path (str) – field path string.
Returns:An instance of streamsets.sdk.st_models.Field.
class streamsets.sdk.st_models.RecordHeader(header)[source]

Record Header.

Parameters:header (dict) – Python object representation of the record header.
class streamsets.sdk.st_models.Snapshot(pipeline_id, snapshot_name, snapshot)[source]

Snapshot.

Parameters:
  • pipeline_id (str) – The pipeline ID.
  • snapshot_name (str) – The snapshot name.
  • snapshot (dict) – Python object representation of the snapshot.
snapshot_batches

list – A list of streamsets.sdk.st_models.Batch instances.

class streamsets.sdk.st_models.StageOutput(stage_output)[source]

Snapshot batch’s stage output.

Parameters:stage_output – Python object representation of the stage output.
output

Gets the stage output’s output.

If the stage contains multiple lanes, use streamsets.sdk.st_models.StageOutput.output_lanes.

Raises:An instance of Exception if the stage contains multiple lanes.
Returns:An instance of streamsets.sdk.st_models.Record.

Users

class streamsets.sdk.st_models.User(user)[source]

User.

Parameters:user (dict) – Python object representation of the user.
groups

Get user’s groups.

Returns:User groups as a str.
name

Get user’s name.

Returns:User name as a str.
roles

Get user’s roles.

Returns:User roles as a str.

StreamSets Control Hub

Main interface

This is the main entry point used by users when interacting with SCH instances.

class streamsets.sdk.ControlHub(server_url, username, password)[source]

Class to interact with StreamSets Control Hub.

Parameters:
  • server_url (str) – SCH server base URL.
  • username (str) – SCH username.
  • password (str) – SCH password.
acknowledge_deployment_error(*deployments)[source]

Acknowledge errors for one or more deployments.

Parameters:*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.
acknowledge_event_subscription_error(subscription)[source]

Acknowledge an error on given Event Subscription.

Parameters:subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
acknowledge_job_error(*jobs)[source]

Acknowledge errors for one or more jobs.

Parameters:*jobs – One or more instances of streamsets.sdk.sch_models.Job.
acknowledge_topology_errors(topology)[source]

Acknowledge errors of a topology.

Parameters:topology (streamsets.sdk.sch_models.Topology) – Topology object.
action_audits

Action Audits.

Returns:An instance of streamsets.sdk.sch_models.ActionAudits.
activate_datacollector(data_collector)[source]

Activate data collector.

Parameters:data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
activate_provisioning_agent(provisioning_agent)[source]

Activate provisioning agent.

Parameters:provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns:An instance of streamsets.sdk.sch_api.Command.
add_classification_rule(classification_rule, commit=False)[source]

Add a classification rule.

Parameters:
add_connection(connection)[source]

Add a connection.

Parameters:connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns:An instance of streamsets.sdk.sch_api.Command.
add_deployment(deployment)[source]

Add a deployment.

Parameters:deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.
Returns:An instance of streamsets.sdk.sch_api.Command.
add_group(group)[source]

Add a group.

Parameters:group (streamsets.sdk.sch_models.Group) – Group object.
Returns:An instance of streamsets.sdk.sch_api.Command.
add_job(job)[source]

Add a job.

Parameters:job (streamsets.sdk.sch_models.Job) – Job object.
Returns:An instance of streamsets.sdk.sch_models.Job.
add_organization(organization)[source]

Add an organization.

Parameters:organization (streamsets.sdk.sch_models.Organization) – Organization object.
Returns:An instance of streamsets.sdk.sch_api.Command.
add_protection_policy(protection_policy)[source]

Add a protection policy.

Parameters:protection_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Protection Policy object.
add_report_definition(report_definition)[source]

Add Report Definition to Control Hub.

Parameters:report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
add_subscription(subscription)[source]

Add Subscription to Control Hub.

Parameters:subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
add_user(user)[source]
Add a user. Some user attributes are updated by SCH such as
created_by, created_on, last_modified_by, last_modified_on, password_expires_on, password_system_generated.
Parameters:user (streamsets.sdk.sch_models.User) – User object.
Returns:An instance of streamsets.sdk.sch_api.Command.
balance_data_collectors(*data_collectors)[source]

Balance all jobs running on given Data Collectors.

Parameters:*sdcs – One or more instances of streamsets.sdk.sch_models.DataCollector.
balance_job(*jobs)[source]

Balance one or more jobs.

Parameters:*jobs – One or more instances of streamsets.sdk.sch_models.Job.
connection_audits

Connection Audits.

Returns:An instance of streamsets.sdk.sch_models.ConnectionAudits.
connection_tags

Connection Tags.

Returns:An instance of streamsets.sdk.sch_models.ConnectionTags.
connections

Connections.

Returns:An instance of streamsets.sdk.sch_models.Connections.
create_components(component_type, number_of_components=1, active=True)[source]

Create components.

Parameters:
  • component_type (str) – Component type.
  • number_of_components (int, optional) – Default: 1.
  • active (bool, optional) – Default: True.
Returns:

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

create_topology(topology)[source]

Create a topology.

Parameters:topology (streamsets.sdk.sch_models.Topology) – Topology object.
Returns:An instance of streamsets.sdk.sch_models.Topology.
data_collectors

Data Collectors registered to the Control Hub instance.

Returns:Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.DataCollector instances.
data_protector_enabled

bool – Whether Data Protector is enabled for the current organization.

data_protector_version

str – Returns the StreamSets Data Protector version string configured in the system data collector. If data protector is not enabled a None value is returned

deactivate_datacollector(data_collector)[source]

Deactivate data collector.

Parameters:data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
deactivate_provisioning_agent(provisioning_agent)[source]

Deactivate provisioning agent.

Parameters:provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns:An instance of streamsets.sdk.sch_api.Command.
deactivate_user(*users, organization=None)[source]

Deactivate Users for all given User IDs.

Parameters:
  • *users – One or more instances of streamsets.sdk.sch_models.User.
  • organization (str, optional) – Default: None. If not specified, current organization will be used.
Returns:

An instance of streamsets.sdk.sch_api.Command.

delete_and_unregister_data_collector(data_collector)[source]

Delete and Unregister data collector.

Parameters:data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
delete_connection(*connections)[source]

Delete connections.

Parameters:*connections – One or more instances of streamsets.sdk.sch_models.Connection.
delete_data_collector(data_collector)[source]

Delete data collector.

Parameters:data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
delete_deployment(*deployments)[source]

Delete deployments.

Parameters:*deployments – One or more instances of streamsets.sdk.sch_models.Deployment.
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_group(*groups)[source]

Delete groups.

Parameters:*groups – One or more instances of streamsets.sdk.sch_models.Group.
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_job(*jobs)[source]

Delete one or more jobs.

Parameters:*jobs – One or more instances of streamsets.sdk.sch_models.Job.
delete_pipeline(pipeline, only_selected_version=False)[source]

Delete a pipeline.

Parameters:
Returns:

An instance of streamsets.sdk.sch_api.Command.

delete_pipeline_labels(*pipeline_labels)[source]

Delete pipeline labels.

Parameters:*pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel.
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_provisioning_agent(provisioning_agent)[source]

Delete provisioning agent.

Parameters:provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_provisioning_agent_token(provisioning_agent)[source]

Delete provisioning agent token.

Parameters:provisioning_agent (streamets.sdk.sch_models.ProvisioningAgent) –
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_report_definition(report_definition)[source]

Delete an existing Report Definition.

Parameters:report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_subscription(subscription)[source]

Delete an exisiting Subscription.

Parameters:subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
delete_topology(topology, only_selected_version=False)[source]

Delete a topology.

Parameters:
Returns:

An instance of streamsets.sdk.sch_api.Command.

delete_user(*users, deactivate=False)[source]

Delete users. Deactivate users before deleting if configured.

Parameters:
Returns:

An instance of streamsets.sdk.sch_api.Command.

deployments

Deployments.

Returns:An instance of streamsets.sdk.sch_models.Deployments.
duplicate_job(job, name=None, description=None, number_of_copies=1)[source]

Duplicate an existing job.

Parameters:
  • job (streamsets.sdk.sch_models.Job) – Job object.
  • name (str, optional) – Name of the new job(s). Default: None. If not specified, name of the job with ' copy' appended to the end will be used.
  • description (str, optional) – Description for new job(s). Default: None.
  • number_of_copies (int, optional) – Number of copies. Default: 1.
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

duplicate_pipeline(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]

Duplicate an existing pipeline.

Parameters:
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
  • name (str, optional) – Name of the new pipeline(s). Default: None.
  • description (str, optional) – Description for new pipeline(s). Default: 'New Pipeline'.
  • number_of_copies (int, optional) – Number of copies. Default: 1.
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

edit_job(job)[source]

Edit a job.

Parameters:job (streamsets.sdk.sch_models.Job) – Job object.
Returns:An instance of streamsets.sdk.sch_models.Job.
edit_topology(topology)[source]

Edit a topology.

Parameters:topology (streamsets.sdk.sch_models.Topology) – Topology object.
Returns:An instance of streamsets.sdk.sch_models.Topology.
export_jobs(jobs)[source]

Export jobs to a compressed archive.

Parameters:jobs (list) – A list of streamsets.sdk.sch_models.Job instances.
Returns:An instance of type bytes indicating the content of zip file with job json files.
export_pipelines(pipelines, fragments=False, include_plain_text_credentials=False)[source]

Export pipelines.

Parameters:
  • pipelines (list) – A list of streamsets.sdk.sch_models.Pipeline instances.
  • fragments (bool) – Indicates if exporting fragments is needed.
  • include_plain_text_credentials (bool) – Indicates if plain text credentials should be included.
Returns:

An instance of type bytes indicating the content of zip file with pipeline json files.

export_protection_policies(protection_policies)[source]

Export protection policies to a compressed archive.

Parameters:
Returns:

An instance of type bytes indicating the content of zip file with protection policy json files.

export_topologies(topologies)[source]

Export topologies.

Parameters:topologies (list) – A list of streamsets.sdk.sch_models.Topology instances.
Returns:An instance of type bytes indicating the content of zip file with pipeline json files.
get_admin_tool(base_url, username, password)[source]

Get SCH admin tool.

Returns:An instance of streamsets.sdk.sch_models.AdminTool.
get_classification_rule_builder()[source]

Get a classification rule builder instance with which a classification rule can be created.

Returns:An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder.
get_components(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]

Get components.

Parameters:
  • component_type_id (str) – Component type id.
  • offset (str, optional) – Default: None.
  • len (str, optional) – Default: None.
  • order_by (str, optional) – Default: 'LAST_VALIDATED_ON'.
  • order (str, optional) – Default: 'ASC'.
get_connection_builder()[source]

Get a connection builder instance with which a connection can be created.

Returns:An instance of streamsets.sdk.sch_models.ConnectionBuilder.
get_current_job_status(job)[source]

Returns the current job status for given job id.

Parameters:job (streamsets.sdk.sch_models.Job) – Job object.
get_data_collector_labels(data_collector)[source]

Returns all labels assigned to data collector.

Parameters:data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
Returns:A list of data collector assigned labels.
get_deployment_builder()[source]

Get a deployment builder instance with which a deployment can be created.

Returns:An instance of streamsets.sdk.sch_models.DeploymentBuilder.
get_group_builder()[source]

Get a group builder instance with which a group can be created.

Returns:An instance of streamsets.sdk.sch_models.GroupBuilder.
get_job_builder()[source]

Get a job builder instance with which a job can be created.

Returns:An instance of streamsets.sdk.sch_models.JobBuilder.
get_organization_builder()[source]

Get an organization builder instance with which an organization can be created.

Returns:An instance of streamsets.sdk.sch_models.OrganizationBuilder.
get_pipeline_builder(data_collector=None, transformer=None, fragment=False)[source]

Get a pipeline builder instance with which a pipeline can be created.

Parameters:
  • data_collector (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to author the pipeline. If omitted, Control Hub’s system SDC will be used. Default: None.
  • transformer (streamsets.sdk.sch_models.Transformer, optional) – The Transformer in which to author the pipeline.
  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.
Returns:

An instance of streamsets.sdk.sch_models.PipelineBuilder or streamsets.sdk.sch_models.StPipelineBuilder.

get_protection_method_builder()[source]

Get a protection method builder instance with which a protection method can be created.

Returns:An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder.
get_protection_policy_builder()[source]

Get a protection policy builder instance with which a protection policy can be created.

Returns:An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder.
get_report_definition_builder()[source]

Get a Report Definition Builder instance with which a Report Definition can be created.

Returns:An instance of streamsets.sdk.sch_models.ReportDefinitionBuilder.
get_scheduled_task_builder()[source]

Get a scheduled task builder instance with which a scheduled task can be created.

Returns:An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder.
get_subscription_builder()[source]

Get Event Subscription Builder.

Returns:An instance of streamsets.sdk.sch_models.SubscriptionBuilder.
get_topology_builder()[source]

Get a topology builder instance with which a topology can be created.

Returns:An instance of streamsets.sdk.sch_models.TopologyBuilder.
get_topology_jobs(topology)[source]

Get jobs for given topology.

Parameters:topology (streamsets.sdk.sch_models.Topology) – Topology object.
Returns:A list of streamsets.sdk.sch_models.Job instances.
get_user_builder()[source]

Get a user builder instance with which a user can be created.

Returns:An instance of streamsets.sdk.sch_models.UserBuilder.
groups

Groups.

Returns:An instance of streamsets.sdk.sch_models.Groups.
import_jobs(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]

Import jobs from archived zip directory.

Parameters:
  • archive (file) – file containing the jobs.
  • pipeline (boolean, optional) – Indicate if pipeline should be imported. Default: True.
  • number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.
  • labels (boolean, optional) – Indicate if labels should be imported. Default: False.
  • runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.

import_pipeline(pipeline, commit_message, name=None, data_collector_instance=None)[source]

Import pipeline from json file.

Parameters:
  • pipeline (dict) – A python dict representation of ControlHub Pipeline.
  • commit_message (str) – Commit message.
  • name (str, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. Default None.
  • data_collector_instance (streamsets.sdk.sch_models.DataCollector) – If excluded, system sdc will be used. Default None.
Returns:

An instance of streamsets.sdk.sch_models.Pipeline.

import_pipelines_from_archive(archive, commit_message, fragments=False)[source]

Import pipelines from archived zip directory.

Parameters:
  • archive (file) – file containing the pipelines.
  • commit_message (str) – Commit message.
  • fragments (bool, optional) – Indicates if pipeline contains fragments.
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Pipeline.

import_protection_policies(policies_archive)[source]

Import protection policies from a compressed archive.

Parameters:policies_archive (file) – file containing the protection policies.
Returns:A py:class:streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy.
import_topologies(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]

Import topologies from archived zip directory.

Parameters:
  • archive (file) – file containing the topologies.
  • import_number_of_instances (boolean, optional) – Indicate if number of instances should be imported. Default: False.
  • import_labels (boolean, optional) – Indicate if labels should be imported. Default: False.
  • import_runtime_parameters (boolean, optional) – Indicate if runtime parameters should be imported. Default: False.
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Topology.

jobs

Jobs.

Returns:An instance of streamsets.sdk.sch_models.Jobs.
ldap_enabled

Indication if LDAP is enabled or not.

Returns:An instance of boolean.
login_audits

Login Audits.

Returns:An instance of streamsets.sdk.sch_models.LoginAudits.
organizations

Organizations.

Returns:An instance of streamsets.sdk.sch_models.Organizations.
pipeline_labels

Pipeline labels.

Returns:An instance of streamsets.sdk.sch_models.PipelineLabels.
pipelines

Pipelines.

Returns:An instance of streamsets.sdk.sch_models.Pipelines.
preview_classification_rule(classification_rule, parameter_data, data_collector=None)[source]

Dynamic preview of a classification rule.

Parameters:
Returns:

An instance of streamsets.sdk.sdc_api.PreviewCommand.

protection_policies

Protection policies.

Returns:An instance of streamsets.sdk.sch_models.ProtectionPolicies.
provisioning_agents

Provisioning Agents registered to the Control Hub instance.

Returns:An instance of streamsets.sdk.sch_models.ProvisioningAgents.
publish_pipeline(pipeline, commit_message='New pipeline', draft=False)[source]

Publish a pipeline.

Parameters:
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
  • commit_message (str, optional) – Default: 'New pipeline'.
  • draft (boolean, optional) – Default: False.
publish_scheduled_task(task)[source]

Send the scheduled task to Control Hub.

Parameters:task (streamsets.sdk.sch_models.ScheduledTask) – Scheduled task object.
Returns:An instance of streamsets.sdk.sch_api.Command.
publish_topology(topology)[source]

Public a topology.

Parameters:topology (streamsets.sdk.sch_models.Topology) – Topology object.
Returns:An instance of streamsets.sdk.sch_models.Topology.
report_definitions

Report Definitions.

Returns:An instance of streamsets.sdk.sch_models.ReportDefinitions.
reset_origin(*jobs)[source]

Reset all pipeline offsets for given jobs.

Parameters:*jobs – One or more instances of streamsets.sdk.sch_models.Job.
Returns:A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.
run_pipeline_preview(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]

Run pipeline preview.

Parameters:
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
  • batches (int, optional) – Number of batches. Default: 1.
  • batch_size (int, optional) – Batch size. Default: 10.
  • skip_targets (bool, optional) – Skip targets. Default: True.
  • skip_lifecycle_events (bool, optional) – Skip life cycle events. Default: True.
  • timeout (int, optional) – Timeout. Default: 120000.
  • test_origin (bool, optional) – Test origin. Default: False.
  • read_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Read Policy for preview. If not provided, uses default read policy if one available. Default: None.
  • write_policy (streamsets.sdk.sch_models.ProtectionPolicy) – Write Policy for preview. If not provided, uses default write policy if one available. Default: None.
  • executor (streamsets.sdk.sch_models.DataCollector, optional) – The Data Collector in which to preview the pipeline. If omitted, Control Hub’s first executor SDC will be used. Default: None.
Returns:

An instance of streamsets.sdk.sdc_api.PreviewCommand.

scale_deployment(deployment, num_instances)[source]

Scale up/down active deployment.

Parameters:
Returns:

An instance of streamsets.sdk.sch_api.Command.

scheduled_tasks

Scheduled Tasks.

Returns:An instance of streamsets.sdk.sch_models.ScheduledTasks.
set_default_read_protection_policy(protection_policy)[source]

Set a default read protection policy.

Parameters:protection_policy

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default read policy.:

Returns:An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.
Raises:UnsupportedMethodError – Only supported on Control Hub version 3.14.0+.
set_default_write_protection_policy(protection_policy)[source]

Set a default write protection policy.

Parameters:protection_policy

:param (streamsets.sdk.sch_models.ProtectionPolicy): Protection :param Policy object to be set as the default write policy.:

Returns:An updated instance of streamsets.sdk.sch_models.ProtectionPolicy.
Raises:UnsupportedMethodError – Only supported on Control Hub version 3.14.0+.
set_user(username, password)[source]

Set the user by which subsequent actions will be run.

Parameters:
  • username (str) – SCH username.
  • password (str) – SCH password.
Returns:

An instance of streamsets.sdk.sch_api.Command.

start_all_topology_jobs(topology)[source]

Start all jobs of a topology.

Parameters:topology (streamsets.sdk.sch_models.Topology) – Topology object.
start_deployment(deployment, **kwargs)[source]

Start Deployment.

Parameters:
  • deployment (streamsets.sdk.sch_models.Deployment) – Deployment instance.
  • wait (bool, optional) – Wait for deployment to start. Default: True.
  • wait_for_statuses (list, optional) – Deployment statuses to wait on. Default: ['ACTIVE'].
Returns:

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

start_job(*jobs, wait=True, **kwargs)[source]

Start one or more jobs.

Parameters:
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.
  • wait (bool, optional) – Wait for pipelines to reach RUNNING status before returning. Default: True.
  • **kwargs – Arbitrary keyword arguments.
Returns:

An instance of streamsets.sdk.sch_api.StartJobsCommand.

start_job_template(job_template, instance_name_suffix='COUNTER', parameter_name=None, runtime_parameters=None, number_of_instances=1, wait_for_data_collectors=False)[source]

Start Job instances from a Job Template.

Parameters:
  • job_template (streamsets.sdk.sch_models.Job) – A Job instance with the property job_template set to True.
  • instance_name_suffix (str, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default: COUNTER.
  • parameter_name (str, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default: None.
  • runtime_parameters (dict) or (list) – Runtime Parameters to be used in the jobs. If a dict is specified, number_of_instances jobs will be started. If a list is specified, number_of_instances is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default: None.
  • number_of_instances (int, optional) – Number of instances to be started using given parameters. Default: 1.
  • wait_for_data_collectors (bool, optional) – Default: False.
Returns:

A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job instances.

stop_all_topology_jobs(topology, force=False)[source]

Stop all jobs of a topology.

Parameters:
stop_deployment(deployment, wait_for_statuses=['INACTIVE'])[source]

Stop Deployment.

Parameters:
Returns:

An instance of streamsets.sdk.sch_api.DeploymentStartStopCommand.

stop_job(*jobs, force=False, timeout_sec=300)[source]

Stop one or more jobs.

Parameters:
  • *jobs – One or more instances of streamsets.sdk.sch_models.Job.
  • force (bool, optional) – Force job to stop. Default: False.
  • timeout_sec (int, optional) – Timeout in secs. Default: 300.
stop_test_pipeline_run(start_pipeline_command)[source]

Stop the test run of pipeline.

Parameters:start_pipeline_command (streamsets.sdk.sdc_api.StartPipelineCommand) –
Returns:An instance of streamsets.sdk.sdc_api.StopPipelineCommand.
subscriptions

Event Subscriptions.

Returns:An instance of streamsets.sdk.sch_models.Subscriptions.
sync_job(*jobs)[source]

Sync one or more jobs.

Parameters:*jobs – One or more instances of streamsets.sdk.sch_models.Job.
test_pipeline_run(pipeline, reset_origin=False, parameters=None)[source]

Test run a pipeline.

Parameters:
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
  • reset_origin (boolean, optional) – Default: False.
  • parameters (dict, optional) – Pipeline parameters. Default: None.
Returns:

An instance of streamsets.sdk.sdc_api.StartPipelineCommand.

topologies

Topologies.

Returns:An instance of streamsets.sdk.sch_models.Topologies.
transformers

Transformers registered to the Control Hub instance.

Returns:Returns a streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Transformer instances.
update_connection(connection)[source]

Update a connection.

Parameters:connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns:An instance of streamsets.sdk.sch_api.Command.
update_data_collector_labels(data_collector)[source]

Update data collector labels.

Parameters:data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
Returns:An instance of streamsets.sdk.sch_models.DataCollector.
update_data_collector_resource_thresholds(data_collector, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]

Updates data collector resource thresholds.

Parameters:
  • data_collector (streamsets.sdk.sch_models.DataCollector) – Data Collector object.
  • max_cpu_load (float, optional) – Max CPU load in percentage. Default: None.
  • max_memory_used (int, optional) – Max memory used in MB. Default: None.
  • max_pipelines_running (int, optional) – Max pipelines running. Default: None.
Returns:

An instance of streamsets.sdk.sch_api.Command.

update_deployment(deployment)[source]

Update a deployment.

Parameters:deployment (streamsets.sdk.sch_models.Deployment) – Deployment object.
Returns:An instance of streamsets.sdk.sch_api.Command.
update_group(group)[source]

Update a group.

Parameters:group (streamsets.sdk.sch_models.Group) – Group object.
Returns:An instance of streamsets.sdk.sch_api.Command.
update_job(job)[source]

Update a job.

Parameters:job (streamsets.sdk.sch_models.Job) – Job object.
Returns:An instance of streamsets.sdk.sch_models.Job.
update_pipelines_with_different_fragment_version(pipelines, from_fragment_version, to_fragment_version)[source]

Update pipelines with latest pipeline fragment commit version.

Parameters:
  • pipelines (list) – List of streamsets.sdk.sch_models.Pipeline instances.
  • from_fragment_version (streamsets.sdk.sch_models.PipelineCommit) – commit of fragment from which the pipeline needs to be updated.
  • to_fragment_version (streamsets.sdk.sch_models.PipelineCommit) – commit of fragment to which the pipeline needs to be updated.
Returns:

An instance of streamsets.sdk.sch_api.Command

update_report_definition(report_definition)[source]

Update an existing Report Definition.

Parameters:report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
update_subscription(subscription)[source]

Update an existing Subscription.

Parameters:subscription (streamsets.sdk.sch_models.Subscription) – A Subscription instance.
Returns:An instance of streamsets.sdk.sch_api.Command.
update_user(user)[source]
Update a user. Some user attributes are updated by SCH such as
last_modified_by, last_modified_on.
Parameters:user (streamsets.sdk.sch_models.User) – User object.
Returns:An instance of streamsets.sdk.sch_api.Command.
upgrade_job(*jobs)[source]

Upgrade job(s) to latest pipeline version.

Parameters:*jobs – One or more instances of streamsets.sdk.sch_models.Job.
Returns:A streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.Job.
upload_offset(job, offset_file=None, offset_json=None)[source]

Upload offset for given job.

Parameters:
  • job (streamsets.sdk.sch_models.Job) – Job object.
  • offset_file (file, optional) – File containing the offsets. Default: None. Exactly one of offset_file, offset_json should specified.
  • offset_json (dict, optional) – Contents of offset. Default: None. Exactly one of offset_file, offset_json should specified.
Returns:

An instance of streamsets.sdk.sch_models.Job.

users

Users.

Returns:An instance of streamsets.sdk.sch_models.Users.
verify_connection(connection)[source]

Verify connection.

Parameters:connection (streamsets.sdk.sch_models.Connection) – Connection object.
Returns:An instance of streamsets.sdk.sch_models.ConnectionVerificationResult.
wait_for_job_status(job, status, timeout_sec=200)[source]

Block until a job reaches the desired status.

Parameters:
  • job (streamsets.sdk.sch_models.Job) – The job instance.
  • status (str) – The desired status to wait for.
  • timeout_sec (int, optional) – Timeout to wait for job to reach status, in seconds. Default: streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT.
Raises:

TimeoutError – If timeout_sec passes without job reaching status.

Models

These models wrap and provide useful functionality for interacting with common SCH abstractions.

ACLs

class streamsets.sdk.sch_models.ACL(acl, control_hub)[source]

Represents an ACL.

Parameters:
  • acl (dict) – JSON representation of an ACL.
  • control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.
permissions

streamsets.sdk.sch_models.Permissions – A Collection of Permissions.

add_permission(permission)[source]

Add new permission to the ACL.

Parameters:permission (streamsets.sdk.sch_models.Permission) – A permission object.
Returns:An instance of streamsets.sdk.sch_api.Command
permission_builder

Get a permission builder instance with which a pipeline can be created.

Returns:An instance of streamsets.sdk.sch_models.ACLPermissionBuilder.
remove_permission(permission)[source]

Remove a permission from ACL.

Parameters:permission (streamsets.sdk.sch_models.Permission) – A permission object.
Returns:An instance of streamsets.sdk.sch_api.Command
class streamsets.sdk.sch_models.ACLPermissionBuilder(permission, acl)[source]

Class to help build the ACL permission.

Parameters:
build(subject_id, subject_type, actions)[source]

Method to help build the ACL permission.

Parameters:
  • subject_id (str) – Id of the subject e.g. ‘test@test’.
  • subject_type (str) – Type of the subject e.g. ‘USER’.
  • actions (list) – A list of actions of type str e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].
Returns:

An instance of streamsets.sdk.sch_models.Permission.

class streamsets.sdk.sch_models.Permission(permission, resource_type, api_client)[source]

A container for a permission.

Parameters:
  • permission (dict) – A Python object representation of a permission.
  • resource_type (str) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’.
  • api_client (streamsets.sdk.sch_api.ApiClient) – An instance of ApiClient.
resource_id

str – Id of the resource e.g. Pipeline or Job.

subject_id

str – Id of the subject e.g. user id 'admin@admin'.

subject_type

str – Type of the subject e.g. 'USER'.

last_modified_by

str – User who last modified this permission e.g. 'admin@admin'.

last_modified_on

int – Timestamp at which this permission was last modified e.g. 1550785079811.

Classifiers

class streamsets.sdk.sch_models.Classifier(classifier)[source]

Classifier model.

Parameters:classifier (dict) – A Python dict representation of classifier.

Classification Rules

class streamsets.sdk.sch_models.ClassificationRule(classification_rule, classifiers)[source]

Classification Rule Model.

Parameters:
class streamsets.sdk.sch_models.ClassificationRuleBuilder(classification_rule, classifier)[source]

Class with which to build instances of streamsets.sdk.sch_models.ClassificationRule.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_classification_rule_builder().
Parameters:
  • classification_rule (dict) – Python object defining a classification rule.
  • classifier (dict) – Python object defining a classifier.
add_classifier(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]

Add classifier to the classification rule.

Parameters:
  • patterns (list, optional) – List of strings of patterns. Default: None.
  • match_with (str, optional) – Default: None.
  • regular_expression_type (str, optional) – Default: 'RE2/J'.
  • case_sensitive (bool, optional) – Default: False.
Returns:

An instance of streamsets.sdk.sch_models.Classifier.

build(name, category, score)[source]

Build the classification rule.

Parameters:
  • name (str) – Classification Rule name.
  • category (str) – Classification Rule category.
  • score (float) – Classification Rule score.
Returns:

An instance of streamsets.sdk.sch_models.ClassificationRule.

DataCollectors

class streamsets.sdk.sch_models.DataCollector(data_collector, control_hub)[source]

Model for Data Collector.

execution_mode

boolTrue for Edge and False for SDC.

id

str – Data Collectort id.

labels

list – Labels for Data Collector.

last_validated_on

str – Last validated time for Data Collector.

reported_labels

list – Reported labels for Data Collector.

url

str – Data Collector’s url.

version

str – Data Collector’s version.

accessible

Returns a bool for whether the Data Collector instance is accessible.

acl

Get DataCollector ACL.

Returns:An instance of streamsets.sdk.sch_models.ACL.
attributes

Returns a dict of Data Collector attributes.

pipelines_committed

Control Hub Job IDs that are about to be started but have no corresponding pipeline status yet.

Returns:A list of Job IDs (str objects).
responding

Returns a bool for whether the Data Collector instance is responding.

Transformers

class streamsets.sdk.sch_models.Transformer(transformer, control_hub)[source]

Model for Transformer.

execution_mode

str

id

str – Transformer id.

labels

list – Labels for Transformer.

last_validated_on

str – Last validated time for Transformer.

reported_labels

list – Reported labels for Transformer.

url

str – Transformer’s url.

version

str – Transformer’s version.

accessible

Returns a bool for whether the Transformer instance is accessible.

acl

Get Transformer ACL.

Returns:An instance of streamsets.sdk.sch_models.ACL.
attributes

Returns a dict of Transformer attributes.

Group

class streamsets.sdk.sch_models.Group(group, roles)[source]

Model for Group.

Parameters:
  • group (dict) – A Python object representation of Group.
  • roles (dict) – A mapping of role IDs to role labels.
class streamsets.sdk.sch_models.Groups(control_hub, roles, organization)[source]

Collection of streamsets.sdk.sch_models.Group instances.

Parameters:
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
  • roles (dict) – A mapping of role IDs to role labels.
  • organization (str) – Organization ID.
class streamsets.sdk.sch_models.GroupBuilder(group, roles)[source]

Class with which to build instances of streamsets.sdk.sch_models.Group.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_group_builder().
Parameters:
  • group (dict) – Python object built from our Swagger GroupJson definition.
  • roles (dict) – A mapping of role IDs to role labels.
build(id, display_name, ldap_groups=None)[source]

Build the group.

Parameters:
  • id (str) – Group ID.
  • display_name (str) – Group display name.
  • ldap_groups (list) – List of LDAP groups (strings).
Returns:

An instance of streamsets.sdk.sch_models.Group.

Jobs

class streamsets.sdk.sch_models.Job(job, control_hub=None)[source]

Model for Job.

commit_id

str – Pipeline commit id.

commit_label

str – Pipeline commit label.

created_by

str – User that created this job.

created_on

int – Time at which this job was created.

data_collector_labels

list – Labels of the data collectors.

description

str – Job description.

destroyer

str – Job destroyer.

enable_failover

bool – Flag that indicates if failover is enabled.

enable_time_series_analysis

bool – Flag that indicates if time series is enabled.

execution_mode

bool – True for Edge and False for SDC.

job_deleted

bool – Flag that indicates if this job is deleted.

job_id

str – Id of the job.

job_name

str – Name of the job.

last_modified_by

str – User that last modified this job.

last_modified_on

int – Time at which this job was last modified.

number_of_instances

int – Number of instances.

pipeline_force_stop_timeout

int – Timeout for Pipeline force stop.

pipeline_id

str – Id of the pipeline that is running the job.

pipeline_name

str – Name of the pipeline that is running the job.

pipeline_rule_id

str – Rule Id of the pipeline that is running the job.

read_policy

streamsets.sdk.sch_models.ProtectionPolicy – Read Policy of the job.

runtime_parameters

str – Run-time parameters of the job.

statistics_refresh_interval_in_millisecs

int – Refresh interval for statistics in milliseconds.

status

string – Status of the job.

write_policy

streamsets.sdk.sch_models.ProtectionPolicy – Write Policy of the job.

acl

Get job ACL.

Returns:An instance of streamsets.sdk.sch_models.ACL.
add_tag(*tags)[source]

Add a tag

Parameters:*tags – One or more instances of str
commit

Get pipeline commit of the job.

Returns:An instance of streamsets.sdk.sch_models.PipelineCommit.
committed_offsets

Get the committed offsets for a given job id.

Returns:An instance of streamsets.sdk.sch_models.JobCommittedOffset.
metrics(metric_type='RECORD_COUNT', pipeline_version=None, sdc_id=None, time_filter_condition='LATEST', include_error_count=False)[source]

Get metrics for the job.

Parameters:
  • metric_type (str, optional) – metric_type in {‘RECORD_COUNT’, ‘RECORD_THROUGHPUT’}. Default: 'RECORD_COUNT'.
  • pipeline_version (str, optional) – Default: None. If default, version is picked as the version of the current commit of the pipeline.
  • sdc_id (str, optional) – Default: None. If default, SDC ID is picked as the Id of the SDC with the current job run.
  • time_filter_condition (str, optional) – Default: LATEST.
  • include_error_count (boolean, optional) – Default: False.
Returns:

An instance of streamsets.sdk.sch_models.JobMetrics.

pipeline

Get the pipeline object corresponding to this job.

remove_tag(*tags)[source]

Remove a tag

Parameters:*tags – One or more instances of str
system_job

Get the sytem Job for this job if exists.

Returns:An instance of streamsets.sdk.sch_models.Job.
tag

Get pipeline tag of the job.

Returns:An instance of streamsets.sdk.sch_models.PipelineTag.
tags

Get the job tags.

Returns:A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.Tag.
time_series_metrics(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]

Get historic time series metrics for the job.

Parameters:
  • metric_type (str) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}.
  • time_filter_condition (str, optional) – Default: 'LAST_5M'.
Returns:

An instance of streamsets.sdk.sch_models.JobTimeSeriesMetrics.

class streamsets.sdk.sch_models.Jobs(control_hub)[source]

Collection of streamsets.sdk.sch_models.Job instances.

Parameters:control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.
count(status)[source]

Get job counts by status.

Parameters:status (str) – Status of the jobs in {‘ACTIVE’, ‘INACTIVE’, ‘ACTIVATING’, ‘DEACTIVATING’, ‘INACTIVE_ERROR’, ‘ACTIVE_GREEN’, ‘ACTIVE_RED’, ‘’}
Returns:An instance of int indicating the count of jobs with specified status.
class streamsets.sdk.sch_models.JobBuilder(job, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.Job.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_job_builder().
Parameters:job (dict) – Python object built from our Swagger JobJson definition.
build(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]

Build the job.

Parameters:
  • job_name (str) – Name of the job.
  • pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline object.
  • job_template (boolean, optional) – Indicate if it is a Job Template. Default: False.
  • runtime_parameters (dict, optional) – Runtime Parameters for the Job or Job Template. Default: None.
  • pipeline_commit (streamsets.sdk.sch_models.PipelineCommit) – Default: ``None`, which resolves to the latest pipeline commit.
  • pipeline_tag (streamsets.sdk.sch_models.PipelineTag) – Default: ``None`, which resolves to the latest pipeline tag.
  • pipeline_commit_or_tag (str, optional) – Default: None, which resolves to the latest pipeline commit.
  • tags (list, optional) – Job tags. Default: None.
Returns:

An instance of streamsets.sdk.sch_models.Job.

class streamsets.sdk.sch_models.JobMetrics(metrics)[source]

Model for job metrics.

output_count

dict – Stage ID to output count map.

error_count

dict – Stage ID to error count map.

Parameters:metrics (dict) – Metrics counts in JSON format.
class streamsets.sdk.sch_models.JobOffset(offset)[source]

Model for offset.

Parameters:offset (dict) – Offset in JSON format.
class streamsets.sdk.sch_models.JobRunEvent(event)[source]

Model for an event in a Job Run.

Parameters:event (dict) – Job Run Event in JSON format.
class streamsets.sdk.sch_models.JobStatus(status, control_hub, **kwargs)[source]

Model for Job Status.

run_history

streamsets.sdk.utils.SeekableList – (streamsets.sdk.utils.JobRunHistoryEvent): History of a particular job run.

offsets

streamsets.sdk.utils.SeekableList – (streamsets.sdk.utils.JobPipelineOffset): Offsets after the job run.

Parameters:
  • status (dict) – Job status in JSON format.
  • control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.
class streamsets.sdk.sch_models.JobTimeSeriesMetric(metric, metric_type)[source]

Model for job metrics.

name

str – Name of measurement.

values

list – Timeseries data.

time_series

dict – Timeseries data with timestamp as key and metric value as value.

Parameters:metric (dict) – Metrics in JSON format.
class streamsets.sdk.sch_models.JobTimeSeriesMetrics(metrics, metric_type)[source]

Model for job metrics.

input_records

streamsets.sdk.sch_models.JobTimeSeriesMetric – Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

output_records

streamsets.sdk.sch_models.JobTimeSeriesMetric – Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

error_records

streamsets.sdk.sch_models.JobTimeSeriesMetric – Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.

batch_counter

streamsets.sdk.sch_models.JobTimeSeriesMetric – Appears when queried for ‘Batch Throughput Time Series’.

batch_processing_timer

streamsets.sdk.sch_models.JobTimeSeriesMetric – Appears when queried for ‘Batch Processing Timer seconds’.

Parameters:metrics (dict) – Metrics in JSON format.
class streamsets.sdk.sch_models.RuntimeParameters(runtime_parameters, job)[source]

Wrapper for Control Hub job runtime parameters.

Parameters:

Organizations

class streamsets.sdk.sch_models.Organization(organization, organization_admin_user=None, api_client=None)[source]

Model for Organization.

Parameters:
  • organization (str) – Organization Id.
  • organization_admin_user (str, optional) – Default: None.
  • api_client (streamsets.sdk.sch_api.ApiClient, optional) – Default: None.
class streamsets.sdk.sch_models.Organizations(control_hub)[source]

Collection of streamsets.sdk.sch_models.Organization instances.

class streamsets.sdk.sch_models.OrganizationBuilder(organization, organization_admin_user)[source]

Class with which to build instances of streamsets.sdk.sch_models.Organization.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_organization_builder().
Parameters:organization (dict) – Python object built from our Swagger UserJson definition.
build(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]

Build the organization.

Parameters:
  • id (str) – Organization ID.
  • name (str) – Organization name.
  • admin_user_id (str) – User Id of the admin of this organization.
  • admin_user_display_name (str) – User display name of admin of this organization.
  • admin_user_email_address (str) – User email address of admin of this organization.
  • admin_user_ldap_user_name (str, optional) – LDAP username. Default: None.
Returns:

An instance of streamsets.sdk.sch_models.Organization.

Pipelines

class streamsets.sdk.sch_models.Pipeline(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]

Model for Pipeline.

Parameters:
  • pipeline (dict) – Pipeline in JSON format.
  • builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.
  • pipeline_definition (dict) – Pipeline Definition in JSON format.
  • rules_definition (dict) – Rules Definition in JSON format.
  • control_hub (streamsets.sdk.sch.ControlHub, optional) – ControlHub object. Default: None.
  • library_definitions (dict, optional) – Library Definition in JSON format. Default: None.
Labels

Get the pipeline labels. This attribute will be deprecated in a future release. Please use labels instead.

Returns:A list of dict
acl

Get pipeline ACL.

Returns:An instance of streamsets.sdk.sch_models.ACL.
add_label(*labels)[source]

Add a label

Parameters:*labels – One or more instances of str
commits

Get commits for this pipeline.

Returns:A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineCommit.
configuration

Get pipeline’s configuration.

Returns:An instance of streamsets.sdk.sch_models.Configuration.
labels

Get the pipeline labels.

Returns:A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineLabel.
parameters

Get the pipeline parameters.

Returns:A dict like, streamsets.sdk.sch_models.PipelineParameters object of parameter key-value pairs.
remove_label(*labels)[source]

Remove a label

Parameters:*labels – One or more instances of str
tags

Get tags for this pipeline.

Returns:A streamsets.sdk.utils.SeekableList of instances of streamsets.sdk.sch_models.PipelineTag.
class streamsets.sdk.sch_models.Pipelines(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Pipeline instances.

Parameters:
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
  • organization (str) – Organization Id.
class streamsets.sdk.sch_models.PipelineBuilder(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder().
Parameters:
  • pipeline (dict) – Python object built from our Swagger PipelineJson definition.
  • data_collector_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Data Collector Pipeline Builder object.
  • control_hub (streamsets.sdk.sch.ControlHub) – Default: None.
  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.
add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – Stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – Stage name to use when selecting stage from definitions. Default: None.
  • type (str, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
  • library (str, optional) – Stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sch_models.SchSdcStage.

build(title='Pipeline', labels=None, **kwargs)[source]

Build the pipeline.

Parameters:
  • title (str) – title of the pipeline.
  • labels (list, optional) – List of pipeline labels of type str. Default: None.
Returns:

py:class`streamsets.sdk.sch_models.Pipeline`.

Return type:

An instance of

class streamsets.sdk.sch_models.PipelineParameters(pipeline)[source]

Parameters for pipelines.

Parameters:pipeline (streamsets.sdk.sch_models.Pipeline) – Pipeline Instance.
update(parameters_dict)[source]

Update existing parameters. Works similar to Python dictionary update.

Parameters:parameters_dict (dict) – Dictionary of key-value pairs to be used as parameters.
class streamsets.sdk.sch_models.StPipelineBuilder(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]

Class with which to build instances of streamsets.sdk.sch_models.Pipeline.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder().
Parameters:
  • pipeline (dict) – Python object built from our Swagger PipelineJson definition.
  • transformer_pipeline_builder (streamsets.sdk.sdc_models.PipelineBuilder) – Transformer Pipeline Builder object.
  • control_hub (streamsets.sdk.sch.ControlHub) – Default: None.
  • fragment (boolean, optional) – Specify if a fragment builder. Default: False.
add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – Transformer stage label to use when selecting stage from definitions. Default: None.
  • name (str, optional) – Transformer stage name to use when selecting stage from definitions. Default: None.
  • type (str, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None.
  • library (str, optional) – Transformer stage library to use when selecting stage from definitions. Default: None.
Returns:

An instance of streamsets.sdk.sch_models.SchStStage.

build(title='Pipeline', **kwargs)[source]

Build the pipeline.

Parameters:title (str) – title of the pipeline.
Returns:py:class`streamsets.sdk.sch_models.Pipeline`.
Return type:An instance of

Protection Methods

class streamsets.sdk.sch_models.ProtectionMethod(stage)[source]

Protection Method Model.

Parameters:stage (dict) – JSON representation of a stage.
class streamsets.sdk.sch_models.ProtectionMethodBuilder(pipeline_builder)[source]

Class with which to build instances of streamsets.sdk.sch_models.ProtectionMethod.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_method_builder().
Parameters:pipeline_builder (streamsets.sdk.sch_models.PipelineBuilder) – Pipeline Builder object.

Protection Policies

class streamsets.sdk.sch_models.ProtectionPolicy(protection_policy, procedures=None)[source]

Model for Protection Policy.

Parameters:
class streamsets.sdk.sch_models.ProtectionPolicies(control_hub)[source]

Collection of streamsets.sdk.sch_models.ProtectionPolicy instances.

class streamsets.sdk.sch_models.ProtectionPolicyBuilder(control_hub, protection_policy, policy_procedure)[source]

Class with which to build instances of streamsets.sdk.sch_models.ProtectionPolicy.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_policy_builder().
Parameters:
  • protection_policy (dict) – Python object defining a protection policy.
  • policy_procedure (dict) – Python object defining a policy procedure.
class streamsets.sdk.sch_models.PolicyProcedure(policy_procedure)[source]

Model for Policy Procedure.

Parameters:policy_procedure (dict) – JSON representation of Policy Procedure.

ProvisioningAgents

class streamsets.sdk.sch_models.Deployment(deployment, control_hub=None)[source]

Model for Deployment.

Parameters:
  • deployment (dict) – A Python object representation of Deployment.
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
class streamsets.sdk.sch_models.Deployments(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.Deployment instances.

Parameters:
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
  • organization (str) – Organization Id.
class streamsets.sdk.sch_models.DeploymentBuilder(deployment)[source]

Class with which to build instances of streamsets.sdk.sch_models.Deployment.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder().
Parameters:deployment (dict) – Python object that represents Deployment JSON.
build(name, provisioning_agent, number_of_data_collector_instances, spec=None, description=None, data_collector_labels=None)[source]

Build the deployment.

Parameters:
  • name (str) – Deployment Name.
  • provisioning_agent (streamsets.sdk.sch_models.ProvisioningAgent) – Agent to use.
  • (obj (number_of_data_collector_instances) – int): Number of sdc instances.
  • spec (dict, optional) – Deployment yaml in dictionary format. Will use default yaml used by ui if left out.
  • description (str, optional) – Default: None.
  • data_collector_labels (list, optional) – Default: ['all'].
Returns:

An instance of streamsets.sdk.sch_models.Deployment.

class streamsets.sdk.sch_models.ProvisioningAgent(provisioning_agent, control_hub)[source]

Model for Provisioning Agent.

Parameters:
  • provisioning_agent (dict) – A Python object representation of Provisioning Agent.
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
class streamsets.sdk.sch_models.ProvisioningAgents(control_hub, organization)[source]

Collection of streamsets.sdk.sch_models.ProvisioningAgent instances.

Parameters:
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
  • organization (str) – Organization Id.

Reports

class streamsets.sdk.sch_models.GenerateReportCommand(control_hub, report_defintion, response)[source]

Command to interact with the response from generate_report.

Parameters:
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.
  • report_definition (dict) – JSON representation of Report Definition.
  • response (dict) – Api response from generating the report.
class streamsets.sdk.sch_models.Report(report, control_hub, report_definition_id)[source]

Model for Report.

Parameters:report (dict) – JSON representation of Report.
download()[source]

Download the Report in PDF format

Returns:An instance of bytes.
class streamsets.sdk.sch_models.Reports(control_hub, report_definition_id)[source]

Collection of streamsets.sdk.sch_models.Report instances.

Parameters:
  • control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.
  • report_definition_id (str) – Report Definition Id.
class streamsets.sdk.sch_models.ReportDefinition(report_definition, control_hub)[source]

Model for Report Definition.

Parameters:
  • report_definition (dict) – JSON representation of Report Definition.
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.
acl

Get Report Definition ACL.

Returns:An instance of streamsets.sdk.sch_models.ACL.
generate_report()[source]

Generate a Report for Report Definition.

Returns:An instance of streamsets.sdk.sch_models.Report.
report_resources

Get Report Resources of the Report Definition.

Returns:An instance of streamsets.sdk.sch_models.ReportResources.
reports

Get Reports of the Report Definition.

Returns:An instance of streamsets.sdk.sch_models.Reports.
class streamsets.sdk.sch_models.ReportDefinitions(control_hub)[source]

Collection of streamsets.sdk.sch_models.ReportDefinition instances.

class streamsets.sdk.sch_models.ReportDefinitionBuilder(report_definition, control_hub)[source]

Class with which to build instances of streamsets.sdk.sch_models.ReportDefinition.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_report_definition_builder().
Parameters:
  • report_definition (dict) – JSON representation of Report Definition.
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.
add_report_resource(resource)[source]

Add a given resource to Report Definition resources.

Parameters:resource (streamsets.sdk.sch_models.Job) or (streamsets.sdk.sch_models.Topology) –
build(name, description=None)[source]

Build the report definition.

Parameters:
  • name (str) – Name of the Report Definition.
  • description (str, optional) – Description of the Report Definition. Default: None.
Returns:

An instance of streamsets.sdk.sch_models.ReportDefinition.

import_report_definition(report_definition)[source]

Import an existing Report Definition to update it.

Parameters:report_definition (streamsets.sdk.sch_models.ReportDefinition) – Report Definition object.
remove_report_resource(resource)[source]

Remove a resource from Report Definition Resources.

Parameters:resource (streamsets.sdk.sch_models.Job) or (streamsets.sdk.sch_models.Topology) –
Returns:A resource of type dict that is removed from Report Definition Resources.
set_data_retrieval_period(start_time, end_time)[source]

Set Time range over which the report will be generated.

Parameters:
  • start_time (str) or (int) – Absolute or relative start time for the Report.
  • end_time (str) or (int) – Absolute or relative end time for the Report.
class streamsets.sdk.sch_models.ReportResource(report_resource)[source]

Model for Report Resource.

Parameters:report_resource (dict) – JSON representation of Report Resource.
class streamsets.sdk.sch_models.ReportResources(report_resources, report_definition)[source]

Model for the collection of Report Resources.

Parameters:

Scheduler

class streamsets.sdk.sch_models.ScheduledTask(task, control_hub=None)[source]

Model for Scheduled Task.

Parameters:
audits

Get Scheduled Task Audits.

Returns:A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskAudit.
runs

Get Scheduled Task Runs.

Returns:A streamsets.sdk.utils.SeekableList of inherited instances of streamsets.sdk.sch_models.ScheduledTaskRun.
class streamsets.sdk.sch_models.ScheduledTaskAudit(audit)[source]

Scheduled Task Audit.

Parameters:run (dict) – JSON representation of scheduled task audit.
class streamsets.sdk.sch_models.ScheduledTaskBaseModel(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]

Base Model for Scheduled Task related classes.

class streamsets.sdk.sch_models.ScheduledTaskBuilder(job_selection_types, control_hub)[source]

Builder for Scheduled Task.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_scheduled_task_builder().
Parameters:
  • job_selection_types (dict) – JSON representation of job selection types.
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.
build(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]

Builder for Scheduled Task.

Parameters:
  • task_object (streamsets.sdk.sch_models.Job) – (streamsets.sdk.sch_models.ReportDefinition): Job or ReportDefinition object.
  • action (str, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default: START.
  • name (str, optional) – Name of the task. Default: None.
  • description (str, optional) – Description of the task. Default: None.
  • crontab_mask (str, optional) – Schedule in cron syntax. Default: "0 0 1/1 * ? *". (Daily at 12).
  • time_zone (str, optional) – Time zone. Default: "UTC".
  • status (str, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default: RUNNING.
  • start_time (str, optional) – Start time of task. Default: None.
  • end_time (str, optional) – End time of task. Default: None.
  • missed_trigger_handling (str, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default: IGNORE.
Returns:

An instance of streamsets.sdk.sch_models.ScheduledTask.

class streamsets.sdk.sch_models.ScheduledTaskRun(run)[source]

Scheduled Task Run.

Parameters:run (dict) – JSON representation if scheduled task run.
class streamsets.sdk.sch_models.ScheduledTasks(control_hub)[source]

Collection of streamsets.sdk.sch_models.ScheduledTask instances.

Subscriptions

class streamsets.sdk.sch_models.Subscription(subscription, control_hub)[source]

Subscription.

Parameters:
  • subscription (dict) – JSON representation of Subscription.
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.
action

Action of the Subscription.

events

Events of the Subscription.

class streamsets.sdk.sch_models.Subscriptions(control_hub)[source]

Collection of streamsets.sdk.sch_models.Subscription instances.

Parameters:control_hub (streamsets.sdk.sch.ControlHub) – Control Hub object.
class streamsets.sdk.sch_models.SubscriptionAction(action)[source]

Action to take when the Subscription is triggered.

Parameters:action (dict) – JSON representation of an Action for a Subscription.
class streamsets.sdk.sch_models.SubscriptionBuilder(subscription, control_hub)[source]

Builder for Subscription.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_subscription_builder().
Parameters:
  • subscription (dict) – JSON representation of event subscription.
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.
add_event(event_type, filter=None)[source]

Add event to the Subscription.

Parameters:
  • event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
  • filter (str, optional) – Filter to be applied on event. Default: None.
build(name, description=None)[source]

Builder for Scheduled Task.

Parameters:
  • name (str) – Name of Subscription.
  • description (str, optional) – Description of subscription. Default: None.
Returns:

An instance of streamsets.sdk.sch_models.Subscription.

import_subscription(subscription)[source]

Import an existing Subscription into the builder to update it.

Parameters:subscription (streamsets.sdk.sch_models.Subscription) – Subscription instance.
remove_event(event_type)[source]

Remove event from the subscription.

Parameters:event_type (str) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.
Returns:An instance of streamsets.sdk.sch_models.SubscriptionEvent.
set_email_action(recipients, subject=None, body=None)[source]

Set the Email action.

Parameters:
  • recipients (list) – List of email addresses.
  • subject (str, optional) – Subject of the email. Default: None.
  • body (str, optional) – Body of the email. Default: None.
set_webhook_action(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]

Set the Webhook action.

Parameters:
  • uri (str) – URI for the Webhook.
  • method (str, optional) – HTTP method to use. Default: 'GET'.
  • content_type (str, optional) – Content Type of the request. Default: None.
  • payload (str, optional) – Payload of the request. Default: None.
  • auth_type (str, optional) – 'basic' or None. Default: None.
  • username (str, optional) – username for the authentication. Default: None.
  • password (str, optional) – password for the authentication. Default: None.
  • timeout (int, optional) – timeout for the Webhook action. Default: 30000.
  • headers (dict, optional) – Headers to be sent to the Webhook call. Default: None.
class streamsets.sdk.sch_models.SubscriptionEvent(event, control_hub)[source]

An Event of a Subscription.

Parameters:
  • event (dict) – JSON representation of Events of a Subscription.
  • control_hub (streamsets.sdk.ControlHub) – Control Hub instance.

Topologies

class streamsets.sdk.sch_models.Topology(topology, control_hub=None)[source]

Model for Topology.

Parameters:topology (dict) – JSON representation of Topology.
commit_id

str – Pipeline commit id.

commit_message

str – Commit Message.

commit_time

int – Time at which commit was made.

committed_by

str – User that made the commit.

default_topology

bool – Default Topology.

description

str – Topology description.

draft

bool – Indicates whether this topology is a draft.

last_modified_by

str – User that last modified this topology.

last_modified_on

int – Time at which this topology was last modified.

organization

str – Id of the organization.

parent_version

str – Version of the parent topology.

topology_definition

str – Definition of the topology.

topology_id

str – Id of the topology.

topology_name

str – Name of the topology.

version

str – Version of this topology.

class streamsets.sdk.sch_models.Topologies(control_hub)[source]

Collection of streamsets.sdk.sch_models.Topology instances.

Parameters:control_hub (streamsets.sdk.sch.ControlHub) – An instance of the Control Hub.
class streamsets.sdk.sch_models.TopologyBuilder(topology)[source]

Class with which to build instances of streamsets.sdk.sch_models.Topology.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_topology_builder().
Parameters:topology (dict) – Python object built from our Swagger TopologyJson definition.
build(topology_name, description=None)[source]

Build the topology.

Parameters:
  • topology_name (str) – Name of the topology.
  • description (str) – Description of the topology.
Returns:

An instance of streamsets.sdk.sch_models.Topology.

Users

class streamsets.sdk.sch_models.User(user, roles)[source]

Model for User.

Parameters:
  • user (dict) – JSON representation of User.
  • roles (dict) – A mapping of role IDs to role labels.
active

bool – Whether the user is active or not.

created_by

str – Creator of this user.

created_on

str – Creation time of this user.

display_name

str – Display name of this user.

email_address

str – Email address of this user.

id

str – Id of this user.

groups

list – Groups this user belongs to.

last_modified_by

str – User last modified by.

last_modified_on

str – User last modification time.

password_expires_on

str – User’s password expiration time.

password_system_generated

bool – Whether User’s password is system generated or not.

roles

set – A set of role labels.

saml_user_name

str – SAML username of user.

class streamsets.sdk.sch_models.Users(control_hub, roles, organization)[source]

Collection of streamsets.sdk.sch_models.User instances.

Parameters:
  • control_hub – An instance of streamsets.sdk.sch.ControlHub.
  • roles (dict) – A mapping of role IDs to role labels.
  • organization (str) – Organization ID.
class streamsets.sdk.sch_models.UserBuilder(user, roles, control_hub=None)[source]

Class with which to build instances of streamsets.sdk.sch_models.User.

Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_user_builder().
Parameters:
  • user (dict) – Python object built from our Swagger UserJson definition.
  • roles (dict) – A mapping of role IDs to role labels.
  • control_hub – An instance of streamsets.sdk.sch.ControlHub. Default: None.
build(id, display_name, email_address, saml_user_name=None, ldap_user_name=None)[source]

Build the user.

Parameters:
  • id (str) – User Id.
  • display_name (str) – User display name.
  • email_address (str) – User Email Address.
  • saml_user_name (str, optional) – Default: None.
  • ldap_user_name (str, optional) – Default: None.
Returns:

An instance of streamsets.sdk.sch_models.User.

Common

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, property_key='name', property_value='value', **kwargs)[source]

Abstraction for stage configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form

[ {
  "name" : "executionMode",
  "value" : "STANDALONE"
}, {
  "name" : "deliveryGuarantee",
  "value" : "AT_LEAST_ONCE"
}, ... ]

By implementing simple __getitem__ and __setitem__ methods, this class allows items in this list to be accessed using

configuration['executionMode'] = 'CLUSTER_BATCH'

Instead of the more verbose

for property in configuration:
    if property['name'] == 'executionMode':
        property['value'] = 'CLUSTER_BATCH'
    break
Parameters:
  • configuration (str) – List of dictionaries comprising the configuration.
  • property_key (str, optional) – The dictionary entry denoting the property key. Default: name
  • property_value (str, optional) – The dictionary entry denoting the property value. Default: value
get(key, default=None)[source]

Return the value of key or, if not in the configuration, the default value.

items()[source]

Gets the configuration’s items.

Returns:A new view of the configuration’s items ((key, value) pairs).
update(configs)[source]

Update instance with a collection of configurations.

Parameters:configs (dict) – Dictionary of configurations to use.

Exceptions

Common exceptions.

exception streamsets.sdk.exceptions.ActivationError(reason=None)[source]

Activation error.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]

Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.DeploymentInactiveError(message)[source]

Deployment status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.InternalServerError(response)[source]

Internal server error.

exception streamsets.sdk.exceptions.InvalidCredentialsError(message)[source]

Invalid credentials error.

exception streamsets.sdk.exceptions.JobInactiveError(message)[source]

Job status changed into INACTIVE_ERROR.

exception streamsets.sdk.exceptions.JobRunnerError(code, message)[source]

JobRunner errors.

exception streamsets.sdk.exceptions.TopologyIssuesError(issues)[source]

Topology has some issues.

exception streamsets.sdk.exceptions.UnsupportedMethodError(message)[source]

An unsupported method was called.

exception streamsets.sdk.exceptions.ValidationError(issues)[source]

Validation issues.