API Reference

The StreamSets SDK for Python is broadly divided into abstractions for interacting with StreamSets Data Collector and StreamSets Control Hub.

StreamSets Data Collector

Main interface

This is the main entry point used by users when interacting with SDC instances.

class streamsets.sdk.DataCollector(server_url, username=None, password=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]

Class to interact with StreamSets Data Collector.

If connecting to a StreamSets Control Hub-registered instance of Data Collector, create an instance
of streamsets.sdk.ControlHub and pass it as the control_hub argument instead of instantiating with a username and password.
Parameters:
  • server_url (str) – URL of an existing SDC deployment with which to interact.
  • username (str, optional) – SDC username. Default: streamsets.sdk.sdc.DEFAULT_SDC_USERNAME
  • password (str, optional) – SDC password. Default: streamsets.sdk.sdc.DEFAULT_SDC_PASSWORD
  • control_hub (streamsets.sdk.ControlHub, optional) – A StreamSets Control Hub instance to use for SCH-registered Data Collectors. Default: None
  • dump_log_on_error (bool) – Whether to output Data Collector logs when exceptions are raised by certain methods. Default: False
add_pipeline(*pipelines)[source]

Add one or more pipelines to the DataCollector instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline
capture_snapshot(pipeline, snapshot_name=None, start_pipeline=False, runtime_parameters=None, batches=1, batch_size=10, **kwargs)[source]

Capture a snapshot for given pipeline.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • snapshot_name (str, optional) – Name for the generated snapshot. If set to None, an auto-generated UUID (which can be recovered from the returned SnapshotCommand object’s snapshot_name attribute) will be used when calling the REST API. Default: None
  • start_pipeline (bool, optional) – If set to True, the pipeline will be started and its first batch will be captured. Otherwise, the pipeline must already be running, in which case one of the next batches will be captured. Default: False
  • runtime_parameters (dict, optional) – Runtime parameters to override Pipeline Parameters value. Default: None
  • batches (int, optional) – Number of batches to capture. Default: 1
  • batch_size (int, optional) – Batch size. Default: 10
  • wait (bool, optional) – Wait for capture snapshot to finish. Default: True
  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED']
Returns:

An instance of streamsets.sdk.sdc_api.SnapshotCommand
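
The snapshot_name behaviour described above can be sketched as follows. This is a minimal illustration, not part of the SDK: the helper name capture_named_snapshot is hypothetical, and sdc is assumed to be a connected streamsets.sdk.DataCollector instance.

```python
def capture_named_snapshot(sdc, pipeline, snapshot_name=None):
    """Capture the first batch of `pipeline` and return the snapshot's name.

    `sdc` is assumed to be a streamsets.sdk.DataCollector and `pipeline` a
    streamsets.sdk.sdc_models.Pipeline; this helper itself is hypothetical.
    """
    # start_pipeline=True starts the pipeline and captures its first batch.
    command = sdc.capture_snapshot(pipeline,
                                   snapshot_name=snapshot_name,
                                   start_pipeline=True)
    # If snapshot_name was None, the auto-generated UUID is exposed here.
    return command.snapshot_name
```

Because start_pipeline=True leaves the pipeline running, the caller would typically follow this with stop_pipeline.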

current_user

Get currently logged-in user and its groups and roles.

Returns:An instance of streamsets.sdk.sdc_models.User
definitions

Get an SDC instance’s definitions.

Will return a cached instance of the definitions if called more than once.

get_alerts()[source]

Get pipeline alerts.

Returns:An instance of streamsets.sdk.sdc_models.Alerts
get_bundle(generators=None)[source]

Generate new support bundle.

Returns:An instance of zipfile.ZipFile
get_bundle_generators()[source]

Get available support bundle generators.

Returns:An instance of streamsets.sdk.sdc_models.BundleGenerators
get_logs(ending_offset=-1, extra_message=None, pipeline=None, severity=None)[source]

Get logs.

Parameters:
  • ending_offset (int) – ending_offset.
  • extra_message (str) – extra_message.
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • severity (str) – severity.
Returns:

An instance of streamsets.sdk.sdc_models.Log

get_pipeline_acl(pipeline)[source]

Get pipeline ACL.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.PipelineAcl
get_pipeline_builder()[source]

Get a pipeline builder instance with which a pipeline can be created.

Returns:An instance of streamsets.sdk.sdc_models.PipelineBuilder
get_pipeline_history(pipeline)[source]

Get a pipeline’s history.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.History
get_pipeline_permissions(pipeline)[source]

Return pipeline permissions for a given pipeline.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
Returns:An instance of streamsets.sdk.sdc_models.PipelinePermissions
get_pipeline_status(pipeline)[source]

Get status of a pipeline.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
get_snapshots(pipeline=None)[source]

Get information about stored snapshots.

Parameters:pipeline (streamsets.sdk.sdc_models.Pipeline, optional) – The pipeline instance. Default: None
Returns:A list of streamsets.sdk.sdc_models.SnapshotInfo instances
remove_pipeline(*pipelines)[source]

Remove one or more pipelines from the DataCollector instance.

Parameters:*pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline
run_pipeline_preview(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=2000, stage_outputs_to_override_json=None, **kwargs)[source]

Run pipeline preview.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • rev (int, optional) – Pipeline revision. Default: 0
  • batches (int, optional) – Number of batches. Default: 1
  • batch_size (int, optional) – Batch size. Default: 10
  • skip_targets (bool, optional) – Skip targets. Default: True
  • end_stage (str, optional) – End stage. Default: None
  • timeout (int, optional) – Timeout. Default: 2000
  • stage_outputs_to_override_json (str, optional) – Stage outputs to override. Default: None
  • wait (bool, optional) – Wait for pipeline preview to finish. Default: True
Returns:

An instance of streamsets.sdk.sdc_api.PreviewCommand

set_pipeline_acl(pipeline, pipeline_acl)[source]

Update pipeline ACL.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • pipeline_acl (streamsets.sdk.sdc_models.PipelineAcl) – The pipeline ACL to set.
Returns:

An instance of streamsets.sdk.sdc_api.Command

set_user(username, password=None)[source]

Set the user with which to interact with SDC.

Parameters:
  • username (str) – Username of user.
  • password (str, optional) – Password for user. Default: same as username
start_pipeline(pipeline, runtime_parameters=None, **kwargs)[source]

Start a pipeline.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • runtime_parameters (dict, optional) – Collection of runtime parameters. Default: None
  • wait (bool, optional) – Wait for pipeline to start. Default: True
  • wait_for_statuses (list, optional) – Pipeline statuses to wait on. Default: ['RUNNING', 'FINISHED']
Returns:

An instance of streamsets.sdk.sdc_api.PipelineCommand

stop_pipeline(pipeline, **kwargs)[source]

Stop a pipeline.

Parameters:
  • pipeline (streamsets.sdk.sdc_models.Pipeline) – The pipeline instance.
  • force (bool, optional) – Force pipeline to stop. Default: False
  • wait (bool, optional) – Wait for pipeline to stop. Default: True
Returns:

An instance of streamsets.sdk.sdc_api.StopPipelineCommand

sdc.DEFAULT_SDC_USERNAME = 'admin'
sdc.DEFAULT_SDC_PASSWORD = 'admin'
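
The add, start, stop, and remove methods above are usually called in that order. The following is a minimal sketch of that lifecycle, assuming sdc is a connected DataCollector instance; the helper run_pipeline and its while_running callback are hypothetical names introduced for illustration.

```python
def run_pipeline(sdc, pipeline, while_running, runtime_parameters=None):
    """Add, start, use, stop, and remove a pipeline, cleaning up on errors.

    `while_running` is a hypothetical callback invoked as
    while_running(sdc, pipeline) once start_pipeline has returned.
    """
    sdc.add_pipeline(pipeline)
    try:
        # wait defaults to True, so this returns once the pipeline reaches
        # one of the default wait_for_statuses (RUNNING or FINISHED).
        sdc.start_pipeline(pipeline, runtime_parameters=runtime_parameters)
        try:
            return while_running(sdc, pipeline)
        finally:
            sdc.stop_pipeline(pipeline)
    finally:
        # Runs even if start_pipeline raised, so the pipeline is not left behind.
        sdc.remove_pipeline(pipeline)
```

Nesting the two try blocks keeps stop_pipeline from being called on a pipeline that never started, while still removing it from the instance.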

Models

These models wrap and provide useful functionality for interacting with common SDC abstractions.

Alerts

class streamsets.sdk.sdc_models.Alert(alert)[source]

Pipeline alert.

Parameters:alert – Python object representation of a pipeline alert.
alert_texts

Get alert’s alert texts.

Returns:The alert’s alert texts as a str
label

Get alert’s label.

Returns:The alert’s label as a str
pipeline_id

Get alert’s pipeline ID.

Returns:The pipeline ID as a str
class streamsets.sdk.sdc_models.Alerts(alerts)[source]

Container for list of alerts with filtering capabilities.

Parameters:alerts – Python object representation of alerts.
alerts

list – A list of streamsets.sdk.sdc_models.Alert instances

for_pipeline(pipeline)[source]

Get alerts for the specified pipeline.

Parameters:pipeline (str) – The pipeline for which to get alerts
Returns:An instance of streamsets.sdk.sdc_models.Alerts
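
As a rough sketch of how these two classes fit together, the helper below collects the labels of all alerts raised for one pipeline. The function name is hypothetical, and sdc is assumed to be a connected DataCollector instance.

```python
def alert_labels_for(sdc, pipeline_id):
    """Return the labels of all alerts belonging to the given pipeline ID.

    Hypothetical helper; `sdc` is assumed to be a streamsets.sdk.DataCollector.
    """
    # for_pipeline returns another Alerts container, filtered to one pipeline.
    filtered = sdc.get_alerts().for_pipeline(pipeline_id)
    return [alert.label for alert in filtered.alerts]
```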

Data Rules

class streamsets.sdk.sdc_models.DataDriftRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]

Pipeline data drift rule.

Parameters:
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.
  • label (str) – Rule label.
  • condition (str, optional) – Data rule condition. Default: None
  • sampling_percentage (int, optional) – Default: 5
  • sampling_records_to_retain (int, optional) – Default: 10
  • enable_meter (bool, optional) – Default: True
  • enable_alert (bool, optional) – Default: True
  • alert_text (str, optional) – Default: '${alert:info()}'
  • send_email (bool, optional) – Default: False
  • active (bool, optional) – Enable the data rule. Default: False
active

The rule is active.

Returns:A bool
class streamsets.sdk.sdc_models.DataRule(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]

Pipeline data rule.

Parameters:
  • stream (str) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here.
  • label (str) – Rule label.
  • condition (str, optional) – Data rule condition. Default: None
  • sampling_percentage (int, optional) – Default: 5
  • sampling_records_to_retain (int, optional) – Default: 10
  • enable_meter (bool, optional) – Default: True
  • enable_alert (bool, optional) – Default: True
  • alert_text (str, optional) – Default: None
  • threshold_type (str, optional) – One of count or percentage. Default: 'count'
  • threshold_value (int, optional) – Default: 100
  • min_volume (int, optional) – Only set if threshold_type is percentage. Default: 1000
  • send_email (bool, optional) – Default: False
  • active (bool, optional) – Enable the data rule. Default: False
active

The rule is active.

Returns:A bool
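
The interaction between threshold_type and min_volume can be made explicit before a rule is constructed. The sketch below only assembles and checks keyword arguments; data_rule_kwargs is a hypothetical helper, and it reads "Only set if threshold_type is percentage" as meaning that min_volume should be passed only for percentage thresholds.

```python
def data_rule_kwargs(stream, label, condition, threshold_type='count',
                     threshold_value=100, min_volume=None):
    """Assemble keyword arguments for streamsets.sdk.sdc_models.DataRule.

    Hypothetical helper: it validates the documented threshold options and
    returns a dict; it does not import or call the SDK itself.
    """
    if threshold_type not in ('count', 'percentage'):
        raise ValueError("threshold_type must be 'count' or 'percentage'")
    if min_volume is not None and threshold_type != 'percentage':
        raise ValueError('min_volume only applies to percentage thresholds')
    kwargs = {
        'stream': stream,
        'label': label,
        'condition': condition,
        'threshold_type': threshold_type,
        'threshold_value': threshold_value,
        'active': True,  # rules default to active=False
    }
    if min_volume is not None:
        kwargs['min_volume'] = min_volume
    return kwargs
```

With the SDK installed, the result would be passed on as DataRule(**kwargs) and the rule handed to PipelineBuilder.add_data_rule().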

History

class streamsets.sdk.sdc_models.History(history)[source]

Pipeline history.

Parameters:history – Python object representation of the pipeline history.
entries

list – A list of streamsets.sdk.sdc_models.HistoryEntry instances.

latest

Get pipeline history’s latest entry.

Returns:The most recent pipeline history entry as an instance of streamsets.sdk.sdc_models.HistoryEntry
class streamsets.sdk.sdc_models.HistoryEntry(entry)[source]

Pipeline history entry.

Parameters:entry – Python object representation of the history entry.
metrics

Get pipeline history entry’s metrics.

Returns:The pipeline history entry’s metrics as an instance of streamsets.sdk.sdc_models.Metrics

Issues

class streamsets.sdk.sdc_models.Issue(issue)[source]

Issue encountered for a pipeline or a stage.

Parameters:issue – Python object representation of the issue
class streamsets.sdk.sdc_models.Issues(issues)[source]

Issues encountered for pipelines as well as stages.

Parameters:issues – Python object representation of the issues
issues_count

The number of issues.

pipeline_issues

list – A list of streamsets.sdk.sdc_models.Issue instances.

stage_issues

dict – A dictionary mapping stage names to instances of streamsets.sdk.sdc_models.Issue.

Logs

class streamsets.sdk.sdc_models.Log(log)[source]

Model for SDC logs.

Parameters:log (list) – JSON representation of the log. A list of dictionaries.
after_time(timestamp)[source]

Return the log entries that occurred after the specified time.

Parameters:timestamp (str) – Timestamp in the form '2017-04-10 17:53:55,244'.

Returns:The formatted log as a str
before_time(timestamp)[source]

Return the log entries that occurred before the specified time.

Parameters:timestamp (str) – Timestamp in the form '2017-04-10 17:53:55,244'.

Returns:The formatted log as a str
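
The timestamp format expected by after_time and before_time uses a comma before the milliseconds, which strftime cannot produce on its own. The following sketch shows one way to build it from a datetime; both helper names are hypothetical, and sdc is assumed to be a connected DataCollector instance.

```python
def to_log_timestamp(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS,mmm'."""
    # strftime has no millisecond directive, so derive it from microseconds.
    millis = moment.microsecond // 1000
    return '{},{:03d}'.format(moment.strftime('%Y-%m-%d %H:%M:%S'), millis)


def logs_after(sdc, moment):
    """Return the SDC log entries recorded after `moment` (a datetime)."""
    return sdc.get_logs().after_time(to_log_timestamp(moment))
```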

Metrics

class streamsets.sdk.sdc_models.MetricCounter(counter)[source]

Metric counter.

Parameters:counter – Python object representation of a metric counter.
count

Get the metric counter’s count.

Returns:The metric counter’s count
class streamsets.sdk.sdc_models.MetricGauge(gauge)[source]

Metric gauge.

Parameters:gauge – Python object representation of a metric gauge.
value

Get the metric gauge’s value.

Returns:The metric gauge’s value as a str
class streamsets.sdk.sdc_models.MetricHistogram(histogram)[source]

Metric histogram.

Parameters:histogram – Python object representation of a metric histogram.
class streamsets.sdk.sdc_models.MetricTimer(timer)[source]

Metric timer.

Parameters:timer – Python object representation of a metric timer.
class streamsets.sdk.sdc_models.Metrics(metrics)[source]

Metrics.

Parameters:metrics – Python object representation of metrics.
counter(name)[source]

Get the metric counter from metrics.

Parameters:name (str) – Counter name.
Returns:The metric counter as an instance of streamsets.sdk.sdc_models.MetricCounter
gauge(name)[source]

Get the metric gauge from metrics.

Parameters:name (str) – Gauge name.
Returns:The metric gauge as an instance of streamsets.sdk.sdc_models.MetricGauge
histogram(name)[source]

Get the metric histogram from metrics.

Parameters:name (str) – Histogram name.
Returns:The metric histogram as an instance of streamsets.sdk.sdc_models.MetricHistogram
timer(name)[source]

Get the metric timer from metrics.

Parameters:name (str) – Timer name.
Returns:The metric timer as an instance of streamsets.sdk.sdc_models.MetricTimer
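
Metrics are typically reached through a pipeline's history. A minimal sketch of that chain, assuming sdc is a connected DataCollector instance, follows; the helper name is hypothetical and the counter names available depend on the pipeline.

```python
def latest_counter_value(sdc, pipeline, counter_name):
    """Read one counter from the most recent history entry of a pipeline.

    Hypothetical helper chaining History.latest, HistoryEntry.metrics,
    Metrics.counter() and MetricCounter.count as documented above.
    """
    history = sdc.get_pipeline_history(pipeline)
    return history.latest.metrics.counter(counter_name).count
```

A call might look like latest_counter_value(sdc, pipeline, 'pipeline.batchCount.counter'), where the counter name is an assumption about what the running SDC instance reports.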

Pipelines

class streamsets.sdk.sdc_models.PipelineBuilder(pipeline, definitions)[source]

Class with which to build SDC pipelines.

This class allows a user to programmatically generate an SDC pipeline. Instead of instantiating this class directly, most users should use streamsets.sdk.DataCollector.get_pipeline_builder().

Parameters:
  • pipeline – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in SDC and then exporting it before doing any configuration.
  • definitions (dict) – The output of SDC’s definitions endpoint.
add_data_drift_rule(*data_drift_rules)[source]

Add one or more data drift rules to the pipeline.

Parameters:*data_drift_rules – One or more instances of streamsets.sdk.sdc_models.DataDriftRule
add_data_rule(*data_rules)[source]

Add one or more data rules to the pipeline.

Parameters:*data_rules – One or more instances of streamsets.sdk.sdc_models.DataRule
add_error_stage(label=None, name=None, library=None)[source]

Add an error stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None
Returns:

An instance of streamsets.sdk.sdc_models.Stage

add_stage(label=None, name=None, type=None, library=None)[source]

Add a stage to the pipeline.

When specifying a stage, either label or name must be used. type and library may also be used to select a particular stage if ambiguities exist. If type and/or library are omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None
  • type (str, optional) – SDC stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default: None
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None
Returns:

An instance of streamsets.sdk.sdc_models.Stage

add_start_event_stage(label=None, name=None, library=None)[source]

Add start event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None
Returns:

An instance of streamsets.sdk.sdc_models.Stage

add_stop_event_stage(label=None, name=None, library=None)[source]

Add stop event stage to the pipeline.

When specifying a stage, either label or name must be used. If library is omitted, the first stage definition matching the given label or name will be used.

Parameters:
  • label (str, optional) – SDC stage label to use when selecting stage from definitions. Default: None
  • name (str, optional) – SDC stage name to use when selecting stage from definitions. Default: None
  • library (str, optional) – SDC stage library to use when selecting stage from definitions. Default: None
Returns:

An instance of streamsets.sdk.sdc_models.Stage

build(title=None)[source]

Build the pipeline.

Parameters:title (str, optional) – Pipeline title to use. Default: None
Returns:An instance of streamsets.sdk.sdc_models.Pipeline
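
Putting the builder methods together, a two-stage pipeline could be assembled roughly as shown below. This is a sketch under assumptions: the helper name is hypothetical, and the default stage labels ('Dev Raw Data Source' and 'Trash') are assumed to exist in the definitions of the SDC instance being used.

```python
def build_two_stage_pipeline(sdc, title,
                             origin_label='Dev Raw Data Source',
                             destination_label='Trash'):
    """Build a pipeline that connects one origin to one destination.

    `sdc` is assumed to be a streamsets.sdk.DataCollector; the default labels
    are assumptions about which stages the SDC instance provides.
    """
    builder = sdc.get_pipeline_builder()
    origin = builder.add_stage(label=origin_label)
    destination = builder.add_stage(label=destination_label)
    # >> is overloaded on Stage to call add_output().
    origin >> destination
    return builder.build(title)
```

The returned Pipeline can then be passed to DataCollector.add_pipeline().
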
class streamsets.sdk.sdc_models.Pipeline(pipeline, all_stages=None)[source]

SDC pipeline.

This class provides abstractions to make it easier to interact with a pipeline before it’s imported into SDC.

Parameters:
  • pipeline – A Python object representing the serialized pipeline.
  • all_stages (dict, optional) – A dictionary mapping stage names to streamsets.sdk.sdc_models.Stage instances. Default: None
add_parameters(**parameters)[source]

Add pipeline parameters.

Parameters:**parameters – Keyword arguments to add.
configuration

Get pipeline’s configuration.

Returns:An instance of streamsets.sdk.models.Configuration
delivery_guarantee

Get the delivery guarantee.

Returns:The delivery guarantee as a str
id

Get the pipeline id.

Returns:The pipeline id as a str
metadata

Get the pipeline metadata.

Returns:Pipeline metadata as a Python object.
origin_stage

Get the pipeline’s origin stage.

Returns:An instance of streamsets.sdk.sdc_models.Stage
parameters

Get the pipeline parameters.

Returns:A dict of parameter key-value pairs
pprint()[source]

Pretty-print the pipeline’s JSON representation.

rate_limit

Get the rate limit (records/sec).

Returns:The rate limit as a str
title

Get the pipeline title.

Returns:The pipeline title as a str
class streamsets.sdk.sdc_models.Stage(stage, label=None)[source]

Pipeline stage.

Parameters:
  • stage – JSON representation of the pipeline stage.
  • label (str, optional) – Human-readable stage label. Default: None
configuration

streamsets.sdk.models.Configuration – The stage configuration.

services

dict – If supported by the stage, a dictionary mapping a service name to an instance of streamsets.sdk.models.Configuration.

add_output(*other_stages, event_lane=False)[source]

Connect output of this stage to another stage.

The __rshift__ operator (>>) has been overloaded to invoke this method.

Parameters:*other_stages – One or more instances of streamsets.sdk.sdc_models.Stage
Returns:This stage as an instance of streamsets.sdk.sdc_models.Stage
event_lanes

Get the stage’s list of event lanes.

Returns:A list of event lanes
library

Get the stage’s library.

Returns:The stage library as a str
output_lanes

Get the stage’s list of output lanes.

Returns:A list of output lanes
set_attributes(**attributes)[source]

Set one or more stage attributes.

Parameters:**attributes – Attributes to set.
Returns:This stage as an instance of streamsets.sdk.sdc_models.Stage
stage_on_record_error

Get the stage’s on record error configuration value.

stage_record_preconditions

Get the stage’s record preconditions configuration value.

stage_required_fields

Get the stage’s required fields configuration value.
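
Because add_output accepts several stages at once, one stage can fan out to multiple downstream stages in a single call. The sketch below is illustrative only: the helper name is hypothetical, and it assumes that event_lane=True connects the stage's event lane rather than its data output, which the signature suggests but this reference does not spell out.

```python
def connect_with_events(source, targets, event_handler=None):
    """Connect `source` to every stage in `targets`, plus an optional event handler.

    Hypothetical helper; all arguments are assumed to be
    streamsets.sdk.sdc_models.Stage instances.
    """
    # One call wires the source to all regular downstream stages.
    source.add_output(*targets)
    if event_handler is not None:
        # Assumption: event_lane=True routes the stage's events, not its records.
        source.add_output(event_handler, event_lane=True)
    return source
```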

Pipeline ACLs

class streamsets.sdk.sdc_models.PipelineAcl(pipeline_acl)[source]

Represents a pipeline ACL.

Parameters:pipeline_acl – JSON representation of a pipeline ACL.
permissions

An instance of streamsets.sdk.sdc_models.PipelinePermissions

Pipeline Permissions

class streamsets.sdk.sdc_models.PipelinePermission(pipeline_permission)[source]

A container for a pipeline permission.

Parameters:pipeline_permission – A Python object representation of a pipeline permission.
class streamsets.sdk.sdc_models.PipelinePermissions(pipeline_permissions)[source]

Container for list of permissions for a pipeline.

Parameters:pipeline_permissions – A Python object representation of pipeline permissions.
permissions

list – A list of streamsets.sdk.sdc_models.PipelinePermission instances.
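
A PipelinePermissions container can be reached either directly or through a pipeline's ACL. The following sketch shows both routes side by side; the helper name and its via_acl flag are hypothetical, and sdc is assumed to be a connected DataCollector instance.

```python
def permission_entries(sdc, pipeline, via_acl=False):
    """Return a pipeline's PipelinePermission entries as a plain list.

    Hypothetical helper; whether both routes yield identical entries is not
    stated in this reference.
    """
    if via_acl:
        # PipelineAcl.permissions is itself a PipelinePermissions container.
        container = sdc.get_pipeline_acl(pipeline).permissions
    else:
        container = sdc.get_pipeline_permissions(pipeline)
    return list(container.permissions)
```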

Previews

class streamsets.sdk.sdc_models.Preview(pipeline_id, previewer_id, preview)[source]

Preview.

Parameters:
  • pipeline_id (str) – Pipeline ID.
  • previewer_id (str) – Previewer ID.
  • preview – Python object representation of the preview.
issues

An instance of streamsets.sdk.sdc_models.Issues.

preview_batches

list – A list of streamsets.sdk.sdc_models.Batch instances.
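
A preview result combines the Issues model with a list of batches. As a small sketch, the hypothetical helper below condenses a Preview into a dictionary using only the attributes documented in this reference.

```python
def summarize_preview(preview):
    """Summarize a streamsets.sdk.sdc_models.Preview as a plain dict.

    Hypothetical helper; `preview` is assumed to expose the documented
    `issues` and `preview_batches` attributes.
    """
    issues = preview.issues
    return {
        'issues_count': issues.issues_count,
        'pipeline_issues': len(issues.pipeline_issues),
        # stage_issues maps stage names to issues; report the names only.
        'stages_with_issues': sorted(issues.stage_issues),
        'batches': len(preview.preview_batches),
    }
```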

Snapshots

class streamsets.sdk.sdc_models.Batch(batch)[source]

Snapshot batch.

Parameters:batch – Python object representation of the snapshot batch.
class streamsets.sdk.sdc_models.Record(record)[source]

Record.

Parameters:record – Python object representation of the record.
header

An instance of streamsets.sdk.sdc_models.RecordHeader.

value

Python object representation of the record value.

value2

A typed representation of the record value.

class streamsets.sdk.sdc_models.RecordHeader(header)[source]

Record Header.

Parameters:header – Python object representation of the record header.
class streamsets.sdk.sdc_models.Snapshot(pipeline_id, snapshot_name, snapshot)[source]

Snapshot.

Parameters:
  • pipeline_id (str) – The pipeline ID.
  • snapshot_name (str) – The snapshot name.
  • snapshot – Python object representation of the snapshot.
snapshot_batches

list – A list of streamsets.sdk.sdc_models.Batch instances.

class streamsets.sdk.sdc_models.StageOutput(stage_output)[source]

Snapshot batch’s stage output.

Parameters:stage_output – Python object representation of the stage output.
output

Gets the stage output’s output.

If the stage contains multiple lanes, use streamsets.sdk.sdc_models.StageOutput.output_lanes.

Raises:An instance of Exception if the stage contains multiple lanes

Returns:

Users

class streamsets.sdk.sdc_models.User(user)[source]

User.

Parameters:user – Python object representation of the user.
groups

Get user’s groups.

Returns:User groups as a str
name

Get user’s name.

Returns:User name as a str
roles

Get user’s roles.

Returns:User roles as a str

StreamSets Control Hub

Main interface

This is the main entry point used by users when interacting with SCH instances.

class streamsets.sdk.ControlHub(server_url, username, password)[source]

Class to interact with StreamSets Control Hub.

Parameters:
  • server_url (str) – SCH server base URL.
  • username (str) – SCH username.
  • password (str) – SCH password.
create_components(component_type, org_id=None, number_of_components=1, active=True)[source]

Create components.

Parameters:
  • component_type (str) – Component type.
  • org_id (str, optional) – Organization ID. Default: DPM organization deduced from username
  • number_of_components (int, optional) – Default: 1
  • active (bool, optional) – Default: True
Returns:

An instance of streamsets.sdk.sch_api.CreateComponentsCommand.

create_organization(new_org)[source]

Create Organization.

Parameters:new_org (streamsets.sdk.sch_models.NewOrganization) – The new organization to create.
Returns:An instance of streamsets.sdk.sch_api.CreateOrganizationCommand

Models

These models wrap and provide useful functionality for interacting with common SCH abstractions.

Organizations

class streamsets.sdk.sch_models.NewOrganization(organization=None, organization_admin_user=None)[source]

Model that represents a new organization

organization

Gets the organization of this NewOrganization.

Returns:Organization
Return type:(Organization)
organization_admin_user

Gets the organization_admin_user of this NewOrganization.

Returns:The organization admin user
Return type:(User)
class streamsets.sdk.sch_models.Organization(id=None, name=None, creator=None, created_on=None, last_modified_by=None, last_modified_on=None, primary_admin_id=None, active=False, password_expiry_time_in_millis=None, valid_domains=None, external_auth_enabled=False)[source]

Model for an Organization

active

Gets the active of this Organization.

Returns:The active of this Organization.
Return type:(bool)
created_on

Gets the created_on of this Organization.

Returns:The created_on of this Organization.
Return type:(str)
creator

Gets the creator of this Organization.

Returns:The creator of this Organization.
Return type:(str)
external_auth_enabled

Gets the external_auth_enabled of this Organization.

Returns:The external_auth_enabled of this Organization.
Return type:(bool)
id

Gets the id of this Organization.

Returns:The id of this Organization.
Return type:(str)
last_modified_by

Gets the last_modified_by of this Organization.

Returns:The last_modified_by of this Organization.
Return type:(str)
last_modified_on

Gets the last_modified_on of this Organization.

Returns:The last_modified_on of this Organization.
Return type:(str)
name

Gets the name of this Organization.

Returns:The name of this Organization.
Return type:(str)
password_expiry_time_in_millis

Gets the password_expiry_time_in_millis of this Organization.

Returns:The password_expiry_time_in_millis of this Organization.
Return type:(int)
primary_admin_id

Gets the primary_admin_id of this Organization.

Returns:The primary_admin_id of this Organization.
Return type:(str)
valid_domains

Gets the valid_domains of this Organization.

Returns:The valid_domains of this Organization.
Return type:(str)

Users

class streamsets.sdk.sch_models.User(id=None, organization=None, name=None, email=None, roles=None, groups=None, active=False, password_expiry_time=None, creator=None, created_on=None, last_modified_by=None, last_modified_on=None, destroyer=None, delete_time=None, user_deleted=False, name_in_org=None, password_generated=False)[source]

Model for a User

active

Gets the active of this User.

Returns:The active of this User.
Return type:(bool)
created_on

Gets the created_on of this User.

Returns:The created_on of this User.
Return type:(int)
creator

Gets the creator of this User.

Returns:The creator of this User.
Return type:(str)
delete_time

Gets the delete_time of this User.

Returns:The delete_time of this User.
Return type:(int)
destroyer

Gets the destroyer of this User.

Returns:The destroyer of this User.
Return type:(str)
email

Gets the email of this User.

Returns:The email of this User.
Return type:(str)
groups

Gets the groups of this User.

Returns:The groups of this User.
Return type:(str)
id

Gets the id of this User.

Returns:The id of this User.
Return type:(str)
last_modified_by

Gets the last_modified_by of this User.

Returns:The last_modified_by of this User.
Return type:(str)
last_modified_on

Gets the last_modified_on of this User.

Returns:The last_modified_on of this User.
Return type:(int)
name

Gets the name of this User.

Returns:The name of this User.
Return type:(str)
name_in_org

Gets the name_in_org of this User.

Returns:The name_in_org of this User.
Return type:(str)
organization

Gets the organization of this User.

Returns:The organization of this User.
Return type:(str)
password_expiry_time

Gets the password_expiry_time of this User.

Returns:The password_expiry_time of this User.
Return type:(int)
password_generated

Gets the password_generated of this User.

Returns:The password_generated of this User.
Return type:(bool)
roles

Gets the roles of this User.

Returns:The roles of this User.
Return type:(str)
user_deleted

Gets the user_deleted of this User.

Returns:The user_deleted of this User.
Return type:(bool)

Common

Models used by StreamSets Data Collector and StreamSets Control Hub:

class streamsets.sdk.models.Configuration(configuration=None, property_key='name', property_value='value', **kwargs)[source]

Abstraction for stage configurations.

This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form

[ {
  "name" : "executionMode",
  "value" : "STANDALONE"
}, {
  "name" : "deliveryGuarantee",
  "value" : "AT_LEAST_ONCE"
}, ... ]

By implementing simple __getitem__ and __setitem__ methods, this class allows items in this list to be accessed using

configuration['executionMode'] = 'CLUSTER_BATCH'

instead of the more verbose

for property in configuration:
    if property['name'] == 'executionMode':
        property['value'] = 'CLUSTER_BATCH'
        break
Parameters:
  • configuration (list) – List of dictionaries comprising the configuration.
  • property_key (str, optional) – The dictionary entry denoting the property key. Default: name
  • property_value (str, optional) – The dictionary entry denoting the property value. Default: value
get(key, default=None)[source]

Return the value of key or, if not in the configuration, the default value.

update(configs)[source]

Update instance with a collection of configurations.

Parameters:configs (dict) – Dictionary of configurations to use.
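
To make the __getitem__ and __setitem__ idea concrete, here is a simplified stand-in for such a wrapper. It is a sketch and not the SDK's implementation: the class name ListBackedConfiguration is hypothetical, and raising KeyError for unknown keys is an assumption.

```python
class ListBackedConfiguration:
    """Dictionary-style access to a list of {'name': ..., 'value': ...} dicts."""

    def __init__(self, configuration=None, property_key='name',
                 property_value='value'):
        self._data = configuration if configuration is not None else []
        self.property_key = property_key
        self.property_value = property_value

    def _find(self, key):
        # Linear scan, mirroring the verbose loop shown above.
        for entry in self._data:
            if entry[self.property_key] == key:
                return entry
        raise KeyError(key)

    def __getitem__(self, key):
        return self._find(key)[self.property_value]

    def __setitem__(self, key, value):
        # Mutates the underlying list entry in place.
        self._find(key)[self.property_value] = value

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default

    def update(self, configs):
        for key, value in configs.items():
            self[key] = value
```

Since the wrapper holds a reference to the original list, assignments through it are visible in the underlying pipeline data.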

Exceptions

Common exceptions.

exception streamsets.sdk.exceptions.ActivationError(reason=None)[source]

Activation error.

exception streamsets.sdk.exceptions.BadRequestError(response)[source]

Bad request error (HTTP 400).

exception streamsets.sdk.exceptions.InternalServerError(response)[source]

Internal server error.