API Reference¶
The StreamSets SDK for Python is broadly divided into abstractions for interacting with StreamSet Data Collector and StreamSets Control Hub.
StreamSets Data Collector¶
Main interface¶
This is the main entry point used by users when interacting with SDC instances.
-
class
streamsets.sdk.
DataCollector
(server_url, username=None, password=None, authentication_method='form', accounts_authentication_token=None, accounts_server_url=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]¶ Class to interact with StreamSets Data Collector.
If connecting to an StreamSets Control Hub-registered instance of Data Collector, create an instance of
streamsets.sdk.ControlHub
instead of instantiating with ausername
andpassword
.Parameters: - server_url (
str
) – URL of an existing SDC deployment with which to interact. - accounts_authentication_token (
str
, optional) – StreamSets Accounts server base URL. Default:None
- accounts_server_url (
str
, optional) – StreamSets Accounts authentication token. Default:None
- authentication_method (
str
, optional) – StreamSets Data Collector authentication method. Default:streamsets.sdk.constants.ENGINE_AUTHENTICATION_METHOD_FORM
. - username (
str
, optional) – SDC username. Default:streamsets.sdk.sdc.DEFAULT_SDC_USERNAME
. - password (
str
, optional) – SDC password. Default:streamsets.sdk.sdc.DEFAULT_SDC_PASSWORD
. - control_hub (
streamsets.sdk.ControlHub
, optional) – A StreamSets Control Hub instance to use for SCH-registered Data Collectors. Default:None
. - dump_log_on_error (
bool
) – Whether to output Data Collector logs when exceptions are raised by certain methods. Default:False
-
add_pipeline
(*pipelines, **kwargs)[source]¶ Add one or more pipelines to the DataCollector instance.
Parameters: *pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline
.
-
capture_snapshot
(pipeline, snapshot_name=None, start_pipeline=False, runtime_parameters=None, batches=1, batch_size=10, **kwargs)[source]¶ Capture a snapshot for given pipeline.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - snapshot_name (
str
, optional) – Name for the generated snapshot. If set toNone
, an auto-generated UUID (which can be recovered from the returnedSnapshotCommand
object’ssnapshot_name
attribute) will be used when calling the REST API. Default:None
. - start_pipeline (
bool
, optional) – If set to true, then the pipeline will be started and its first batch will be captured. Otherwise, the pipeline must be running, in which case one of the next batches will be captured. Default:False
. - runtime_parameters (
dict
, optional) – Runtime parameters to override Pipeline Parameters value. Default:None
. - wait (
bool
, optional) – Wait for capture snapshot to finish. Default:True
. - wait_for_statuses (
list
, optional) – Pipeline statuses to wait on. Default:['RUNNING', 'FINISHED']
. - timeout_sec (
int
) – Timeout to wait for snapshot, in seconds. Default:streamsets.sdk.sdc.DEFAULT_SNAPSHOT_TIMEOUT
.
Returns: An instance of
streamsets.sdk.sdc_api.SnapshotCommand
.- pipeline (
-
change_password
(old_password, new_password)[source]¶ Change password for the current user.
Parameters: - old_password (
str
) – old password. - new_password (
str
) – new password.
Returns: An instance of
streamsets.sdk.sdc_api.Command
.- old_password (
-
current_user
¶ Get currently logged-in user and its groups and roles.
Returns: An instance of streamsets.sdk.sdc_models.User
.
-
definitions
¶ Get an SDC instance’s definitions.
Will return a cached instance of the definitions if called more than once.
Returns: An instance of json
.
-
export_pipeline
(pipeline, include_library_definitions=False, include_plain_text_credentials=False)[source]¶ Export single pipeline to json file.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – Pipeline instance. - include_library_definitions (
boolean
) – Set to true to export for Control Hub. - include_plain_text_credentials (
boolean
) – DefaultFalse
.
Returns: A
dict
object containing the contents of pipeline.- pipeline (
-
export_pipelines
(pipelines, include_library_definitions=False, include_plain_text_credentials=False)[source]¶ Export pipelines.
Parameters: - pipelines (
list
) – A list ofstreamsets.sdk.sdc_models.Pipeline
instances. - include_library_definitions (
boolean
) – Set to true to export for Control Hub. DefaultFalse
. - include_plain_text_credentials (
boolean
) – DefaultFalse
.
Returns: An instance of type
bytes
indicating the content of zip file with pipeline json files.- pipelines (
-
get_alerts
()[source]¶ Get pipeline alerts.
Returns: An instance of streamsets.sdk.sdc_models.Alerts
.
-
get_bundle
(generators=None)[source]¶ Generate new support bundle.
Returns: An instance of zipfile.ZipFile
.
-
get_bundle_generators
()[source]¶ Get available support bundle generators.
Returns: An instance of streamsets.sdk.sdc_models.BundleGenerators
.
-
get_logs
(ending_offset=-1, extra_message=None, pipeline=None, severity=None)[source]¶ Get logs.
Parameters: - ending_offset (
int
) – ending_offset, Default:-1
. - extra_message (
str
) – extra_message, Default:None
. - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance, Default:None
. - severity (
str
) – severity, Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Log
.- ending_offset (
-
get_pipeline_acl
(pipeline)[source]¶ Get pipeline ACL.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.sdc_models.PipelineAcl
.
-
get_pipeline_builder
(**kwargs)[source]¶ Get a pipeline builder instance with which a pipeline can be created.
Returns: An instance of streamsets.sdk.sdc_models.PipelineBuilder
.
-
get_pipeline_history
(pipeline)[source]¶ Get a pipeline’s history.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.sdc_models.History
.
-
get_pipeline_metrics
(pipeline)[source]¶ Get a pipeline’s metrics.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.sdc_models.Metrics
.
-
get_pipeline_permissions
(pipeline)[source]¶ Return pipeline permissions for a given pipeline.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.sdc_models.PipelinePermissions
.
-
get_pipeline_status
(pipeline)[source]¶ Get status of a pipeline.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance.
-
get_snapshots
(pipeline=None)[source]¶ Get information about stored snapshots.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
, optional) – The pipeline instance. Default:None
.Returns: A list of streamsets.sdk.sdc_models.SnapshotInfo
instances.
-
get_stage_errors
(pipeline, stage)[source]¶ Get stage errors.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – Pipeline. - stage (
streamsets.sdk.sdc_models.Stage
) – Stage.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sdc_models.StageError
instances.- pipeline (
-
get_stage_library_version
(stage)[source]¶ Get the stage library version.
Parameters: stage ( streamsets.sdk.sdc_models.Stage
) – stage objectReturns: An instance of str
-
id
¶ Return id for StreamSets Data Collector.
Returns: A str
SDC ID.
-
import_pipeline
(pipeline)[source]¶ Import pipeline from json file.
Parameters: pipeline ( dict
) – JSON data loaded from file. Example usage: json.load(open(filename, ‘r’)).Returns: An instance of streamsets.sdk.sdc_models.Pipeline
.
-
import_pipelines_from_archive
(archive)[source]¶ Import pipelines from archived zip directory.
Parameters: archive ( file
) – file containing the pipelines.Returns: A streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sdc_models.Pipeline
.
-
pipelines
¶ Get all pipelines in the pipeline store.
Returns: - A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sdc_models.Pipeline
instances.
- A
-
remove_pipeline
(*pipelines)[source]¶ Remove one or more pipelines from the DataCollector instance.
Parameters: *pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline
.
-
reset_origin
(pipeline)[source]¶ Reset origin offset.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – Pipeline object.Returns: An instance of streamsets.sdk.sdc_api.Command
.
-
run_dynamic_pipeline_preview
(type, parameters={}, batches=1, batch_size=1, skip_targets=True, skip_lifecycle_events=True, end_stage=None, timeout=10000, test_origin=False, stage_outputs_to_override_json_text=None, stage_outputs_to_override_json=[], **kwargs)[source]¶ Run dynamic pipeline preview.
Parameters: - type (
str
) – Dynamic preview request type.CLASSIFICATION_CATALOG
orPROTECTION_POLICY
- parameters (
dict
, optional) – Dynamic preview request parameters. Default:{}
- batches (
int
, optional) – Number of batches. Default:1
. - batch_size (
int
, optional) – Batch size. Default:1
. - skip_targets (
bool
, optional) – Skip targets. Default:True
. - skip_lifecycle_events (
bool
, optional) – Skip life cycle events. Default:True
. - end_stage (
str
, optional) – End stage. Default:None
. - timeout (
int
, optional) – Timeout. Default:10000
. - test_origin (
bool
, optional) – Test origin. Default:False
. - stage_outputs_to_override_json_text (
str
, optional) – Stage outputs to override text. Default:None
. - stage_outputs_to_override_json (
list
, optional) – Stage outputs to override. Default:[]
. - wait (
bool
, optional) – Wait for pipeline preview to finish. Default:True
.
Returns: An instance of
streamsets.sdk.sdc_api.PreviewCommand
.- type (
-
run_pipeline_preview
(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=2000, test_origin=False, stage_outputs_to_override_json=None, **kwargs)[source]¶ Run pipeline preview.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - rev (
int
, optional) – Pipeline revision. Default:0
. - batches (
int
, optional) – Number of batches. Default:1
. - batch_size (
int
, optional) – Batch size. Default:10
. - skip_targets (
bool
, optional) – Skip targets. Default:True
. - end_stage (
str
, optional) – End stage. Default:None
. - timeout (
int
, optional) – Timeout. Default:2000
. - test_origin (
bool
, optional) – Test origin. Default:False
- stage_outputs_to_override_json (
str
, optional) – Stage outputs to override. Default:None
. - wait (
bool
, optional) – Wait for pipeline preview to finish. Default:True
.
Returns: An instance of
streamsets.sdk.sdc_api.PreviewCommand
.- pipeline (
-
sample_pipelines
¶ Get all sample pipelines in the pipeline store.
Returns: - A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sdc_models.Pipeline
instances.
- A
-
sdc_configuration
¶ Return all configurations for StreamSets Data Collector.
Returns: A dict
with property names as keys and property values as values.
-
set_pipeline_acl
(pipeline, pipeline_acl)[source]¶ Update pipeline ACL.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - pipeline_acl (
streamsets.sdk.sdc_models.PipelineAcl
) – The pipeline ACL instance.
Returns: An instance of
streamsets.sdk.sdc_api.Command
.- pipeline (
-
set_user
(username, password=None)[source]¶ Set the user with which to interact with SDC.
Parameters: - username (
str
) – Username of user. - password (
str
, optional) – Password for user. Default: same asusername
.
- username (
-
stage_libraries
¶ Get all stage libraries.
Returns: - A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.sdc_models.StageLibrary
instances.
- A
-
start_pipeline
(pipeline, runtime_parameters=None, **kwargs)[source]¶ Start a pipeline.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - runtime_parameters (
dict
, optional) – Collection of runtime parameters. Default:None
. - wait (
bool
, optional) – Wait for pipeline to start. Default:True
. - wait_for_statuses (
list
, optional) – Pipeline statuses to wait on. Default:['RUNNING', 'FINISHED']
. - timeout_sec (
int
) – Timeout to wait for pipeline statuses, in seconds. Default:streamsets.sdk.sdc.DEFAULT_START_TIMEOUT
.
Returns: An instance of
streamsets.sdk.sdc_api.PipelineCommand
.- pipeline (
-
stop_pipeline
(pipeline, **kwargs)[source]¶ Stop a pipeline.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - force (
bool
, optional) – Force pipeline to stop. Default:False
. - wait (
bool
, optional) – Wait for pipeline to stop. Default:True
. - timeout_sec (
int
) – Timeout to wait for pipeline stop, in seconds. Default:streamsets.sdk.sdc.DEFAULT_STOP_TIMEOUT
.
Returns: An instance of
streamsets.sdk.sdc_api.StopPipelineCommand
.- pipeline (
-
update_pipeline
(*pipelines)[source]¶ Update one or more pipelines in the DataCollector instance.
Parameters: *pipelines – One or more instances of streamsets.sdk.sdc_models.Pipeline
.
-
validate_pipeline
(pipeline)[source]¶ Validate a pipeline.
Parameters: pipeline ( streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance.
-
version
¶ Return the version of the Data Collector.
Returns: The version string. Return type: str
-
wait_for_pipeline_metric
(pipeline, metric, value, timeout_sec=30)[source]¶ Block until a pipeline metric reaches the desired value.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - metric (
str
) – The desired metric (e.g.'output_record_count'
or'data_batch_count'
). - value – The desired value to wait for.
- timeout_sec (
int
, optional) – Timeout to wait formetric
to reachvalue
, in seconds. Default:streamsets.sdk.sdc.DEFAULT_WAIT_FOR_METRIC_TIMEOUT
.
Raises: TimeoutError
– Iftimeout_sec
passes withoutmetric
reachingvalue
.- pipeline (
-
wait_for_pipeline_status
(pipeline, status, timeout_sec=30)[source]¶ Block until a pipeline reaches the desired status.
Parameters: - pipeline (
streamsets.sdk.sdc_models.Pipeline
) – The pipeline instance. - status (
str
) – The desired status to wait for. - timeout_sec (
int
, optional) – Timeout to wait forpipeline
to reachstatus
, in seconds. Default:streamsets.sdk.sdc.DEFAULT_WAIT_FOR_STATUS_TIMEOUT
.
Raises: TimeoutError
– Iftimeout_sec
passes withoutpipeline
reachingstatus
.- pipeline (
- server_url (
-
sdc.
DEFAULT_SDC_USERNAME
= 'admin'¶
-
sdc.
DEFAULT_SDC_PASSWORD
= 'admin'¶
Models¶
These models wrap and provide useful functionality for interacting with common SDC abstractions.
Alerts¶
-
class
streamsets.sdk.sdc_models.
Alert
(alert)[source]¶ Pipeline alert.
Parameters: alert ( dict
) – Python object representation of a pipeline alert.-
alert_texts
¶ Get alert’s alert texts.
Returns: The alert’s alert texts as a str
.
-
label
¶ Get alert’s label.
Returns: The alert’s label as a str
.
-
pipeline_id
¶ Get alert’s pipeline ID.
Returns: The pipeline ID as a str
.
-
-
class
streamsets.sdk.sdc_models.
Alerts
(alerts)[source]¶ Container for list of alerts with filtering capabilities.
Parameters: alerts ( dict
) – Python object representation of alerts.-
alerts
¶ list
– A list ofstreamsets.sdk.sdc_models.Alert
instances.
-
for_pipeline
(pipeline)[source]¶ Get alerts for the specified pipeline.
Parameters: pipeline ( str
) – The pipeline for which to get alerts.Returns: An instance of streamsets.sdk.sdc_models.Alerts
.
-
Data Rules¶
-
class
streamsets.sdk.sdc_models.
DataDriftRule
(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]¶ Pipeline data drift rule.
Parameters: - stream (
str
) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here. - label (
str
) – Rule label. - condition (
str
, optional) – Data rule condition. Default:None
. - sampling_percentage (
int
, optional) – Default:5
. - sampling_records_to_retain (
int
, optional) – Default:10
. - enable_meter (
bool
, optional) – Default:True
. - enable_alert (
bool
, optional) – Default:True
. - alert_text (
str
, optional) – Default:'${alert:info()}'
. - send_email (
bool
, optional) – Default:False
. - active (
bool
, optional) – Enable the data rule. Default:False
.
-
active
¶ The rule is active.
Returns: A bool
.
- stream (
-
class
streamsets.sdk.sdc_models.
DataRule
(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]¶ Pipeline data rule.
Parameters: - stream (
str
) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here. - label (
str
) – Rule label. - condition (
str
, optional) – Data rule condition. Default:None
. - sampling_percentage (
int
, optional) – Default:5
. - sampling_records_to_retain (
int
, optional) – Default:10
. - enable_meter (
bool
, optional) – Default:True
. - enable_alert (
bool
, optional) – Default:True
. - alert_text (
str
, optional) – Default:None
. - threshold_type (
str
, optional) – One ofcount
orpercentage
. Default:'count'
. - threshold_value (
int
, optional) – Default:100
. - min_volume (
int
, optional) – Only set ifthreshold_type
ispercentage
. Default:1000
. - send_email (
bool
, optional) – Default:False
. - active (
bool
, optional) – Enable the data rule. Default:False
.
-
active
¶ Returns if the rule is active or not.
Returns: A bool
.
- stream (
History¶
-
class
streamsets.sdk.sdc_models.
History
(history)[source]¶ Pipeline history.
Parameters: history ( dict
) – Python object representation of the pipeline history.-
entries
¶ list
– A list ofstreamsets.sdk.sdc_models.HistoryEntry
instances.
-
latest
¶ Get pipeline history’s latest entry.
Returns: The most recent pipeline history entry as an instance of streamsets.sdk.sdc_models.HistoryEntry
.
-
-
class
streamsets.sdk.sdc_models.
HistoryEntry
(entry)[source]¶ Pipeline history entry.
Parameters: entry ( dict
) – Python object representation of the history entry.-
metrics
¶ Get pipeline history entry’s metrics.
Returns: The pipeline history entry’s metrics as an instance of streamsets.sdk.sdc_models.Metrics
.
-
Issues¶
-
class
streamsets.sdk.sdc_models.
Issue
(issue)[source]¶ Issue encountered for a pipeline or a stage.
Parameters: issue ( dict
) – Python object representation of the issue.
-
class
streamsets.sdk.sdc_models.
Issues
(issues)[source]¶ Issues encountered for pipelines as well as stages.
Parameters: issues ( dict
) – Python object representation of the issues.-
issues_count
¶ int
– The number of issues.
-
pipeline_issues
¶ list
– A list ofstreamsets.sdk.sdc_models.Issue
instances.
-
stage_issues
¶ dict
– A dictionary mapping stage names to instances ofstreamsets.sdk.sdc_models.Issue
.
-
Logs¶
-
class
streamsets.sdk.sdc_models.
Log
(log)[source]¶ Model for SDC logs.
Parameters: log ( list
) – A list of dictionaries (JSON representation) of the log.
Metrics¶
-
class
streamsets.sdk.sdc_models.
MetricCounter
(counter)[source]¶ Metric counter.
Parameters: counter ( dict
) – Python object representation of a metric counter.-
count
¶ Get the metric counter’s count.
Returns: The metric counter’s count as an int
.
-
-
class
streamsets.sdk.sdc_models.
MetricGauge
(gauge)[source]¶ Metric gauge.
Parameters: gauge ( dict
) – Python object representation of a metric gauge.-
value
¶ Get the metric gauge’s value.
Returns: The metric gauge’s value as a str
.
-
-
class
streamsets.sdk.sdc_models.
MetricHistogram
(histogram)[source]¶ Metric histogram.
Parameters: histogram ( dict
) – Python object representation of a metric histogram.
-
class
streamsets.sdk.sdc_models.
MetricTimer
(timer)[source]¶ Metric timer.
Parameters: timer ( dict
) – Python object representation of a metric timer.-
count
¶ Get the metric timer’s count.
Returns: The metric timer’s count as an int
.
-
-
class
streamsets.sdk.sdc_models.
Metrics
(metrics)[source]¶ Metrics.
Parameters: metrics ( dict
) – Python object representation of metrics.-
counter
(name)[source]¶ Get the metric counter from metrics.
Parameters: name ( str
) – Counter name.Returns: The metric counter as an instance of streamsets.sdk.sdc_models.MetricCounter
.
-
gauge
(name)[source]¶ Get the metric gauge from metrics.
Parameters: name ( str
) – Gauge name.Returns: The metric gauge as an instance of streamsets.sdk.sdc_models.MetricGauge
.
-
histogram
(name)[source]¶ Get the metric histogram from metrics.
Parameters: name ( str
) – Histogram name.Returns: The metric histogram as an instance of streamsets.sdk.sdc_models.MetricHistogram
.
-
pipeline
¶ Get pipeline-level metrics.
Returns: An instance of streamsets.sdk.sdc_models.PipelineMetrics
.
-
timer
(name)[source]¶ Get the metric timer from metrics.
Parameters: name ( str
) – Timer namer.Returns: The metric timer as an instance of streamsets.sdk.sdc_models.MetricTimer
.
-
Pipelines¶
-
class
streamsets.sdk.sdc_models.
PipelineBuilder
(pipeline, definitions, fragment=False, data_collector=None)[source]¶ Class with which to build SDC pipelines.
This class allows a user to programmatically generate an SDC pipeline. Instead of instantiating this class directly, most users should use
streamsets.sdk.DataCollector.get_pipeline_builder()
.Parameters: - pipeline (
dict
) – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in SDC and then exporting it before doing any configuration. - definitions (
dict
) – The output of SDC’s definitions endpoint. - fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
-
add_data_drift_rule
(*data_drift_rules)[source]¶ Add one or more data drift rules to the pipeline.
Parameters: *data_drift_rules – One or more instances of streamsets.sdk.sdc_models.DataDriftRule
.
-
add_data_rule
(*data_rules)[source]¶ Add one or more data rules to the pipeline.
Parameters: *data_rules – One or more instances of streamsets.sdk.sdc_models.DataRule
.
-
add_error_stage
(label=None, name=None, library=None)[source]¶ Add an error stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – SDC stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – SDC stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – SDC stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.- label (
-
add_fragment
(fragment, parameter_name_prefix=None)[source]¶ Add a fragment to the pipeline.
Parameters: - fragment (py:obj:streamsets.sdk.sch_models.Pipeline) – Fragment to add.
- parameter_name_prefix (
str
, optional) – Prefix name for the parameters of fragment. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.
-
add_metric_rule
(*metric_rules)[source]¶ Add one or more metric rules to the pipeline.
Parameters: *data_rules – One or more instances of streamsets.sdk.sdc_models.MetricRule
.
-
add_stage
(label=None, name=None, type=None, library=None)[source]¶ Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – SDC stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – SDC stage name to use when selecting stage from definitions. Default:None
. - type (
str
, optional) – SDC stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
. - library (
str
, optional) – SDC stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.- label (
-
add_start_event_stage
(label=None, name=None, library=None)[source]¶ Add start event stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – SDC stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – SDC stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – SDC stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.- label (
-
add_stats_aggregator_stage
(label=None, name=None, library=None)[source]¶ Add a stats aggregator stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – SDC stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – SDC stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – SDC stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.- label (
-
add_stop_event_stage
(label=None, name=None, library=None)[source]¶ Add stop event stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – SDC stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – SDC stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – SDC stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.- label (
-
add_test_origin_stage
(label=None, name=None, library=None)[source]¶ Add test origin stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – SDC stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – SDC stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – SDC stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_models.Stage
.- label (
-
build
(title='Pipeline', **kwargs)[source]¶ Build the pipeline.
Parameters: title ( str
, optional) – Pipeline title to use. Default:'Pipeline'
.Returns: An instance of streamsets.sdk.sdc_models.Pipeline
.
-
import_pipeline
(pipeline, **kwargs)[source]¶ Import a pipeline into the PipelineBuilder.
Parameters: pipeline ( dict
) – Exported pipeline.Returns: An instance of streamsets.sdk.sdc_models.PipelineBuilder
.
- pipeline (
-
class
streamsets.sdk.sdc_models.
Pipeline
(pipeline, all_stages=None, fragment=False)[source]¶ SDC pipeline.
This class provides abstractions to make it easier to interact with a pipeline before it’s imported into SDC.
Parameters: - pipeline (
dict
) – A Python object representing the serialized pipeline. - all_stages (
dict
, optional) – A dictionary mapping stage names tostreamsets.sdk.sdc_models.Stage
instances. Default:None
. - fragment (
boolean
, optional) – Specify if a fragment. Default:False
.
-
add_parameters
(**parameters)[source]¶ Add pipeline parameters.
Parameters: **parameters – Keyword arguments to add.
-
configuration
¶ Get pipeline’s configuration.
Returns: An instance of streamsets.sdk.models.Configuration
.
-
delivery_guarantee
¶ Get the delivery guarantee.
Returns: The delivery guarantee as a str
.
-
id
¶ Get the pipeline id.
Returns: The pipeline id as a str
.
-
metadata
¶ Get the pipeline metadata.
Returns: Pipeline metadata as a Python object.
-
origin_stage
¶ Get the pipeline’s origin stage.
Returns: An instance of streamsets.sdk.sdc_models.Stage
.
-
parameters
¶ Get the pipeline parameters.
Returns: A dict
of parameter key-value pairs.
-
rate_limit
¶ Get the rate limit (records/sec).
Returns: The rate limit as a str
.
-
title
¶ Get the pipeline title.
Returns: The pipeline title as a str
.
- pipeline (
-
class
streamsets.sdk.sdc_models.
Stage
(stage, label=None)[source]¶ Pipeline stage.
Parameters: - stage – JSON representation of the pipeline stage.
- label (
str
, optional) – Human-readable stage label. Default:None
.
-
configuration
¶ streamsets.sdk.models.Configuration
– The stage configuration.
-
services
¶ dict
– If supported by the stage, a dictionary mapping a service name to an instance ofstreamsets.sdk.models.Configuration
.
-
add_output
(*other_stages, event_lane=False)[source]¶ Connect output of this stage to another stage.
The __rshift__ operator (>>) has been overloaded to invoke this method.
Parameters: other_stage ( streamsets.sdk.sdc_models.Stage
) – Stage object.Returns: This stage as an instance of streamsets.sdk.sdc_models.Stage
).
-
description
¶ str
– The stage’s description.
-
event_lanes
¶ Get the stage’s list of event lanes.
Returns: A list
of event lanes.
-
label
¶ str
– The stage’s label.
-
library
¶ Get the stage’s library.
Returns: The stage library as a str
.
-
output_lanes
¶ Get the stage’s list of output lanes.
Returns: A list
of output lanes.
-
set_attributes
(**attributes)[source]¶ Set one or more stage attributes.
Parameters: **attributes – Attributes to set. Returns: This stage as an instance of streamsets.sdk.sdc_models.Stage
.
-
stage_on_record_error
¶ The stage’s on record error configuration value.
-
stage_record_preconditions
¶ The stage’s record preconditions configuration value.
-
stage_required_fields
¶ The stage’s required fields configuration value.
Pipeline ACLs¶
-
class
streamsets.sdk.sdc_models.
PipelineAcl
(pipeline_acl)[source]¶ Represents a pipeline ACL.
Parameters: pipeline_acl ( dict
) – JSON representation of a pipeline ACL.-
permissions
¶ streamsets.sdk.sdc_models.PipelinePermissions
– Pipeline Permissions object.
-
Pipeline Permissions¶
-
class
streamsets.sdk.sdc_models.
PipelinePermission
(pipeline_permission)[source]¶ A container for a pipeline permission.
Parameters: pipeline_permission ( dict
) – A Python object representation of a pipeline permission.
-
class
streamsets.sdk.sdc_models.
PipelinePermissions
(pipeline_permissions)[source]¶ Container for list of permissions for a pipeline.
Parameters: pipeline_permissions ( dict
) – A Python object representation of pipeline permissions.-
permissions
¶ list
– A list ofstreamsets.sdk.sdc_models.PipelinePermission
instances.
-
Previews¶
-
class
streamsets.sdk.sdc_models.
Preview
(pipeline_id, previewer_id, preview)[source]¶ Preview.
Parameters: - pipeline_id (
str
) – Pipeline ID. - previewer_id (
str
) – Previewer ID. - preview (
dict
) – Python object representation of the preview.
-
issues
¶ dict
– An instance ofstreamsets.sdk.sdc_models.Issues
.
-
preview_batches
¶ list
– A list ofstreamsets.sdk.sdc_models.Batch
instances.
- pipeline_id (
Snapshots¶
-
class
streamsets.sdk.sdc_models.
Batch
(batch)[source]¶ Snapshot batch.
Parameters: batch – Python object representation of the snapshot batch.
-
class
streamsets.sdk.sdc_models.
Record
(record)[source]¶ Record.
Parameters: record ( dict
) – Python object representation of the record.-
header
¶ dict
– An instance ofstreamsets.sdk.sdc_models.RecordHeader
.
-
value
¶ dict
– Python object representation of the record value.
-
value2
¶ A typed representation of the record value.
-
-
class
streamsets.sdk.sdc_models.
RecordHeader
(header)[source]¶ Record Header.
Parameters: header ( dict
) – Python object representation of the record header.
-
class
streamsets.sdk.sdc_models.
Snapshot
(pipeline_id, snapshot_name, snapshot)[source]¶ Snapshot.
Parameters: - pipeline_id (
str
) – The pipeline ID. - snapshot_name (
str
) – The snapshot name. - snapshot (
dict
) – Python object representation of the snapshot.
-
snapshot_batches
¶ list
– A list ofstreamsets.sdk.sdc_models.Batch
instances.
- pipeline_id (
-
class
streamsets.sdk.sdc_models.
StageOutput
(stage_output)[source]¶ Snapshot batch’s stage output.
Parameters: stage_output – Python object representation of the stage output. -
output
¶ Gets the stage output’s output.
If the stage contains multiple lanes, use
streamsets.sdk.sdc_models.StageOutput.output_lanes
.Raises: An instance of Exception
if the stage contains multiple lanes.Returns: An instance of streamsets.sdk.sdc_models.Record
.
-
StreamSets Transformer¶
Main interface¶
This is the main entry point used by users when interacting with Transformer instances.
-
class
streamsets.sdk.
Transformer
(server_url, username=None, password=None, authentication_method='form', accounts_authentication_token=None, accounts_server_url=None, control_hub=None, dump_log_on_error=False, **kwargs)[source]¶ Class to interact with StreamSets Transformer.
Parameters: - server_url (
str
) – URL of an existing ST deployment with which to interact. - username (
str
, optional) – ST username. Default:streamsets.sdk.st.DEFAULT_ST_USERNAME
. - password (
str
, optional) – ST password. Default:streamsets.sdk.st.DEFAULT_ST_PASSWORD
. - authentication_method (
str
, optional) – StreamSets Transformer authentication method. Default:streamsets.sdk.constants.ENGINE_AUTHENTICATION_METHOD_FORM
. - accounts_authentication_token (
str
, optional) – StreamSets Accounts authentication token. Default:None
- accounts_server_url (
str
, optional) – StreamSets Accounts server base URL. Default:None
- control_hub (
streamsets.sdk.ControlHub
, optional) – A StreamSets Control Hub instance to use for SCH-registered Transformers. Default:None
. - dump_log_on_error (
bool
) – Whether to output Transformer logs when exceptions are raised by certain methods. Default:False
-
add_pipeline
(*pipelines)[source]¶ Add one or more pipelines to the Transformer instance.
Parameters: *pipelines – One or more instances of streamsets.sdk.st_models.Pipeline
.
-
capture_snapshot
(pipeline, snapshot_name=None, start_pipeline=False, runtime_parameters=None, batches=1, batch_size=10, **kwargs)[source]¶ Capture a snapshot for given pipeline.
Parameters: - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance. - snapshot_name (
str
, optional) – Name for the generated snapshot. If set toNone
, an auto-generated UUID (which can be recovered from the returnedSnapshotCommand
object’ssnapshot_name
attribute) will be used when calling the REST API. Default:None
. - start_pipeline (
bool
, optional) – If set to true, then the pipeline will be started and its first batch will be captured. Otherwise, the pipeline must be running, in which case one of the next batches will be captured. Default:False
. - runtime_parameters (
dict
, optional) – Runtime parameters to override Pipeline Parameters value. Default:None
. - wait (
bool
, optional) – Wait for capture snapshot to finish. Default:True
. - wait_for_statuses (
list
, optional) – Pipeline statuses to wait on. Default:['RUNNING', 'FINISHED']
. - timeout_sec (
int
) – Timeout to wait for snapshot, in seconds. Default:streamsets.sdk.st.DEFAULT_SNAPSHOT_TIMEOUT
. - time_between_checks (
int
, optional) – Time to sleep between snapshot status checks. Applicable when`wait`
is enabled, in seconds. Default:streamsets.sdk.st.DEFAULT_SNAPSHOT_TIME_BETWEEN_CHECKS
.
Returns: An instance of
streamsets.sdk.st_api.SnapshotCommand
.- pipeline (
-
change_password
(old_password, new_password)[source]¶ Change password for the current user.
Parameters: - old_password (
str
) – old password. - new_password (
str
) – new password.
Returns: An instance of
streamsets.sdk.st_api.Command
.- old_password (
-
current_user
¶ Get currently logged-in user and its groups and roles.
Returns: An instance of streamsets.sdk.st_models.User
.
-
definitions
¶ Get an ST instance’s definitions.
Will return a cached instance of the definitions if called more than once.
Returns: An instance of json
.
-
get_alerts
()[source]¶ Get pipeline alerts.
Returns: An instance of streamsets.sdk.st_models.Alerts
.
-
get_bundle
(generators=None)[source]¶ Generate new support bundle.
Returns: An instance of zipfile.ZipFile
.
-
get_bundle_generators
()[source]¶ Get available support bundle generators.
Returns: An instance of streamsets.sdk.st_models.BundleGenerators
.
-
get_logs
(ending_offset=-1, extra_message=None, pipeline=None, severity=None)[source]¶ Get logs.
Parameters: - ending_offset (
int
) – ending_offset, Default:-1
. - extra_message (
str
) – extra_message, Default:None
. - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance, Default:None
. - severity (
str
) – severity, Default:None
.
Returns: An instance of
streamsets.sdk.st_models.Log
.- ending_offset (
-
get_pipeline
(pipeline_id)[source]¶ Get a pipeline.
Parameters: pipeline_id ( str
) – Id of pipeline.Returns: An instance of streamsets.sdk.st_models.Pipeline
.
-
get_pipeline_acl
(pipeline)[source]¶ Get pipeline ACL.
Parameters: pipeline ( streamsets.sdk.st_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.st_models.PipelineAcl
.
-
get_pipeline_builder
(**kwargs)[source]¶ Get a pipeline builder instance with which a pipeline can be created.
Returns: An instance of streamsets.sdk.st_models.PipelineBuilder
.
-
get_pipeline_history
(pipeline)[source]¶ Get a pipeline’s history.
Parameters: pipeline ( streamsets.sdk.st_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.st_models.History
.
-
get_pipeline_permissions
(pipeline)[source]¶ Return pipeline permissions for a given pipeline.
Parameters: pipeline ( streamsets.sdk.st_models.Pipeline
) – The pipeline instance.Returns: An instance of streamsets.sdk.st_models.PipelinePermissions
.
-
get_pipeline_status
(pipeline)[source]¶ Get status of a pipeline.
Parameters: pipeline ( streamsets.sdk.st_models.Pipeline
) – The pipeline instance.
-
get_snapshots
(pipeline=None)[source]¶ Get information about stored snapshots.
Parameters: pipeline ( streamsets.sdk.st_models.Pipeline
, optional) – The pipeline instance. Default:None
.Returns: A list of streamsets.sdk.st_models.SnapshotInfo
instances.
-
id
¶ Return id for StreamSets Transformer.
Returns: A str
Transformer ID.
-
pipelines
¶ Get all pipelines in the pipeline store.
Returns: - A
streamsets.sdk.utils.SeekableList
of streamsets.sdk.st_models.Pipeline
instances.
- A
-
remove_pipeline
(*pipelines)[source]¶ Remove one or more pipelines from the Transformer instance.
Parameters: *pipelines – One or more instances of streamsets.sdk.st_models.Pipeline
.
-
reset_origin
(pipeline)[source]¶ Reset origin offset.
Parameters: pipeline ( streamsets.sdk.st_models.Pipeline
) – Pipeline object.Returns: An instance of streamsets.sdk.st_api.Command
.
-
run_pipeline_preview
(pipeline, rev=0, batches=1, batch_size=10, skip_targets=True, end_stage=None, timeout=120000, stage_outputs_to_override_json=None, **kwargs)[source]¶ Run pipeline preview.
Parameters: - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance. - rev (
int
, optional) – Pipeline revision. Default:0
. - batches (
int
, optional) – Number of batches. Default:1
. - batch_size (
int
, optional) – Batch size. Default:10
. - skip_targets (
bool
, optional) – Skip targets. Default:True
. - end_stage (
str
, optional) – End stage. Default:None
. - timeout (
int
, optional) – Server side preview Timeout in milliseconds. Default:120000
. - stage_outputs_to_override_json (
str
, optional) – Stage outputs to override. Default:None
. - remote (
bool
, optional) – Remote preview (i.e. run on the cluster). Default:False
. - timeout_sec (
int
, optional) – Client side preview timeout, in seconds. Default:streamsets.sdk.st.DEFAULT_PREVIEW_CLIENT_TIMEOUT_SEC
. - wait (
bool
, optional) – Wait for pipeline preview to finish. Default:True
. - time_between_checks (
int
, optional) – Time to sleep between preview status checks. Applicable when`wait`
is enabled, in seconds. Default:streamsets.sdk.st.DEFAULT_PREVIEW_TIME_BETWEEN_CHECKS
.
Returns: An instance of
streamsets.sdk.st_api.PreviewCommand
.- pipeline (
-
set_pipeline_acl
(pipeline, pipeline_acl)[source]¶ Update pipeline ACL.
Parameters: - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance. - pipeline_acl (
streamsets.sdk.st_models.PipelineAcl
) – The pipeline ACL instance.
Returns: An instance of
streamsets.sdk.st_api.Command
.- pipeline (
-
set_user
(username, password=None)[source]¶ Set the user with which to interact with ST.
Parameters: - username (
str
) – Username of user. - password (
str
, optional) – Password for user. Default: same asusername
.
- username (
-
start_pipeline
(pipeline, runtime_parameters=None, **kwargs)[source]¶ Start a pipeline.
Parameters: - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance. - runtime_parameters (
dict
, optional) – Collection of runtime parameters. Default:None
. - wait (
bool
, optional) – Wait for pipeline to start. Default:True
. - wait_for_statuses (
list
, optional) – Pipeline statuses to wait on. Default:['RUNNING', 'FINISHED']
. - timeout_sec (
int
) – Timeout to wait for pipeline statuses, in seconds. Default:streamsets.sdk.st.DEFAULT_START_TIMEOUT
.
Returns: An instance of
streamsets.sdk.st_api.PipelineCommand
.- pipeline (
-
stop_pipeline
(pipeline, **kwargs)[source]¶ Stop a pipeline.
Parameters: - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance. - force (
bool
, optional) – Force pipeline to stop. Default:False
. - wait (
bool
, optional) – Wait for pipeline to stop. Default:True
. - timeout_sec (
int
) – Timeout to wait for pipeline stop, in seconds. Default:streamsets.sdk.st.DEFAULT_STOP_TIMEOUT
.
Returns: An instance of
streamsets.sdk.st_api.StopPipelineCommand
.- pipeline (
-
transformer_configuration
¶ Return all configurations for StreamSets Transformer. :returns: A
dict
with property names as keys and property values as values.
-
validate_pipeline
(pipeline, **kwargs)[source]¶ Validate a pipeline.
Parameters: - pipeline (
streamsets.sdk.st_models.Pipeline
) – The pipeline instance. - timeout (
int
, optional) – Server side validate Timeout in seconds. Default:streamsets.sdk.st.DEFAULT_VALIDATE_SERVER_TIMEOUT_SEC
. - timeout_sec (
int
, optional) – Client side validate timeout, in seconds. Default:streamsets.sdk.st.DEFAULT_VALIDATE_CLIENT_TIMEOUT_SEC
. - time_between_checks (
int
, optional) – Time to sleep between validation checks. Default:streamsets.sdk.st.DEFAULT_VALIDATE_TIME_BETWEEN_CHECKS
. - using_configured_cluster_manager (
bool
, optional) – Validate pipeline using configured cluster manager. Default:True
.
- pipeline (
-
version
¶ Return the version of the Transformer.
Returns: The version string. Return type: str
- server_url (
Models¶
These models wrap and provide useful functionality for interacting with common SCH abstractions.
Alerts¶
-
class
streamsets.sdk.st_models.
Alert
(alert)[source]¶ Pipeline alert.
Parameters: alert ( dict
) – Python object representation of a pipeline alert.-
alert_texts
¶ Get alert’s alert texts.
Returns: The alert’s alert texts as a str
.
-
label
¶ Get alert’s label.
Returns: The alert’s label as a str
.
-
pipeline_id
¶ Get alert’s pipeline ID.
Returns: The pipeline ID as a str
.
-
-
class
streamsets.sdk.st_models.
Alerts
(alerts)[source]¶ Container for list of alerts with filtering capabilities.
Parameters: alerts ( dict
) – Python object representation of alerts.-
alerts
list
– A list ofstreamsets.sdk.st_models.Alert
instances.
-
for_pipeline
(pipeline)[source]¶ Get alerts for the specified pipeline.
Parameters: pipeline ( str
) – The pipeline for which to get alerts.Returns: An instance of streamsets.sdk.st_models.Alerts
.
-
Data Rules¶
-
class
streamsets.sdk.st_models.
DataDriftRule
(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text='${alert:info()}', send_email=False, active=False)[source]¶ Pipeline data drift rule.
Parameters: - stream (
str
) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here. - label (
str
) – Rule label. - condition (
str
, optional) – Data rule condition. Default:None
. - sampling_percentage (
int
, optional) – Default:5
. - sampling_records_to_retain (
int
, optional) – Default:10
. - enable_meter (
bool
, optional) – Default:True
. - enable_alert (
bool
, optional) – Default:True
. - alert_text (
str
, optional) – Default:'${alert:info()}'
. - send_email (
bool
, optional) – Default:False
. - active (
bool
, optional) – Enable the data rule. Default:False
.
-
active
¶ The rule is active.
Returns: A bool
.
- stream (
-
class
streamsets.sdk.st_models.
DataRule
(stream, label, condition=None, sampling_percentage=5, sampling_records_to_retain=10, enable_meter=True, enable_alert=True, alert_text=None, threshold_type='count', threshold_value=100, min_volume=1000, send_email=False, active=False)[source]¶ Pipeline data rule.
Parameters: - stream (
str
) – Stream to use for data rule. An entry from a Stage instance’s output_lanes list is typically used here. - label (
str
) – Rule label. - condition (
str
, optional) – Data rule condition. Default:None
. - sampling_percentage (
int
, optional) – Default:5
. - sampling_records_to_retain (
int
, optional) – Default:10
. - enable_meter (
bool
, optional) – Default:True
. - enable_alert (
bool
, optional) – Default:True
. - alert_text (
str
, optional) – Default:None
. - threshold_type (
str
, optional) – One ofcount
orpercentage
. Default:'count'
. - threshold_value (
int
, optional) – Default:100
. - min_volume (
int
, optional) – Only set ifthreshold_type
ispercentage
. Default:1000
. - send_email (
bool
, optional) – Default:False
. - active (
bool
, optional) – Enable the data rule. Default:False
.
-
active
¶ Returns if the rule is active or not.
Returns: A bool
.
- stream (
History¶
-
class
streamsets.sdk.st_models.
History
(history)[source]¶ Pipeline history.
Parameters: history ( dict
) – Python object representation of the pipeline history.-
entries
list
– A list ofstreamsets.sdk.st_models.HistoryEntry
instances.
-
latest
¶ Get pipeline history’s latest entry.
Returns: The most recent pipeline history entry as an instance of streamsets.sdk.st_models.HistoryEntry
.
-
-
class
streamsets.sdk.st_models.
HistoryEntry
(entry)[source]¶ Pipeline history entry.
Parameters: entry ( dict
) – Python object representation of the history entry.-
metrics
¶ Get pipeline history entry’s metrics.
Returns: The pipeline history entry’s metrics as an instance of streamsets.sdk.st_models.Metrics
.
-
Issues¶
-
class
streamsets.sdk.st_models.
Issue
(issue)[source]¶ Issue encountered for a pipeline or a stage.
Parameters: issue ( dict
) – Python object representation of the issue.
-
class
streamsets.sdk.st_models.
Issues
(issues)[source]¶ Issues encountered for pipelines as well as stages.
Parameters: issues ( dict
) – Python object representation of the issues.-
issues_count
int
– The number of issues.
-
pipeline_issues
list
– A list ofstreamsets.sdk.st_models.Issue
instances.
-
stage_issues
dict
– A dictionary mapping stage names to instances ofstreamsets.sdk.st_models.Issue
.
-
Logs¶
-
class
streamsets.sdk.st_models.
Log
(log)[source]¶ Model for ST logs.
Parameters: log ( list
) – A list of dictionaries (JSON representation) of the log.
Metrics¶
-
class
streamsets.sdk.st_models.
MetricCounter
(counter)[source]¶ Metric counter.
Parameters: counter ( dict
) – Python object representation of a metric counter.-
count
¶ Get the metric counter’s count.
Returns: The metric counter’s count as an int
.
-
-
class
streamsets.sdk.st_models.
MetricGauge
(gauge)[source]¶ Metric gauge.
Parameters: gauge ( dict
) – Python object representation of a metric gauge.-
value
¶ Get the metric gauge’s value.
Returns: The metric gauge’s value as a str
.
-
-
class
streamsets.sdk.st_models.
MetricHistogram
(histogram)[source]¶ Metric histogram.
Parameters: histogram ( dict
) – Python object representation of a metric histogram.
-
class
streamsets.sdk.st_models.
MetricTimer
(timer)[source]¶ Metric timer.
Parameters: timer ( dict
) – Python object representation of a metric timer.-
count
¶ Get the metric timer’s count.
Returns: The metric timer’s count as an int
.
-
-
class
streamsets.sdk.st_models.
Metrics
(metrics)[source]¶ Metrics.
Parameters: metrics ( dict
) – Python object representation of metrics.-
counter
(name)[source]¶ Get the metric counter from metrics.
Parameters: name ( str
) – Counter name.Returns: The metric counter as an instance of streamsets.sdk.st_models.MetricCounter
.
-
gauge
(name)[source]¶ Get the metric gauge from metrics.
Parameters: name ( str
) – Gauge name.Returns: The metric gauge as an instance of streamsets.sdk.st_models.MetricGauge
.
-
histogram
(name)[source]¶ Get the metric histogram from metrics.
Parameters: name ( str
) – Histogram name.Returns: The metric histogram as an instance of streamsets.sdk.st_models.MetricHistogram
.
-
timer
(name)[source]¶ Get the metric timer from metrics.
Parameters: name ( str
) – Timer namer.Returns: The metric timer as an instance of streamsets.sdk.st_models.MetricTimer
.
-
Pipelines¶
-
class
streamsets.sdk.st_models.
PipelineBuilder
(pipeline, definitions)[source]¶ Class with which to build ST pipelines.
This class allows a user to programmatically generate an ST pipeline. Instead of instantiating this class directly, most users should use
streamsets.sdk.Transformer.get_pipeline_builder()
.Parameters: - pipeline (
dict
) – Python object representing an empty pipeline. If created manually, this would come from creating a new pipeline in ST and then exporting it before doing any configuration. - definitions (
dict
) – The output of ST’s definitions endpoint.
-
add_data_drift_rule
(*data_drift_rules)[source]¶ Add one or more data drift rules to the pipeline.
Parameters: *data_drift_rules – One or more instances of streamsets.sdk.st_models.DataDriftRule
.
-
add_data_rule
(*data_rules)[source]¶ Add one or more data rules to the pipeline.
Parameters: *data_rules – One or more instances of streamsets.sdk.st_models.DataRule
.
-
add_error_stage
(label=None, name=None, library=None)[source]¶ Add an error stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – ST stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – ST stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – ST stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.st_models.Stage
.- label (
-
add_metric_rule
(*metric_rules)[source]¶ Add one or more metric rules to the pipeline.
Parameters: *data_rules – One or more instances of streamsets.sdk.st_models.MetricRule
.
-
add_stage
(label=None, name=None, type=None, library=None)[source]¶ Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – ST stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – ST stage name to use when selecting stage from definitions. Default:None
. - type (
str
, optional) – ST stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
. - library (
str
, optional) – ST stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.st_models.Stage
.- label (
-
add_start_event_stage
(label=None, name=None, library=None)[source]¶ Add start event stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – ST stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – ST stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – ST stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.st_models.Stage
.- label (
-
add_stats_aggregator_stage
(label=None, name=None, library=None)[source]¶ Add a stats aggregator stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – ST stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – ST stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – ST stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.st_models.Stage
.- label (
-
add_stop_event_stage
(label=None, name=None, library=None)[source]¶ Add stop event stage to the pipeline.
When specifying a stage, either
label
orname
must be used. Iflibrary
is omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – ST stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – ST stage name to use when selecting stage from definitions. Default:None
. - library (
str
, optional) – ST stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.st_models.Stage
.- label (
-
build
(title='Pipeline')[source]¶ Build the pipeline.
Parameters: title ( str
, optional) – Pipeline title to use. Default:'Pipeline'
.Returns: An instance of streamsets.sdk.st_models.Pipeline
.
-
import_pipeline
(pipeline, **kwargs)[source]¶ Import a pipeline into the PipelineBuilder.
Parameters: pipeline ( dict
) – Exported pipeline.Returns: An instance of streamsets.sdk.st_models.PipelineBuilder
.
- pipeline (
-
class
streamsets.sdk.st_models.
Pipeline
(pipeline, all_stages=None)[source]¶ ST pipeline.
This class provides abstractions to make it easier to interact with a pipeline before it’s imported into ST.
Parameters: - pipeline (
dict
) – A Python object representing the serialized pipeline. - all_stages (
dict
, optional) – A dictionary mapping stage names tostreamsets.sdk.st_models.Stage
instances. Default:None
.
-
add_parameters
(**parameters)[source]¶ Add pipeline parameters.
Parameters: **parameters – Keyword arguments to add.
-
configuration
¶ Get pipeline’s configuration.
Returns: An instance of streamsets.sdk.models.Configuration
.
-
delivery_guarantee
¶ Get the delivery guarantee.
Returns: The delivery guarantee as a str
.
-
id
¶ Get the pipeline id.
Returns: The pipeline id as a str
.
-
metadata
¶ Get the pipeline metadata.
Returns: Pipeline metadata as a Python object.
-
origin_stage
¶ Get the pipeline’s origin stage.
Returns: An instance of streamsets.sdk.st_models.Stage
.
-
parameters
¶ Get the pipeline parameters.
Returns: A dict
of parameter key-value pairs.
-
rate_limit
¶ Get the rate limit (records/sec).
Returns: The rate limit as a str
.
-
title
¶ Get the pipeline title.
Returns: The pipeline title as a str
.
- pipeline (
-
class
streamsets.sdk.st_models.
Stage
(stage, label=None)[source]¶ Pipeline stage.
Parameters: - stage – JSON representation of the pipeline stage.
- label (
str
, optional) – Human-readable stage label. Default:None
.
-
configuration
streamsets.sdk.models.Configuration
– The stage configuration.
-
services
dict
– If supported by the stage, a dictionary mapping a service name to an instance ofstreamsets.sdk.models.Configuration
.
-
add_output
(*other_stages, event_lane=False)[source]¶ Connect output of this stage to another stage.
The __rshift__ operator (>>) has been overloaded to invoke this method.
Parameters: other_stage ( streamsets.sdk.st_models.Stage
) – Stage object.Returns: This stage as an instance of streamsets.sdk.st_models.Stage
).
-
event_lanes
¶ Get the stage’s list of event lanes.
Returns: A list
of event lanes.
-
label
¶ str
– The stage’s label.
-
library
¶ Get the stage’s library.
Returns: The stage library as a str
.
-
output_lanes
¶ Get the stage’s list of output lanes.
Returns: A list
of output lanes.
-
set_attributes
(**attributes)[source]¶ Set one or more stage attributes.
Parameters: **attributes – Attributes to set. Returns: This stage as an instance of streamsets.sdk.st_models.Stage
.
-
stage_on_record_error
¶ The stage’s on record error configuration value.
-
stage_record_preconditions
¶ The stage’s record preconditions configuration value.
-
stage_required_fields
¶ The stage’s required fields configuration value.
Pipeline ACLs¶
-
class
streamsets.sdk.st_models.
PipelineAcl
(pipeline_acl)[source]¶ Represents a pipeline ACL.
Parameters: pipeline_acl ( dict
) – JSON representation of a pipeline ACL.-
permissions
streamsets.sdk.st_models.PipelinePermissions
– Pipeline Permissions object.
-
Pipeline Permissions¶
-
class
streamsets.sdk.st_models.
PipelinePermission
(pipeline_permission)[source]¶ A container for a pipeline permission.
Parameters: pipeline_permission ( dict
) – A Python object representation of a pipeline permission.
-
class
streamsets.sdk.st_models.
PipelinePermissions
(pipeline_permissions)[source]¶ Container for list of permissions for a pipeline.
Parameters: pipeline_permissions ( dict
) – A Python object representation of pipeline permissions.-
permissions
list
– A list ofstreamsets.sdk.st_models.PipelinePermission
instances.
-
Previews¶
-
class
streamsets.sdk.st_models.
Preview
(pipeline_id, previewer_id, preview)[source]¶ Preview.
Parameters: - pipeline_id (
str
) – Pipeline ID. - previewer_id (
str
) – Previewer ID. - preview (
dict
) – Python object representation of the preview.
-
issues
dict
– An instance ofstreamsets.sdk.st_models.Issues
.
-
preview_batches
list
– A list ofstreamsets.sdk.st_models.Batch
instances.
- pipeline_id (
Snapshots¶
-
class
streamsets.sdk.st_models.
Batch
(batch)[source]¶ Snapshot batch.
Parameters: batch – Python object representation of the snapshot batch.
-
class
streamsets.sdk.st_models.
Record
(record)[source]¶ Record.
Parameters: record ( dict
) – Python object representation of the record.-
header
dict
– An instance ofstreamsets.sdk.st_models.RecordHeader
.
-
value
dict
– Python object representation of the record value.
-
value2
A typed representation of the record value.
-
-
class
streamsets.sdk.st_models.
RecordHeader
(header)[source]¶ Record Header.
Parameters: header ( dict
) – Python object representation of the record header.
-
class
streamsets.sdk.st_models.
Snapshot
(pipeline_id, snapshot_name, snapshot)[source]¶ Snapshot.
Parameters: - pipeline_id (
str
) – The pipeline ID. - snapshot_name (
str
) – The snapshot name. - snapshot (
dict
) – Python object representation of the snapshot.
-
snapshot_batches
list
– A list ofstreamsets.sdk.st_models.Batch
instances.
- pipeline_id (
-
class
streamsets.sdk.st_models.
StageOutput
(stage_output)[source]¶ Snapshot batch’s stage output.
Parameters: stage_output – Python object representation of the stage output. -
output
¶ Gets the stage output’s output.
If the stage contains multiple lanes, use
streamsets.sdk.st_models.StageOutput.output_lanes
.Raises: An instance of Exception
if the stage contains multiple lanes.Returns: An instance of streamsets.sdk.st_models.Record
.
-
StreamSets Control Hub¶
Main interface¶
This is the main entry point used by users when interacting with SCH instances.
-
class
streamsets.sdk.
ControlHub
(server_url, username, password)[source]¶ Class to interact with StreamSets Control Hub.
Parameters: - server_url (
str
) – SCH server base URL. - username (
str
) – SCH username. - password (
str
) – SCH password.
-
acknowledge_deployment_error
(*deployments)[source]¶ Acknowledge errors for one or more deployments.
Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment
.
-
acknowledge_event_subscription_error
(subscription)[source]¶ Acknowledge an error on given Event Subscription.
Parameters: subscription ( streamsets.sdk.sch_models.Subscription
) – A Subscription instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
acknowledge_job_error
(*jobs)[source]¶ Acknowledge errors for one or more jobs.
Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job
.
-
acknowledge_topology_errors
(topology)[source]¶ Acknowledge errors of a topology.
Parameters: topology ( streamsets.sdk.sch_models.Topology
) – Topology object.
-
action_audits
¶ Action Audits.
Returns: An instance of streamsets.sdk.sch_models.ActionAudits
.
-
activate_datacollector
(data_collector)[source]¶ Activate data collector.
Parameters: data_collector ( streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
-
activate_provisioning_agent
(provisioning_agent)[source]¶ Activate provisioning agent.
Parameters: provisioning_agent ( streamets.sdk.sch_models.ProvisioningAgent
) –Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_classification_rule
(classification_rule, commit=False)[source]¶ Add a classification rule.
Parameters: - classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object. - commit (
bool
, optional) – Whether to commit the rule after adding it. Default:False
.
- classification_rule (
-
add_connection
(connection)[source]¶ Add a connection.
Parameters: connection ( streamsets.sdk.sch_models.Connection
) – Connection object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_deployment
(deployment)[source]¶ Add a deployment.
Parameters: deployment ( streamsets.sdk.sch_models.Deployment
) – Deployment object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_group
(group)[source]¶ Add a group.
Parameters: group ( streamsets.sdk.sch_models.Group
) – Group object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_job
(job)[source]¶ Add a job.
Parameters: job ( streamsets.sdk.sch_models.Job
) – Job object.Returns: An instance of streamsets.sdk.sch_models.Job
.
-
add_organization
(organization)[source]¶ Add an organization.
Parameters: organization ( streamsets.sdk.sch_models.Organization
) – Organization object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_protection_policy
(protection_policy)[source]¶ Add a protection policy.
Parameters: protection_policy ( streamsets.sdk.sch_models.ProtectionPolicy
) – Protection Policy object.
-
add_report_definition
(report_definition)[source]¶ Add Report Definition to Control Hub.
Parameters: report_definition ( streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_subscription
(subscription)[source]¶ Add Subscription to Control Hub.
Parameters: subscription ( streamsets.sdk.sch_models.Subscription
) – A Subscription instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
add_user
(user)[source]¶ - Add a user. Some user attributes are updated by SCH such as
- created_by, created_on, last_modified_by, last_modified_on, password_expires_on, password_system_generated.
Parameters: user ( streamsets.sdk.sch_models.User
) – User object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
balance_data_collectors
(*data_collectors)[source]¶ Balance all jobs running on given Data Collectors.
Parameters: *sdcs – One or more instances of streamsets.sdk.sch_models.DataCollector
.
-
balance_job
(*jobs)[source]¶ Balance one or more jobs.
Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job
.
-
connection_audits
¶ Connection Audits.
Returns: An instance of streamsets.sdk.sch_models.ConnectionAudits
.
Connection Tags.
Returns: An instance of streamsets.sdk.sch_models.ConnectionTags
.
-
connections
¶ Connections.
Returns: An instance of streamsets.sdk.sch_models.Connections
.
-
create_components
(component_type, number_of_components=1, active=True)[source]¶ Create components.
Parameters: - component_type (
str
) – Component type. - number_of_components (
int
, optional) – Default:1
. - active (
bool
, optional) – Default:True
.
Returns: An instance of
streamsets.sdk.sch_api.CreateComponentsCommand
.- component_type (
-
create_topology
(topology)[source]¶ Create a topology.
Parameters: topology ( streamsets.sdk.sch_models.Topology
) – Topology object.Returns: An instance of streamsets.sdk.sch_models.Topology
.
-
data_collectors
¶ Data Collectors registered to the Control Hub instance.
Returns: Returns a streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.DataCollector
instances.
-
data_protector_enabled
¶ bool
– Whether Data Protector is enabled for the current organization.
-
data_protector_version
¶ str
– Returns the StreamSets Data Protector version string configured in the system data collector. If data protector is not enabled a None value is returned
-
deactivate_datacollector
(data_collector)[source]¶ Deactivate data collector.
Parameters: data_collector ( streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
-
deactivate_provisioning_agent
(provisioning_agent)[source]¶ Deactivate provisioning agent.
Parameters: provisioning_agent ( streamets.sdk.sch_models.ProvisioningAgent
) –Returns: An instance of streamsets.sdk.sch_api.Command
.
-
deactivate_user
(*users, organization=None)[source]¶ Deactivate Users for all given User IDs.
Parameters: - *users – One or more instances of
streamsets.sdk.sch_models.User
. - organization (
str
, optional) – Default:None
. If not specified, current organization will be used.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- *users – One or more instances of
-
delete_and_unregister_data_collector
(data_collector)[source]¶ Delete and Unregister data collector.
Parameters: data_collector ( streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
-
delete_connection
(*connections)[source]¶ Delete connections.
Parameters: *connections – One or more instances of streamsets.sdk.sch_models.Connection
.
-
delete_data_collector
(data_collector)[source]¶ Delete data collector.
Parameters: data_collector ( streamsets.sdk.sch_models.DataCollector
) – Data Collector object.
-
delete_deployment
(*deployments)[source]¶ Delete deployments.
Parameters: *deployments – One or more instances of streamsets.sdk.sch_models.Deployment
.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_group
(*groups)[source]¶ Delete groups.
Parameters: *groups – One or more instances of streamsets.sdk.sch_models.Group
.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_job
(*jobs)[source]¶ Delete one or more jobs.
Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job
.
-
delete_pipeline
(pipeline, only_selected_version=False)[source]¶ Delete a pipeline.
Parameters: - pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. - only_selected_version (
boolean
) – Delete only current commit.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- pipeline (
-
delete_pipeline_labels
(*pipeline_labels)[source]¶ Delete pipeline labels.
Parameters: *pipeline_labels – One or more instances of streamsets.sdk.sch_models.PipelineLabel
.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_provisioning_agent
(provisioning_agent)[source]¶ Delete provisioning agent.
Parameters: provisioning_agent ( streamets.sdk.sch_models.ProvisioningAgent
) –Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_provisioning_agent_token
(provisioning_agent)[source]¶ Delete provisioning agent token.
Parameters: provisioning_agent ( streamets.sdk.sch_models.ProvisioningAgent
) –Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_report_definition
(report_definition)[source]¶ Delete an existing Report Definition.
Parameters: report_definition ( streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_subscription
(subscription)[source]¶ Delete an exisiting Subscription.
Parameters: subscription ( streamsets.sdk.sch_models.Subscription
) – A Subscription instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
delete_topology
(topology, only_selected_version=False)[source]¶ Delete a topology.
Parameters: - topology (
streamsets.sdk.sch_models.Topology
) – Topology object. - only_selected_version (
boolean
) – Delete only current commit.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- topology (
-
delete_user
(*users, deactivate=False)[source]¶ Delete users. Deactivate users before deleting if configured.
Parameters: - *users – One or more instances of
streamsets.sdk.sch_models.User
. - deactivate (
bool
, optional) – Default:False
.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- *users – One or more instances of
-
deployments
¶ Deployments.
Returns: An instance of streamsets.sdk.sch_models.Deployments
.
-
duplicate_job
(job, name=None, description=None, number_of_copies=1)[source]¶ Duplicate an existing job.
Parameters: - job (
streamsets.sdk.sch_models.Job
) – Job object. - name (
str
, optional) – Name of the new job(s). Default:None
. If not specified, name of the job with' copy'
appended to the end will be used. - description (
str
, optional) – Description for new job(s). Default:None
. - number_of_copies (
int
, optional) – Number of copies. Default:1
.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.- job (
-
duplicate_pipeline
(pipeline, name=None, description='New Pipeline', number_of_copies=1)[source]¶ Duplicate an existing pipeline.
Parameters: - pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. - name (
str
, optional) – Name of the new pipeline(s). Default:None
. - description (
str
, optional) – Description for new pipeline(s). Default:'New Pipeline'
. - number_of_copies (
int
, optional) – Number of copies. Default:1
.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Pipeline
.- pipeline (
-
edit_job
(job)[source]¶ Edit a job.
Parameters: job ( streamsets.sdk.sch_models.Job
) – Job object.Returns: An instance of streamsets.sdk.sch_models.Job
.
-
edit_topology
(topology)[source]¶ Edit a topology.
Parameters: topology ( streamsets.sdk.sch_models.Topology
) – Topology object.Returns: An instance of streamsets.sdk.sch_models.Topology
.
-
export_jobs
(jobs)[source]¶ Export jobs to a compressed archive.
Parameters: jobs ( list
) – A list ofstreamsets.sdk.sch_models.Job
instances.Returns: An instance of type bytes
indicating the content of zip file with job json files.
-
export_pipelines
(pipelines, fragments=False, include_plain_text_credentials=False)[source]¶ Export pipelines.
Parameters: - pipelines (
list
) – A list ofstreamsets.sdk.sch_models.Pipeline
instances. - fragments (
bool
) – Indicates if exporting fragments is needed. - include_plain_text_credentials (
bool
) – Indicates if plain text credentials should be included.
Returns: An instance of type
bytes
indicating the content of zip file with pipeline json files.- pipelines (
-
export_protection_policies
(protection_policies)[source]¶ Export protection policies to a compressed archive.
Parameters: - protection_policies (
list
) – A list ofstreamsets.sdk.sch_models.ProtectionPolicy
- instances. –
Returns: An instance of type
bytes
indicating the content of zip file with protection policy json files.- protection_policies (
-
export_topologies
(topologies)[source]¶ Export topologies.
Parameters: topologies ( list
) – A list ofstreamsets.sdk.sch_models.Topology
instances.Returns: An instance of type bytes
indicating the content of zip file with pipeline json files.
-
get_admin_tool
(base_url, username, password)[source]¶ Get SCH admin tool.
Returns: An instance of streamsets.sdk.sch_models.AdminTool
.
-
get_classification_rule_builder
()[source]¶ Get a classification rule builder instance with which a classification rule can be created.
Returns: An instance of streamsets.sdk.sch_models.ClassificationRuleBuilder
.
-
get_components
(component_type_id, offset=None, len_=None, order_by='LAST_VALIDATED_ON', order='ASC')[source]¶ Get components.
Parameters: - component_type_id (
str
) – Component type id. - offset (
str
, optional) – Default:None
. - len (
str
, optional) – Default:None
. - order_by (
str
, optional) – Default:'LAST_VALIDATED_ON'
. - order (
str
, optional) – Default:'ASC'
.
- component_type_id (
-
get_connection_builder
()[source]¶ Get a connection builder instance with which a connection can be created.
Returns: An instance of streamsets.sdk.sch_models.ConnectionBuilder
.
-
get_current_job_status
(job)[source]¶ Returns the current job status for given job id.
Parameters: job ( streamsets.sdk.sch_models.Job
) – Job object.
-
get_data_collector_labels
(data_collector)[source]¶ Returns all labels assigned to data collector.
Parameters: data_collector ( streamsets.sdk.sch_models.DataCollector
) – Data Collector object.Returns: A list
of data collector assigned labels.
-
get_deployment_builder
()[source]¶ Get a deployment builder instance with which a deployment can be created.
Returns: An instance of streamsets.sdk.sch_models.DeploymentBuilder
.
-
get_group_builder
()[source]¶ Get a group builder instance with which a group can be created.
Returns: An instance of streamsets.sdk.sch_models.GroupBuilder
.
-
get_job_builder
()[source]¶ Get a job builder instance with which a job can be created.
Returns: An instance of streamsets.sdk.sch_models.JobBuilder
.
-
get_organization_builder
()[source]¶ Get an organization builder instance with which an organization can be created.
Returns: An instance of streamsets.sdk.sch_models.OrganizationBuilder
.
-
get_pipeline_builder
(data_collector=None, transformer=None, fragment=False)[source]¶ Get a pipeline builder instance with which a pipeline can be created.
Parameters: - data_collector (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to author the pipeline. If omitted, Control Hub’s system SDC will be used. Default:None
. - transformer (
streamsets.sdk.sch_models.Transformer
, optional) – The Transformer in which to author the pipeline. - fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
Returns: An instance of
streamsets.sdk.sch_models.PipelineBuilder
orstreamsets.sdk.sch_models.StPipelineBuilder
.- data_collector (
-
get_protection_method_builder
()[source]¶ Get a protection method builder instance with which a protection method can be created.
Returns: An instance of streamsets.sdk.sch_models.ProtectionMethodBuilder
.
-
get_protection_policy_builder
()[source]¶ Get a protection policy builder instance with which a protection policy can be created.
Returns: An instance of streamsets.sdk.sch_models.ProtectionPolicyBuilder
.
-
get_report_definition_builder
()[source]¶ Get a Report Definition Builder instance with which a Report Definition can be created.
Returns: An instance of streamsets.sdk.sch_models.ReportDefinitionBuilder
.
-
get_scheduled_task_builder
()[source]¶ Get a scheduled task builder instance with which a scheduled task can be created.
Returns: An instance of streamsets.sdk.sch_models.ScheduledTaskBuilder
.
-
get_subscription_builder
()[source]¶ Get Event Subscription Builder.
Returns: An instance of streamsets.sdk.sch_models.SubscriptionBuilder
.
-
get_topology_builder
()[source]¶ Get a topology builder instance with which a topology can be created.
Returns: An instance of streamsets.sdk.sch_models.TopologyBuilder
.
-
get_topology_jobs
(topology)[source]¶ Get jobs for given topology.
Parameters: topology ( streamsets.sdk.sch_models.Topology
) – Topology object.Returns: A list of streamsets.sdk.sch_models.Job
instances.
-
get_user_builder
()[source]¶ Get a user builder instance with which a user can be created.
Returns: An instance of streamsets.sdk.sch_models.UserBuilder
.
-
groups
¶ Groups.
Returns: An instance of streamsets.sdk.sch_models.Groups
.
-
import_jobs
(archive, pipeline=True, number_of_instances=False, labels=False, runtime_parameters=False, **kwargs)[source]¶ Import jobs from archived zip directory.
Parameters: - archive (
file
) – file containing the jobs. - pipeline (
boolean
, optional) – Indicate if pipeline should be imported. Default:True
. - number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
. - labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
. - runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.- archive (
-
import_pipeline
(pipeline, commit_message, name=None, data_collector_instance=None)[source]¶ Import pipeline from json file.
Parameters: - pipeline (
dict
) – A python dict representation of ControlHub Pipeline. - commit_message (
str
) – Commit message. - name (
str
, optional) – Name of the pipeline. If left out, pipeline name from JSON object will be used. DefaultNone
. - data_collector_instance (
streamsets.sdk.sch_models.DataCollector
) – If excluded, system sdc will be used. DefaultNone
.
Returns: An instance of
streamsets.sdk.sch_models.Pipeline
.- pipeline (
-
import_pipelines_from_archive
(archive, commit_message, fragments=False)[source]¶ Import pipelines from archived zip directory.
Parameters: - archive (
file
) – file containing the pipelines. - commit_message (
str
) – Commit message. - fragments (
bool
, optional) – Indicates if pipeline contains fragments.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Pipeline
.- archive (
-
import_protection_policies
(policies_archive)[source]¶ Import protection policies from a compressed archive.
Parameters: policies_archive ( file
) – file containing the protection policies.Returns: A py:class:streamsets.sdk.utils.SeekableList of streamsets.sdk.sch_models.ProtectionPolicy
.
-
import_topologies
(archive, import_number_of_instances=False, import_labels=False, import_runtime_parameters=False, **kwargs)[source]¶ Import topologies from archived zip directory.
Parameters: - archive (
file
) – file containing the topologies. - import_number_of_instances (
boolean
, optional) – Indicate if number of instances should be imported. Default:False
. - import_labels (
boolean
, optional) – Indicate if labels should be imported. Default:False
. - import_runtime_parameters (
boolean
, optional) – Indicate if runtime parameters should be imported. Default:False
.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Topology
.- archive (
-
jobs
¶ Jobs.
Returns: An instance of streamsets.sdk.sch_models.Jobs
.
-
ldap_enabled
¶ Indication if LDAP is enabled or not.
Returns: An instance of boolean
.
-
login_audits
¶ Login Audits.
Returns: An instance of streamsets.sdk.sch_models.LoginAudits
.
-
organizations
¶ Organizations.
Returns: An instance of streamsets.sdk.sch_models.Organizations
.
-
pipeline_labels
¶ Pipeline labels.
Returns: An instance of streamsets.sdk.sch_models.PipelineLabels
.
-
pipelines
¶ Pipelines.
Returns: An instance of streamsets.sdk.sch_models.Pipelines
.
-
preview_classification_rule
(classification_rule, parameter_data, data_collector=None)[source]¶ Dynamic preview of a classification rule.
Parameters: - classification_rule (
streamsets.sdk.sch_models.ClassificationRule
) – Classification Rule object. - parameter_data (
dict
) – A python dict representation of raw JSON parameters required for preview. - data_collector (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, Control Hub’s first executor SDC will be used. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_api.PreviewCommand
.- classification_rule (
-
protection_policies
¶ Protection policies.
Returns: An instance of streamsets.sdk.sch_models.ProtectionPolicies
.
-
provisioning_agents
¶ Provisioning Agents registered to the Control Hub instance.
Returns: An instance of streamsets.sdk.sch_models.ProvisioningAgents
.
-
publish_pipeline
(pipeline, commit_message='New pipeline', draft=False)[source]¶ Publish a pipeline.
Parameters: - pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. - commit_message (
str
, optional) – Default:'New pipeline'
. - draft (
boolean
, optional) – Default:False
.
- pipeline (
-
publish_scheduled_task
(task)[source]¶ Send the scheduled task to Control Hub.
Parameters: task ( streamsets.sdk.sch_models.ScheduledTask
) – Scheduled task object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
publish_topology
(topology)[source]¶ Public a topology.
Parameters: topology ( streamsets.sdk.sch_models.Topology
) – Topology object.Returns: An instance of streamsets.sdk.sch_models.Topology
.
-
report_definitions
¶ Report Definitions.
Returns: An instance of streamsets.sdk.sch_models.ReportDefinitions
.
-
reset_origin
(*jobs)[source]¶ Reset all pipeline offsets for given jobs.
Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job
.Returns: A streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
-
run_pipeline_preview
(pipeline, batches=1, batch_size=10, skip_targets=True, skip_lifecycle_events=True, timeout=120000, test_origin=False, read_policy=None, write_policy=None, executor=None, **kwargs)[source]¶ Run pipeline preview.
Parameters: - pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. - batches (
int
, optional) – Number of batches. Default:1
. - batch_size (
int
, optional) – Batch size. Default:10
. - skip_targets (
bool
, optional) – Skip targets. Default:True
. - skip_lifecycle_events (
bool
, optional) – Skip life cycle events. Default:True
. - timeout (
int
, optional) – Timeout. Default:120000
. - test_origin (
bool
, optional) – Test origin. Default:False
. - read_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Read Policy for preview. If not provided, uses default read policy if one available. Default:None
. - write_policy (
streamsets.sdk.sch_models.ProtectionPolicy
) – Write Policy for preview. If not provided, uses default write policy if one available. Default:None
. - executor (
streamsets.sdk.sch_models.DataCollector
, optional) – The Data Collector in which to preview the pipeline. If omitted, Control Hub’s first executor SDC will be used. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_api.PreviewCommand
.- pipeline (
-
scale_deployment
(deployment, num_instances)[source]¶ Scale up/down active deployment.
Parameters: - deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment object. - num_instances (
int
) – Number of sdc instances.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- deployment (
-
scheduled_tasks
¶ Scheduled Tasks.
Returns: An instance of streamsets.sdk.sch_models.ScheduledTasks
.
-
set_default_read_protection_policy
(protection_policy)[source]¶ Set a default read protection policy.
Parameters: protection_policy – :param (
streamsets.sdk.sch_models.ProtectionPolicy
): Protection :param Policy object to be set as the default read policy.:Returns: An updated instance of streamsets.sdk.sch_models.ProtectionPolicy
.Raises: UnsupportedMethodError
– Only supported on Control Hub version 3.14.0+.
-
set_default_write_protection_policy
(protection_policy)[source]¶ Set a default write protection policy.
Parameters: protection_policy – :param (
streamsets.sdk.sch_models.ProtectionPolicy
): Protection :param Policy object to be set as the default write policy.:Returns: An updated instance of streamsets.sdk.sch_models.ProtectionPolicy
.Raises: UnsupportedMethodError
– Only supported on Control Hub version 3.14.0+.
-
set_user
(username, password)[source]¶ Set the user by which subsequent actions will be run.
Parameters: - username (
str
) – SCH username. - password (
str
) – SCH password.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- username (
-
start_all_topology_jobs
(topology)[source]¶ Start all jobs of a topology.
Parameters: topology ( streamsets.sdk.sch_models.Topology
) – Topology object.
-
start_deployment
(deployment, **kwargs)[source]¶ Start Deployment.
Parameters: - deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment instance. - wait (
bool
, optional) – Wait for deployment to start. Default:True
. - wait_for_statuses (
list
, optional) – Deployment statuses to wait on. Default:['ACTIVE']
.
Returns: An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.- deployment (
-
start_job
(*jobs, wait=True, **kwargs)[source]¶ Start one or more jobs.
Parameters: - *jobs – One or more instances of
streamsets.sdk.sch_models.Job
. - wait (
bool
, optional) – Wait for pipelines to reach RUNNING status before returning. Default:True
. - **kwargs – Arbitrary keyword arguments.
Returns: An instance of
streamsets.sdk.sch_api.StartJobsCommand
.- *jobs – One or more instances of
-
start_job_template
(job_template, instance_name_suffix='COUNTER', parameter_name=None, runtime_parameters=None, number_of_instances=1, wait_for_data_collectors=False)[source]¶ Start Job instances from a Job Template.
Parameters: - job_template (
streamsets.sdk.sch_models.Job
) – A Job instance with the property job_template set toTrue
. - instance_name_suffix (
str
, optional) – Suffix to be used for Job names in {‘COUNTER’, ‘TIME_STAMP’, ‘PARAM_VALUE’}. Default:COUNTER
. - parameter_name (
str
, optional) – Specified when instance_name_suffix is ‘PARAM_VALUE’. Default:None
. - runtime_parameters (
dict
) or (list
) – Runtime Parameters to be used in the jobs. If a dict is specified,number_of_instances
jobs will be started. If a list is specified,number_of_instances
is ignored and job instances will be started using the elements of the list as Runtime Parameters for each job. If left out, Runtime Parameters from Job Template will be used. Default:None
. - number_of_instances (
int
, optional) – Number of instances to be started using given parameters. Default:1
. - wait_for_data_collectors (
bool
, optional) – Default:False
.
Returns: A
streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
instances.- job_template (
-
stop_all_topology_jobs
(topology, force=False)[source]¶ Stop all jobs of a topology.
Parameters: - topology (
streamsets.sdk.sch_models.Topology
) – Topology object. - force (
bool
, optional) – Force topology jobs to stop. Default:False
.
- topology (
-
stop_deployment
(deployment, wait_for_statuses=['INACTIVE'])[source]¶ Stop Deployment.
Parameters: - deployment (
streamsets.sdk.sch_models.Deployment
) – Deployment instance. - wait_for_statuses (
list
, optional) – List of statuses to wait for. Default:['INACTIVE']
.
Returns: An instance of
streamsets.sdk.sch_api.DeploymentStartStopCommand
.- deployment (
-
stop_job
(*jobs, force=False, timeout_sec=300)[source]¶ Stop one or more jobs.
Parameters: - *jobs – One or more instances of
streamsets.sdk.sch_models.Job
. - force (
bool
, optional) – Force job to stop. Default:False
. - timeout_sec (
int
, optional) – Timeout in secs. Default:300
.
- *jobs – One or more instances of
-
stop_test_pipeline_run
(start_pipeline_command)[source]¶ Stop the test run of pipeline.
Parameters: start_pipeline_command ( streamsets.sdk.sdc_api.StartPipelineCommand
) –Returns: An instance of streamsets.sdk.sdc_api.StopPipelineCommand
.
-
subscriptions
¶ Event Subscriptions.
Returns: An instance of streamsets.sdk.sch_models.Subscriptions
.
-
sync_job
(*jobs)[source]¶ Sync one or more jobs.
Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job
.
-
test_pipeline_run
(pipeline, reset_origin=False, parameters=None)[source]¶ Test run a pipeline.
Parameters: - pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. - reset_origin (
boolean
, optional) – Default:False
. - parameters (
dict
, optional) – Pipeline parameters. Default:None
.
Returns: An instance of
streamsets.sdk.sdc_api.StartPipelineCommand
.- pipeline (
-
topologies
¶ Topologies.
Returns: An instance of streamsets.sdk.sch_models.Topologies
.
-
transformers
¶ Transformers registered to the Control Hub instance.
Returns: Returns a streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Transformer
instances.
-
update_connection
(connection)[source]¶ Update a connection.
Parameters: connection ( streamsets.sdk.sch_models.Connection
) – Connection object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
update_data_collector_labels
(data_collector)[source]¶ Update data collector labels.
Parameters: data_collector ( streamsets.sdk.sch_models.DataCollector
) – Data Collector object.Returns: An instance of streamsets.sdk.sch_models.DataCollector
.
-
update_data_collector_resource_thresholds
(data_collector, max_cpu_load=None, max_memory_used=None, max_pipelines_running=None)[source]¶ Updates data collector resource thresholds.
Parameters: - data_collector (
streamsets.sdk.sch_models.DataCollector
) – Data Collector object. - max_cpu_load (
float
, optional) – Max CPU load in percentage. Default:None
. - max_memory_used (
int
, optional) – Max memory used in MB. Default:None
. - max_pipelines_running (
int
, optional) – Max pipelines running. Default:None
.
Returns: An instance of
streamsets.sdk.sch_api.Command
.- data_collector (
-
update_deployment
(deployment)[source]¶ Update a deployment.
Parameters: deployment ( streamsets.sdk.sch_models.Deployment
) – Deployment object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
update_group
(group)[source]¶ Update a group.
Parameters: group ( streamsets.sdk.sch_models.Group
) – Group object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
update_job
(job)[source]¶ Update a job.
Parameters: job ( streamsets.sdk.sch_models.Job
) – Job object.Returns: An instance of streamsets.sdk.sch_models.Job
.
-
update_pipelines_with_different_fragment_version
(pipelines, from_fragment_version, to_fragment_version)[source]¶ Update pipelines with latest pipeline fragment commit version.
Parameters: - pipelines (
list
) – List ofstreamsets.sdk.sch_models.Pipeline
instances. - from_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment from which the pipeline needs to be updated. - to_fragment_version (
streamsets.sdk.sch_models.PipelineCommit
) – commit of fragment to which the pipeline needs to be updated.
Returns: An instance of
streamsets.sdk.sch_api.Command
- pipelines (
-
update_report_definition
(report_definition)[source]¶ Update an existing Report Definition.
Parameters: report_definition ( streamsets.sdk.sch_models.ReportDefinition
) – Report Definition instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
update_subscription
(subscription)[source]¶ Update an existing Subscription.
Parameters: subscription ( streamsets.sdk.sch_models.Subscription
) – A Subscription instance.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
update_user
(user)[source]¶ - Update a user. Some user attributes are updated by SCH such as
- last_modified_by, last_modified_on.
Parameters: user ( streamsets.sdk.sch_models.User
) – User object.Returns: An instance of streamsets.sdk.sch_api.Command
.
-
upgrade_job
(*jobs)[source]¶ Upgrade job(s) to latest pipeline version.
Parameters: *jobs – One or more instances of streamsets.sdk.sch_models.Job
.Returns: A streamsets.sdk.utils.SeekableList
ofstreamsets.sdk.sch_models.Job
.
-
upload_offset
(job, offset_file=None, offset_json=None)[source]¶ Upload offset for given job.
Parameters: - job (
streamsets.sdk.sch_models.Job
) – Job object. - offset_file (
file
, optional) – File containing the offsets. Default:None
. Exactly one ofoffset_file
,offset_json
should specified. - offset_json (
dict
, optional) – Contents of offset. Default:None
. Exactly one ofoffset_file
,offset_json
should specified.
Returns: An instance of
streamsets.sdk.sch_models.Job
.- job (
-
users
¶ Users.
Returns: An instance of streamsets.sdk.sch_models.Users
.
-
verify_connection
(connection)[source]¶ Verify connection.
Parameters: connection ( streamsets.sdk.sch_models.Connection
) – Connection object.Returns: An instance of streamsets.sdk.sch_models.ConnectionVerificationResult
.
-
wait_for_job_status
(job, status, timeout_sec=200)[source]¶ Block until a job reaches the desired status.
Parameters: - job (
streamsets.sdk.sch_models.Job
) – The job instance. - status (
str
) – The desired status to wait for. - timeout_sec (
int
, optional) – Timeout to wait forjob
to reachstatus
, in seconds. Default:streamsets.sdk.sch.DEFAULT_WAIT_FOR_STATUS_TIMEOUT
.
Raises: TimeoutError
– Iftimeout_sec
passes withoutjob
reachingstatus
.- job (
- server_url (
Models¶
These models wrap and provide useful functionality for interacting with common SCH abstractions.
ACLs¶
-
class
streamsets.sdk.sch_models.
ACL
(acl, control_hub)[source]¶ Represents an ACL.
Parameters: - acl (
dict
) – JSON representation of an ACL. - control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
-
permissions
¶ streamsets.sdk.sch_models.Permissions
– A Collection of Permissions.
-
add_permission
(permission)[source]¶ Add new permission to the ACL.
Parameters: permission ( streamsets.sdk.sch_models.Permission
) – A permission object.Returns: An instance of streamsets.sdk.sch_api.Command
-
permission_builder
¶ Get a permission builder instance with which a pipeline can be created.
Returns: An instance of streamsets.sdk.sch_models.ACLPermissionBuilder
.
-
remove_permission
(permission)[source]¶ Remove a permission from ACL.
Parameters: permission ( streamsets.sdk.sch_models.Permission
) – A permission object.Returns: An instance of streamsets.sdk.sch_api.Command
- acl (
-
class
streamsets.sdk.sch_models.
ACLPermissionBuilder
(permission, acl)[source]¶ Class to help build the ACL permission.
Parameters: - permission (
streamsets.sdk.sch_models.Permission
) – A permission object. - acl (
streamsets.sdk.sch_models.ACL
) – An ACL object.
-
build
(subject_id, subject_type, actions)[source]¶ Method to help build the ACL permission.
Parameters: - subject_id (
str
) – Id of the subject e.g. ‘test@test’. - subject_type (
str
) – Type of the subject e.g. ‘USER’. - actions (
list
) – A list of actions of typestr
e.g. [‘READ’, ‘WRITE’, ‘EXECUTE’].
Returns: An instance of
streamsets.sdk.sch_models.Permission
.- subject_id (
- permission (
-
class
streamsets.sdk.sch_models.
Permission
(permission, resource_type, api_client)[source]¶ A container for a permission.
Parameters: - permission (
dict
) – A Python object representation of a permission. - resource_type (
str
) – String representing the type of resource e.g. ‘JOB’, ‘PIPELINE’. - api_client (
streamsets.sdk.sch_api.ApiClient
) – An instance of ApiClient.
-
resource_id
¶ str
– Id of the resource e.g. Pipeline or Job.
-
subject_id
¶ str
– Id of the subject e.g. user id'admin@admin'
.
-
subject_type
¶ str
– Type of the subject e.g.'USER'
.
-
last_modified_by
¶ str
– User who last modified this permission e.g.'admin@admin'
.
-
last_modified_on
¶ int
– Timestamp at which this permission was last modified e.g.1550785079811
.
- permission (
Classifiers¶
Classification Rules¶
-
class
streamsets.sdk.sch_models.
ClassificationRule
(classification_rule, classifiers)[source]¶ Classification Rule Model.
Parameters: - classification_rule (
dict
) – A Python dict representation of classification rule. - classifiers (
list
) – A list ofstreamsets.sdk.sch_models.Classifier
instances.
- classification_rule (
-
class
streamsets.sdk.sch_models.
ClassificationRuleBuilder
(classification_rule, classifier)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.ClassificationRule
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_classification_rule_builder()
.
Parameters: - classification_rule (
dict
) – Python object defining a classification rule. - classifier (
dict
) – Python object defining a classifier.
-
add_classifier
(patterns=None, match_with=None, regular_expression_type='RE2/J', case_sensitive=False)[source]¶ Add classifier to the classification rule.
Parameters: - patterns (
list
, optional) – List of strings of patterns. Default:None
. - match_with (
str
, optional) – Default:None
. - regular_expression_type (
str
, optional) – Default:'RE2/J'
. - case_sensitive (
bool
, optional) – Default:False
.
Returns: An instance of
streamsets.sdk.sch_models.Classifier
.- patterns (
-
build
(name, category, score)[source]¶ Build the classification rule.
Parameters: - name (
str
) – Classification Rule name. - category (
str
) – Classification Rule category. - score (
float
) – Classification Rule score.
Returns: An instance of
streamsets.sdk.sch_models.ClassificationRule
.- name (
DataCollectors¶
-
class
streamsets.sdk.sch_models.
DataCollector
(data_collector, control_hub)[source]¶ Model for Data Collector.
-
execution_mode
¶ bool
–True
for Edge andFalse
for SDC.
-
id
str
– Data Collectort id.
-
labels
¶ list
– Labels for Data Collector.
-
last_validated_on
¶ str
– Last validated time for Data Collector.
-
reported_labels
¶ list
– Reported labels for Data Collector.
-
url
¶ str
– Data Collector’s url.
-
version
str
– Data Collector’s version.
-
accessible
¶ Returns a
bool
for whether the Data Collector instance is accessible.
-
acl
¶ Get DataCollector ACL.
Returns: An instance of streamsets.sdk.sch_models.ACL
.
-
attributes
¶ Returns a
dict
of Data Collector attributes.
-
pipelines_committed
¶ Control Hub Job IDs that are about to be started but have no corresponding pipeline status yet.
Returns: A list
of Job IDs (str
objects).
-
responding
¶ Returns a
bool
for whether the Data Collector instance is responding.
-
Transformers¶
-
class
streamsets.sdk.sch_models.
Transformer
(transformer, control_hub)[source]¶ Model for Transformer.
-
execution_mode
¶ str
-
id
str
– Transformer id.
-
labels
¶ list
– Labels for Transformer.
-
last_validated_on
¶ str
– Last validated time for Transformer.
-
reported_labels
¶ list
– Reported labels for Transformer.
-
url
¶ str
– Transformer’s url.
-
version
str
– Transformer’s version.
-
accessible
¶ Returns a
bool
for whether the Transformer instance is accessible.
-
acl
¶ Get Transformer ACL.
Returns: An instance of streamsets.sdk.sch_models.ACL
.
-
attributes
¶ Returns a
dict
of Transformer attributes.
-
Group¶
-
class
streamsets.sdk.sch_models.
Group
(group, roles)[source]¶ Model for Group.
Parameters: - group (
dict
) – A Python object representation of Group. - roles (
dict
) – A mapping of role IDs to role labels.
- group (
-
class
streamsets.sdk.sch_models.
Groups
(control_hub, roles, organization)[source]¶ Collection of
streamsets.sdk.sch_models.Group
instances.Parameters: - control_hub – An instance of
streamsets.sdk.sch.ControlHub
. - roles (
dict
) – A mapping of role IDs to role labels. - organization (
str
) – Organization ID.
- control_hub – An instance of
-
class
streamsets.sdk.sch_models.
GroupBuilder
(group, roles)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Group
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_group_builder()
.
Parameters: - group (
dict
) – Python object built from our Swagger GroupJson definition. - roles (
dict
) – A mapping of role IDs to role labels.
-
build
(id, display_name, ldap_groups=None)[source]¶ Build the group.
Parameters: - id (
str
) – Group ID. - display_name (
str
) – Group display name. - ldap_groups (
list
) – List of LDAP groups (strings).
Returns: An instance of
streamsets.sdk.sch_models.Group
.- id (
Jobs¶
-
class
streamsets.sdk.sch_models.
Job
(job, control_hub=None)[source]¶ Model for Job.
-
commit_id
¶ str
– Pipeline commit id.
-
commit_label
¶ str
– Pipeline commit label.
-
created_by
¶ str
– User that created this job.
-
created_on
¶ int
– Time at which this job was created.
-
data_collector_labels
¶ list
– Labels of the data collectors.
-
description
¶ str
– Job description.
-
destroyer
¶ str
– Job destroyer.
-
enable_failover
¶ bool
– Flag that indicates if failover is enabled.
-
enable_time_series_analysis
¶ bool
– Flag that indicates if time series is enabled.
-
execution_mode
¶ bool
– True for Edge and False for SDC.
-
job_deleted
¶ bool
– Flag that indicates if this job is deleted.
-
job_id
¶ str
– Id of the job.
-
job_name
¶ str
– Name of the job.
-
last_modified_by
¶ str
– User that last modified this job.
-
last_modified_on
¶ int
– Time at which this job was last modified.
-
number_of_instances
¶ int
– Number of instances.
-
pipeline_force_stop_timeout
¶ int
– Timeout for Pipeline force stop.
-
pipeline_id
¶ str
– Id of the pipeline that is running the job.
-
pipeline_name
¶ str
– Name of the pipeline that is running the job.
-
pipeline_rule_id
¶ str
– Rule Id of the pipeline that is running the job.
-
read_policy
¶ streamsets.sdk.sch_models.ProtectionPolicy
– Read Policy of the job.
-
runtime_parameters
¶ str
– Run-time parameters of the job.
-
statistics_refresh_interval_in_millisecs
¶ int
– Refresh interval for statistics in milliseconds.
-
status
¶ string
– Status of the job.
-
write_policy
¶ streamsets.sdk.sch_models.ProtectionPolicy
– Write Policy of the job.
-
acl
¶ Get job ACL.
Returns: An instance of streamsets.sdk.sch_models.ACL
.
-
commit
¶ Get pipeline commit of the job.
Returns: An instance of streamsets.sdk.sch_models.PipelineCommit
.
-
committed_offsets
¶ Get the committed offsets for a given job id.
Returns: An instance of streamsets.sdk.sch_models.JobCommittedOffset
.
-
metrics
(metric_type='RECORD_COUNT', pipeline_version=None, sdc_id=None, time_filter_condition='LATEST', include_error_count=False)[source]¶ Get metrics for the job.
Parameters: - metric_type (
str
, optional) – metric_type in {‘RECORD_COUNT’, ‘RECORD_THROUGHPUT’}. Default:'RECORD_COUNT'
. - pipeline_version (
str
, optional) – Default:None
. If default, version is picked as the version of the current commit of the pipeline. - sdc_id (
str
, optional) – Default:None
. If default, SDC ID is picked as the Id of the SDC with the current job run. - time_filter_condition (
str
, optional) – Default: LATEST. - include_error_count (
boolean
, optional) – Default: False.
Returns: An instance of
streamsets.sdk.sch_models.JobMetrics
.- metric_type (
-
pipeline
¶ Get the pipeline object corresponding to this job.
-
system_job
¶ Get the sytem Job for this job if exists.
Returns: An instance of streamsets.sdk.sch_models.Job
.
-
tag
¶ Get pipeline tag of the job.
Returns: An instance of streamsets.sdk.sch_models.PipelineTag
.
Get the job tags.
Returns: A streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.Tag
.
-
time_series_metrics
(metric_type, time_filter_condition='LAST_5M', **kwargs)[source]¶ Get historic time series metrics for the job.
Parameters: - metric_type (
str
) – metric type in {‘Record Count Time Series’, ‘Record Throughput Time Series’, ‘Batch Throughput Time Series’, ‘Stage Batch Processing Timer seconds’}. - time_filter_condition (
str
, optional) – Default:'LAST_5M'
.
Returns: An instance of
streamsets.sdk.sch_models.JobTimeSeriesMetrics
.- metric_type (
-
-
class
streamsets.sdk.sch_models.
Jobs
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.Job
instances.Parameters: control_hub ( streamsets.sdk.sch.ControlHub
) – Control Hub object.
-
class
streamsets.sdk.sch_models.
JobBuilder
(job, control_hub)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Job
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_job_builder()
.
Parameters: job ( dict
) – Python object built from our Swagger JobJson definition.-
build
(job_name, pipeline, job_template=False, runtime_parameters=None, pipeline_commit=None, pipeline_tag=None, pipeline_commit_or_tag=None, tags=None)[source]¶ Build the job.
Parameters: - job_name (
str
) – Name of the job. - pipeline (
streamsets.sdk.sch_models.Pipeline
) – Pipeline object. - job_template (
boolean
, optional) – Indicate if it is a Job Template. Default:False
. - runtime_parameters (
dict
, optional) – Runtime Parameters for the Job or Job Template. Default:None
. - pipeline_commit (
streamsets.sdk.sch_models.PipelineCommit
) – Default: ``None`, which resolves to the latest pipeline commit. - pipeline_tag (
streamsets.sdk.sch_models.PipelineTag
) – Default: ``None`, which resolves to the latest pipeline tag. - pipeline_commit_or_tag (
str
, optional) – Default:None
, which resolves to the latest pipeline commit. - tags (
list
, optional) – Job tags. Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.Job
.- job_name (
-
class
streamsets.sdk.sch_models.
JobMetrics
(metrics)[source]¶ Model for job metrics.
-
output_count
¶ dict
– Stage ID to output count map.
-
error_count
¶ dict
– Stage ID to error count map.
Parameters: metrics ( dict
) – Metrics counts in JSON format.-
-
class
streamsets.sdk.sch_models.
JobOffset
(offset)[source]¶ Model for offset.
Parameters: offset ( dict
) – Offset in JSON format.
-
class
streamsets.sdk.sch_models.
JobRunEvent
(event)[source]¶ Model for an event in a Job Run.
Parameters: event ( dict
) – Job Run Event in JSON format.
-
class
streamsets.sdk.sch_models.
JobStatus
(status, control_hub, **kwargs)[source]¶ Model for Job Status.
-
run_history
¶ streamsets.sdk.utils.SeekableList
– (streamsets.sdk.utils.JobRunHistoryEvent
): History of a particular job run.
-
offsets
¶ streamsets.sdk.utils.SeekableList
– (streamsets.sdk.utils.JobPipelineOffset
): Offsets after the job run.
Parameters: - status (
dict
) – Job status in JSON format. - control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object.
-
-
class
streamsets.sdk.sch_models.
JobTimeSeriesMetric
(metric, metric_type)[source]¶ Model for job metrics.
-
name
¶ str
– Name of measurement.
-
values
¶ list
– Timeseries data.
-
time_series
¶ dict
– Timeseries data with timestamp as key and metric value as value.
Parameters: metric ( dict
) – Metrics in JSON format.-
-
class
streamsets.sdk.sch_models.
JobTimeSeriesMetrics
(metrics, metric_type)[source]¶ Model for job metrics.
-
input_records
¶ streamsets.sdk.sch_models.JobTimeSeriesMetric
– Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
-
output_records
¶ streamsets.sdk.sch_models.JobTimeSeriesMetric
– Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
-
error_records
¶ streamsets.sdk.sch_models.JobTimeSeriesMetric
– Appears when queried for ‘Record Count Time Series’ or ‘Record Throughput Time Series’.
-
batch_counter
¶ streamsets.sdk.sch_models.JobTimeSeriesMetric
– Appears when queried for ‘Batch Throughput Time Series’.
-
batch_processing_timer
¶ streamsets.sdk.sch_models.JobTimeSeriesMetric
– Appears when queried for ‘Batch Processing Timer seconds’.
Parameters: metrics ( dict
) – Metrics in JSON format.-
-
class
streamsets.sdk.sch_models.
RuntimeParameters
(runtime_parameters, job)[source]¶ Wrapper for Control Hub job runtime parameters.
Parameters: - runtime_parameters (
str
) – Runtime parameter. - job (
streamsets.sdk.sch_models.Job
) – Job object.
- runtime_parameters (
Organizations¶
-
class
streamsets.sdk.sch_models.
Organization
(organization, organization_admin_user=None, api_client=None)[source]¶ Model for Organization.
Parameters: - organization (
str
) – Organization Id. - organization_admin_user (
str
, optional) – Default:None
. - api_client (
streamsets.sdk.sch_api.ApiClient
, optional) – Default:None
.
- organization (
-
class
streamsets.sdk.sch_models.
Organizations
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.Organization
instances.
-
class
streamsets.sdk.sch_models.
OrganizationBuilder
(organization, organization_admin_user)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Organization
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_organization_builder()
.
Parameters: organization ( dict
) – Python object built from our Swagger UserJson definition.-
build
(id, name, admin_user_id, admin_user_display_name, admin_user_email_address, admin_user_ldap_user_name=None)[source]¶ Build the organization.
Parameters: - id (
str
) – Organization ID. - name (
str
) – Organization name. - admin_user_id (
str
) – User Id of the admin of this organization. - admin_user_display_name (
str
) – User display name of admin of this organization. - admin_user_email_address (
str
) – User email address of admin of this organization. - admin_user_ldap_user_name (
str
, optional) – LDAP username. Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.Organization
.- id (
Pipelines¶
-
class
streamsets.sdk.sch_models.
Pipeline
(pipeline, builder, pipeline_definition, rules_definition, control_hub=None, library_definitions=None)[source]¶ Model for Pipeline.
Parameters: - pipeline (
dict
) – Pipeline in JSON format. - builder (
streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object. - pipeline_definition (
dict
) – Pipeline Definition in JSON format. - rules_definition (
dict
) – Rules Definition in JSON format. - control_hub (
streamsets.sdk.sch.ControlHub
, optional) – ControlHub object. Default:None
. - library_definitions (
dict
, optional) – Library Definition in JSON format. Default:None
.
-
Labels
¶ Get the pipeline labels. This attribute will be deprecated in a future release. Please use labels instead.
Returns: A list
ofdict
-
acl
¶ Get pipeline ACL.
Returns: An instance of streamsets.sdk.sch_models.ACL
.
-
commits
¶ Get commits for this pipeline.
Returns: A streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineCommit
.
-
configuration
¶ Get pipeline’s configuration.
Returns: An instance of streamsets.sdk.sch_models.Configuration
.
-
labels
¶ Get the pipeline labels.
Returns: A streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineLabel
.
-
parameters
¶ Get the pipeline parameters.
Returns: A dict like, streamsets.sdk.sch_models.PipelineParameters
object of parameter key-value pairs.
Get tags for this pipeline.
Returns: A streamsets.sdk.utils.SeekableList
of instances ofstreamsets.sdk.sch_models.PipelineTag
.
- pipeline (
-
class
streamsets.sdk.sch_models.
Pipelines
(control_hub, organization)[source]¶ Collection of
streamsets.sdk.sch_models.Pipeline
instances.Parameters: - control_hub – An instance of
streamsets.sdk.sch.ControlHub
. - organization (
str
) – Organization Id.
- control_hub – An instance of
-
class
streamsets.sdk.sch_models.
PipelineBuilder
(pipeline, data_collector_pipeline_builder, control_hub=None, fragment=False)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
Parameters: - pipeline (
dict
) – Python object built from our Swagger PipelineJson definition. - data_collector_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Data Collector Pipeline Builder object. - control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
. - fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
-
add_stage
(label=None, name=None, type=None, library=None)[source]¶ Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – Stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – Stage name to use when selecting stage from definitions. Default:None
. - type (
str
, optional) – Stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
. - library (
str
, optional) – Stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.SchSdcStage
.- label (
-
class
streamsets.sdk.sch_models.
PipelineParameters
(pipeline)[source]¶ Parameters for pipelines.
Parameters: pipeline ( streamsets.sdk.sch_models.Pipeline
) – Pipeline Instance.
-
class
streamsets.sdk.sch_models.
StPipelineBuilder
(pipeline, transformer_pipeline_builder, control_hub=None, fragment=False)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Pipeline
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_pipeline_builder()
.
Parameters: - pipeline (
dict
) – Python object built from our Swagger PipelineJson definition. - transformer_pipeline_builder (
streamsets.sdk.sdc_models.PipelineBuilder
) – Transformer Pipeline Builder object. - control_hub (
streamsets.sdk.sch.ControlHub
) – Default:None
. - fragment (
boolean
, optional) – Specify if a fragment builder. Default:False
.
-
add_stage
(label=None, name=None, type=None, library=None)[source]¶ Add a stage to the pipeline.
When specifying a stage, either
label
orname
must be used.type
andlibrary
may also be used to select a particular stage if ambiguities exist. Iftype
and/orlibrary
are omitted, the first stage definition matching the givenlabel
orname
will be used.Parameters: - label (
str
, optional) – Transformer stage label to use when selecting stage from definitions. Default:None
. - name (
str
, optional) – Transformer stage name to use when selecting stage from definitions. Default:None
. - type (
str
, optional) – Transformer stage type to use when selecting stage from definitions (e.g. origin, destination, processor, executor). Default:None
. - library (
str
, optional) – Transformer stage library to use when selecting stage from definitions. Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.SchStStage
.- label (
Protection Methods¶
-
class
streamsets.sdk.sch_models.
ProtectionMethod
(stage)[source]¶ Protection Method Model.
Parameters: stage ( dict
) – JSON representation of a stage.
-
class
streamsets.sdk.sch_models.
ProtectionMethodBuilder
(pipeline_builder)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.ProtectionMethod
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_method_builder()
.
Parameters: pipeline_builder ( streamsets.sdk.sch_models.PipelineBuilder
) – Pipeline Builder object.
Protection Policies¶
-
class
streamsets.sdk.sch_models.
ProtectionPolicy
(protection_policy, procedures=None)[source]¶ Model for Protection Policy.
Parameters: - protection_policy (
dict
) – JSON representation of Protection Policy. - procedures (
list
) – A list ofstreamsets.sdk.sch_models.PolicyProcedure
instances, Default:None
.
- protection_policy (
-
class
streamsets.sdk.sch_models.
ProtectionPolicies
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.ProtectionPolicy
instances.
-
class
streamsets.sdk.sch_models.
ProtectionPolicyBuilder
(control_hub, protection_policy, policy_procedure)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.ProtectionPolicy
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_protection_policy_builder()
.
Parameters: - protection_policy (
dict
) – Python object defining a protection policy. - policy_procedure (
dict
) – Python object defining a policy procedure.
ProvisioningAgents¶
-
class
streamsets.sdk.sch_models.
Deployment
(deployment, control_hub=None)[source]¶ Model for Deployment.
Parameters: - deployment (
dict
) – A Python object representation of Deployment. - control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- deployment (
-
class
streamsets.sdk.sch_models.
Deployments
(control_hub, organization)[source]¶ Collection of
streamsets.sdk.sch_models.Deployment
instances.Parameters: - control_hub – An instance of
streamsets.sdk.sch.ControlHub
. - organization (
str
) – Organization Id.
- control_hub – An instance of
-
class
streamsets.sdk.sch_models.
DeploymentBuilder
(deployment)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Deployment
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_deployment_builder()
.
Parameters: deployment ( dict
) – Python object that represents Deployment JSON.-
build
(name, provisioning_agent, number_of_data_collector_instances, spec=None, description=None, data_collector_labels=None)[source]¶ Build the deployment.
Parameters: - name (
str
) – Deployment Name. - provisioning_agent (
streamsets.sdk.sch_models.ProvisioningAgent
) – Agent to use. - (obj (number_of_data_collector_instances) – int): Number of sdc instances.
- spec (
dict
, optional) – Deployment yaml in dictionary format. Will use default yaml used by ui if left out. - description (
str
, optional) – Default:None
. - data_collector_labels (
list
, optional) – Default:['all']
.
Returns: An instance of
streamsets.sdk.sch_models.Deployment
.- name (
-
class
streamsets.sdk.sch_models.
ProvisioningAgent
(provisioning_agent, control_hub)[source]¶ Model for Provisioning Agent.
Parameters: - provisioning_agent (
dict
) – A Python object representation of Provisioning Agent. - control_hub – An instance of
streamsets.sdk.sch.ControlHub
.
- provisioning_agent (
-
class
streamsets.sdk.sch_models.
ProvisioningAgents
(control_hub, organization)[source]¶ Collection of
streamsets.sdk.sch_models.ProvisioningAgent
instances.Parameters: - control_hub – An instance of
streamsets.sdk.sch.ControlHub
. - organization (
str
) – Organization Id.
- control_hub – An instance of
Reports¶
-
class
streamsets.sdk.sch_models.
GenerateReportCommand
(control_hub, report_defintion, response)[source]¶ Command to interact with the response from generate_report.
Parameters: - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance. - report_definition (
dict
) – JSON representation of Report Definition. - response (
dict
) – Api response from generating the report.
- control_hub (
-
class
streamsets.sdk.sch_models.
Report
(report, control_hub, report_definition_id)[source]¶ Model for Report.
Parameters: report ( dict
) – JSON representation of Report.
-
class
streamsets.sdk.sch_models.
Reports
(control_hub, report_definition_id)[source]¶ Collection of
streamsets.sdk.sch_models.Report
instances.Parameters: - control_hub (
streamsets.sdk.sch.ControlHub
) – Control Hub object. - report_definition_id (
str
) – Report Definition Id.
- control_hub (
-
class
streamsets.sdk.sch_models.
ReportDefinition
(report_definition, control_hub)[source]¶ Model for Report Definition.
Parameters: - report_definition (
dict
) – JSON representation of Report Definition. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
-
acl
¶ Get Report Definition ACL.
Returns: An instance of streamsets.sdk.sch_models.ACL
.
-
generate_report
()[source]¶ Generate a Report for Report Definition.
Returns: An instance of streamsets.sdk.sch_models.Report
.
-
report_resources
¶ Get Report Resources of the Report Definition.
Returns: An instance of streamsets.sdk.sch_models.ReportResources
.
-
reports
¶ Get Reports of the Report Definition.
Returns: An instance of streamsets.sdk.sch_models.Reports
.
- report_definition (
-
class
streamsets.sdk.sch_models.
ReportDefinitions
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.ReportDefinition
instances.
-
class
streamsets.sdk.sch_models.
ReportDefinitionBuilder
(report_definition, control_hub)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.ReportDefinition
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_report_definition_builder()
.
Parameters: - report_definition (
dict
) – JSON representation of Report Definition. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
-
add_report_resource
(resource)[source]¶ Add a given resource to Report Definition resources.
Parameters: resource ( streamsets.sdk.sch_models.Job
) or (streamsets.sdk.sch_models.Topology
) –
-
build
(name, description=None)[source]¶ Build the report definition.
Parameters: - name (
str
) – Name of the Report Definition. - description (
str
, optional) – Description of the Report Definition. Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.ReportDefinition
.- name (
-
import_report_definition
(report_definition)[source]¶ Import an existing Report Definition to update it.
Parameters: report_definition ( streamsets.sdk.sch_models.ReportDefinition
) – Report Definition object.
-
remove_report_resource
(resource)[source]¶ Remove a resource from Report Definition Resources.
Parameters: resource ( streamsets.sdk.sch_models.Job
) or (streamsets.sdk.sch_models.Topology
) –Returns: A resource of type dict
that is removed from Report Definition Resources.
-
class
streamsets.sdk.sch_models.
ReportResource
(report_resource)[source]¶ Model for Report Resource.
Parameters: report_resource ( dict
) – JSON representation of Report Resource.
-
class
streamsets.sdk.sch_models.
ReportResources
(report_resources, report_definition)[source]¶ Model for the collection of Report Resources.
Parameters: - report_resources (
list
) – List of Report Resources. - report_definition (
streamsets.sdk.sch_models.ReportDefinition
) – Report Definition object.
- report_resources (
Scheduler¶
-
class
streamsets.sdk.sch_models.
ScheduledTask
(task, control_hub=None)[source]¶ Model for Scheduled Task.
Parameters: - task (
dict
) – JSON representation of task. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
-
audits
¶ Get Scheduled Task Audits.
Returns: A streamsets.sdk.utils.SeekableList
of inherited instances ofstreamsets.sdk.sch_models.ScheduledTaskAudit
.
-
runs
¶ Get Scheduled Task Runs.
Returns: A streamsets.sdk.utils.SeekableList
of inherited instances ofstreamsets.sdk.sch_models.ScheduledTaskRun
.
- task (
-
class
streamsets.sdk.sch_models.
ScheduledTaskAudit
(audit)[source]¶ Scheduled Task Audit.
Parameters: run ( dict
) – JSON representation of scheduled task audit.
-
class
streamsets.sdk.sch_models.
ScheduledTaskBaseModel
(data, attributes_to_ignore=None, attributes_to_remap=None, repr_metadata=None)[source]¶ Base Model for Scheduled Task related classes.
-
class
streamsets.sdk.sch_models.
ScheduledTaskBuilder
(job_selection_types, control_hub)[source]¶ Builder for Scheduled Task.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_scheduled_task_builder()
.
Parameters: - job_selection_types (
dict
) – JSON representation of job selection types. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
-
build
(task_object, action='START', name=None, description=None, cron_expression='0 0 1/1 * ? *', time_zone='UTC', status='RUNNING', start_time=None, end_time=None, missed_execution_handling='IGNORE')[source]¶ Builder for Scheduled Task.
Parameters: - task_object (
streamsets.sdk.sch_models.Job
) – (streamsets.sdk.sch_models.ReportDefinition
): Job or ReportDefinition object. - action (
str
, optional) – One of the {‘START’, ‘STOP’, ‘UPGRADE’} actions. Default:START
. - name (
str
, optional) – Name of the task. Default:None
. - description (
str
, optional) – Description of the task. Default:None
. - crontab_mask (
str
, optional) – Schedule in cron syntax. Default:"0 0 1/1 * ? *"
. (Daily at 12). - time_zone (
str
, optional) – Time zone. Default:"UTC"
. - status (
str
, optional) – One of the {‘RUNNING’, ‘PAUSED’} statuses. Default:RUNNING
. - start_time (
str
, optional) – Start time of task. Default:None
. - end_time (
str
, optional) – End time of task. Default:None
. - missed_trigger_handling (
str
, optional) – One of {‘IGNORE’, ‘RUN ALL’, ‘RUN ONCE’}. Default:IGNORE
.
Returns: An instance of
streamsets.sdk.sch_models.ScheduledTask
.- task_object (
-
class
streamsets.sdk.sch_models.
ScheduledTaskRun
(run)[source]¶ Scheduled Task Run.
Parameters: run ( dict
) – JSON representation if scheduled task run.
-
class
streamsets.sdk.sch_models.
ScheduledTasks
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.ScheduledTask
instances.
Subscriptions¶
-
class
streamsets.sdk.sch_models.
Subscription
(subscription, control_hub)[source]¶ Subscription.
Parameters: - subscription (
dict
) – JSON representation of Subscription. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
-
action
¶ Action of the Subscription.
-
events
¶ Events of the Subscription.
- subscription (
-
class
streamsets.sdk.sch_models.
Subscriptions
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.Subscription
instances.Parameters: control_hub ( streamsets.sdk.sch.ControlHub
) – Control Hub object.
-
class
streamsets.sdk.sch_models.
SubscriptionAction
(action)[source]¶ Action to take when the Subscription is triggered.
Parameters: action ( dict
) – JSON representation of an Action for a Subscription.
-
class
streamsets.sdk.sch_models.
SubscriptionBuilder
(subscription, control_hub)[source]¶ Builder for Subscription.
- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_subscription_builder()
.
Parameters: - subscription (
dict
) – JSON representation of event subscription. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
-
add_event
(event_type, filter=None)[source]¶ Add event to the Subscription.
Parameters: - event_type (
str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}. - filter (
str
, optional) – Filter to be applied on event. Default:None
.
- event_type (
-
build
(name, description=None)[source]¶ Builder for Scheduled Task.
Parameters: - name (
str
) – Name of Subscription. - description (
str
, optional) – Description of subscription. Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.Subscription
.- name (
-
import_subscription
(subscription)[source]¶ Import an existing Subscription into the builder to update it.
Parameters: subscription ( streamsets.sdk.sch_models.Subscription
) – Subscription instance.
-
remove_event
(event_type)[source]¶ Remove event from the subscription.
Parameters: event_type ( str
) – Type of event in {‘Job Status Change’, ‘Data SLA Triggered’, ‘Pipeline Committed’, ‘Pipeline Status Change’, ‘Report Generated’, ‘Data Collector not Responding’}.Returns: An instance of streamsets.sdk.sch_models.SubscriptionEvent
.
-
set_email_action
(recipients, subject=None, body=None)[source]¶ Set the Email action.
Parameters: - recipients (
list
) – List of email addresses. - subject (
str
, optional) – Subject of the email. Default:None
. - body (
str
, optional) – Body of the email. Default:None
.
- recipients (
-
set_webhook_action
(uri, method='GET', content_type=None, payload=None, auth_type=None, username=None, password=None, timeout=30000, headers=None)[source]¶ Set the Webhook action.
Parameters: - uri (
str
) – URI for the Webhook. - method (
str
, optional) – HTTP method to use. Default:'GET'
. - content_type (
str
, optional) – Content Type of the request. Default:None
. - payload (
str
, optional) – Payload of the request. Default:None
. - auth_type (
str
, optional) –'basic'
orNone
. Default:None
. - username (
str
, optional) – username for the authentication. Default:None
. - password (
str
, optional) – password for the authentication. Default:None
. - timeout (
int
, optional) – timeout for the Webhook action. Default:30000
. - headers (
dict
, optional) – Headers to be sent to the Webhook call. Default:None
.
- uri (
-
class
streamsets.sdk.sch_models.
SubscriptionEvent
(event, control_hub)[source]¶ An Event of a Subscription.
Parameters: - event (
dict
) – JSON representation of Events of a Subscription. - control_hub (
streamsets.sdk.ControlHub
) – Control Hub instance.
- event (
Topologies¶
-
class
streamsets.sdk.sch_models.
Topology
(topology, control_hub=None)[source]¶ Model for Topology.
Parameters: topology ( dict
) – JSON representation of Topology.-
commit_id
¶ str
– Pipeline commit id.
-
commit_message
¶ str
– Commit Message.
-
commit_time
¶ int
– Time at which commit was made.
-
committed_by
¶ str
– User that made the commit.
-
default_topology
¶ bool
– Default Topology.
-
description
¶ str
– Topology description.
-
draft
¶ bool
– Indicates whether this topology is a draft.
-
last_modified_by
¶ str
– User that last modified this topology.
-
last_modified_on
¶ int
– Time at which this topology was last modified.
-
organization
¶ str
– Id of the organization.
-
parent_version
¶ str
– Version of the parent topology.
-
topology_definition
¶ str
– Definition of the topology.
-
topology_id
¶ str
– Id of the topology.
-
topology_name
¶ str
– Name of the topology.
-
version
¶ str
– Version of this topology.
-
-
class
streamsets.sdk.sch_models.
Topologies
(control_hub)[source]¶ Collection of
streamsets.sdk.sch_models.Topology
instances.Parameters: control_hub ( streamsets.sdk.sch.ControlHub
) – An instance of the Control Hub.
-
class
streamsets.sdk.sch_models.
TopologyBuilder
(topology)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.Topology
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_topology_builder()
.
Parameters: topology ( dict
) – Python object built from our Swagger TopologyJson definition.-
build
(topology_name, description=None)[source]¶ Build the topology.
Parameters: - topology_name (
str
) – Name of the topology. - description (
str
) – Description of the topology.
Returns: An instance of
streamsets.sdk.sch_models.Topology
.- topology_name (
Users¶
-
class
streamsets.sdk.sch_models.
User
(user, roles)[source]¶ Model for User.
Parameters: - user (
dict
) – JSON representation of User. - roles (
dict
) – A mapping of role IDs to role labels.
-
active
¶ bool
– Whether the user is active or not.
-
created_by
¶ str
– Creator of this user.
-
created_on
¶ str
– Creation time of this user.
-
display_name
¶ str
– Display name of this user.
-
email_address
¶ str
– Email address of this user.
-
id
¶ str
– Id of this user.
-
groups
¶ list
– Groups this user belongs to.
-
last_modified_by
¶ str
– User last modified by.
-
last_modified_on
¶ str
– User last modification time.
-
password_expires_on
¶ str
– User’s password expiration time.
-
password_system_generated
¶ bool
– Whether User’s password is system generated or not.
-
roles
¶ set
– A set of role labels.
-
saml_user_name
¶ str
– SAML username of user.
- user (
-
class
streamsets.sdk.sch_models.
Users
(control_hub, roles, organization)[source]¶ Collection of
streamsets.sdk.sch_models.User
instances.Parameters: - control_hub – An instance of
streamsets.sdk.sch.ControlHub
. - roles (
dict
) – A mapping of role IDs to role labels. - organization (
str
) – Organization ID.
- control_hub – An instance of
-
class
streamsets.sdk.sch_models.
UserBuilder
(user, roles, control_hub=None)[source]¶ Class with which to build instances of
streamsets.sdk.sch_models.User
.- Instead of instantiating this class directly, most users should use
streamsets.sdk.sch.ControlHub.get_user_builder()
.
Parameters: - user (
dict
) – Python object built from our Swagger UserJson definition. - roles (
dict
) – A mapping of role IDs to role labels. - control_hub – An instance of
streamsets.sdk.sch.ControlHub
. Default:None
.
-
build
(id, display_name, email_address, saml_user_name=None, ldap_user_name=None)[source]¶ Build the user.
Parameters: - id (
str
) – User Id. - display_name (
str
) – User display name. - email_address (
str
) – User Email Address. - saml_user_name (
str
, optional) – Default:None
. - ldap_user_name (
str
, optional) – Default:None
.
Returns: An instance of
streamsets.sdk.sch_models.User
.- id (
Common¶
Models used by StreamSets Data Collector and StreamSets Control Hub:
-
class
streamsets.sdk.models.
Configuration
(configuration=None, property_key='name', property_value='value', **kwargs)[source]¶ Abstraction for stage configurations.
This class enables easy access to and modification of data stored as a list of dictionaries. As an example, SDC’s pipeline configuration is stored in the form
[ { "name" : "executionMode", "value" : "STANDALONE" }, { "name" : "deliveryGuarantee", "value" : "AT_LEAST_ONCE" }, ... ]
By implementing simple
__getitem__
and__setitem__
methods, this class allows items in this list to be accessed usingconfiguration['executionMode'] = 'CLUSTER_BATCH'
Instead of the more verbose
for property in configuration: if property['name'] == 'executionMode': property['value'] = 'CLUSTER_BATCH' break
Parameters: - configuration (
str
) – List of dictionaries comprising the configuration. - property_key (
str
, optional) – The dictionary entry denoting the property key. Default:name
- property_value (
str
, optional) – The dictionary entry denoting the property value. Default:value
-
get
(key, default=None)[source]¶ Return the value of key or, if not in the configuration, the default value.
- configuration (
Exceptions¶
Common exceptions.
-
exception
streamsets.sdk.exceptions.
BadRequestError
(response)[source]¶ Bad request error (HTTP 400).
-
exception
streamsets.sdk.exceptions.
DeploymentInactiveError
(message)[source]¶ Deployment status changed into INACTIVE_ERROR.
-
exception
streamsets.sdk.exceptions.
InvalidCredentialsError
(message)[source]¶ Invalid credentials error.
-
exception
streamsets.sdk.exceptions.
JobInactiveError
(message)[source]¶ Job status changed into INACTIVE_ERROR.