Advanced Usage

This page highlights some of the more advanced functionality that users might come across when developing tests with the StreamSets Test Framework (STF).


STF includes implementations of a number of fixtures to enable their immediate use in tests.


The StreamSets Test Framework extensively uses pytest fixtures to simplify access to objects needed within a test. As an example, most environments represented by classes under streamsets.testframework.environment have corresponding fixtures, which allows users to simply refer to them in the test function’s argument list and then use them within tests. This eliminates the need to import the classes individually and also allows for their efficient reuse across tests.

StreamSets Data Collector instances

Fixtures are also used to represent instances of streamsets.testframework.sdc.DataCollector in the form of streamsets.testframework.conftest.sdc_builder and streamsets.testframework.conftest.sdc_executor. Along with the benefits described above for environment classes, these two fixtures also allow regular functional tests to be run as upgrade tests. This behavior is controlled by the --sdc-version argument passed to stf test. If the string following this argument is a single SDC version, the same Docker-based StreamSets Data Collector instance is used to both build and execute the pipeline. If two versions are passed (e.g. --sdc-version ' > 3.4.0'), two SDC instances are used: the first version given builds the pipelines, and the second executes the pipelines exported from the first.

streamsets.testframework.conftest.sdc_common_hook, streamsets.testframework.conftest.sdc_builder_hook, and streamsets.testframework.conftest.sdc_executor_hook are three more fixtures whose reference implementations are no-ops. To use them, reimplement these fixtures in your test module; this has the effect of acting on the corresponding Data Collector instance fixture before streamsets.testframework.sdc.DataCollector.start() is invoked. These hooks are particularly useful for manipulations that need to be done before Data Collector is started if the changes are to take effect. Also note that the common hook is executed before the builder/executor hook, allowing for a composition of actions.
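The following sketch shows one way such hooks could be reimplemented in a test module. It assumes that each hook fixture is expected to return a callable that receives the Data Collector instance, that module scope is appropriate for the fixture, and that the instance exposes its properties as a dictionary-like sdc_properties attribute. None of these details are stated on this page, so treat them as assumptions to verify against the reference implementations in streamsets.testframework.conftest.

```python
import pytest


def raise_max_batch_size(data_collector):
    # Assumption: `sdc_properties` is a dict-like view of sdc.properties
    # that must be edited before the instance is started.
    data_collector.sdc_properties['production.maxBatchSize'] = '100000'


def enlarge_runner_pool(data_collector):
    data_collector.sdc_properties['runner.thread.pool.size'] = '100'


@pytest.fixture(scope='module')
def sdc_common_hook():
    # Applied to both the builder and the executor instance, and run
    # before the more specific hook below.
    return raise_max_batch_size


@pytest.fixture(scope='module')
def sdc_executor_hook():
    # Applied only to the executor instance, after the common hook.
    return enlarge_runner_pool
```

Keeping the hook bodies as plain functions makes them easy to reuse across modules and to exercise on their own with a stand-in object.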


Users familiar with the StreamSets SDK for Python will note that one of the most visible differences between the SDK and STF is the addition of a number of configure_for_environment methods. For details about the functionality they implement, please take a look at streamsets.testframework.sdc.DataCollector.configure_for_environment(), streamsets.testframework.sdc_models.Pipeline.configure_for_environment(), and streamsets.testframework.sdc_models.Stage.configure_for_environment().