Stage Event Generation

You can configure certain stages to generate events. Event generation differs from stage to stage, based on the way the stage processes data.

For details about event generation for each stage, see "Event Generation" in the stage documentation.

The following table lists event-generating stages and when they can generate events:
Stage | Generates events when the stage...
Amazon S3 origin
  • Completes processing all available objects and the configured batch wait time has elapsed.

For more information, see Event Generation for the Amazon S3 origin.

Azure Data Lake Storage Gen1 origin
  • Starts processing a file.
  • Completes processing a file.
  • Completes processing all available files and the configured batch wait time has elapsed.

For more information, see Event Generation for the Azure Data Lake Storage Gen1 origin.

Azure Data Lake Storage Gen2 origin
  • Starts processing a file.
  • Completes processing a file.
  • Completes processing all available files and the configured batch wait time has elapsed.

For more information, see Event Generation for the Azure Data Lake Storage Gen2 origin.

Google BigQuery origin
  • Successfully completes a query.

For more information, see Event Generation for the Google BigQuery origin.

Google Cloud Storage origin
  • Completes processing all available objects and the configured batch wait time has elapsed.

For more information, see Event Generation for the Google Cloud Storage origin.

MySQL Multitable Consumer origin
  • Completes processing the data returned by the queries for all tables.
  • Completes processing all data within a schema.
  • Completes processing all data within a table.

For more information, see Event Generation for the MySQL Multitable Consumer origin.

MySQL Query Consumer origin
  • Completes processing all data returned by a query.
  • Successfully completes a query.
  • Fails to complete a query.

For more information, see Event Generation for the MySQL Query Consumer origin.

Oracle CDC Client origin
  • Reads DDL statements in the redo log.

For more information, see Event Generation for the Oracle CDC Client origin.

Oracle Multitable Consumer origin
  • Completes processing the data returned by the queries for all tables.
  • Completes processing all data within a schema.
  • Completes processing all data within a table.

For more information, see Event Generation for the Oracle Multitable Consumer origin.

Oracle Query Consumer origin
  • Completes processing all data returned by a query.
  • Successfully completes a query.
  • Fails to complete a query.

For more information, see Event Generation for the Oracle Query Consumer origin.

PostgreSQL Multitable Consumer origin
  • Completes processing the data returned by the queries for all tables.
  • Completes processing all data within a schema.
  • Completes processing all data within a table.

For more information, see Event Generation for the PostgreSQL Multitable Consumer origin.

PostgreSQL Query Consumer origin
  • Completes processing all data returned by a query.
  • Successfully completes a query.
  • Fails to complete a query.

For more information, see Event Generation for the PostgreSQL Query Consumer origin.

Salesforce origin
  • Completes processing all data returned by a query.

For more information, see Event Generation for the Salesforce origin.

SQL Server Multitable Consumer origin
  • Completes processing the data returned by the queries for all tables.
  • Completes processing all data within a schema.
  • Completes processing all data within a table.

For more information, see Event Generation for the SQL Server Multitable Consumer origin.

SQL Server Query Consumer origin
  • Completes processing all data returned by a query.
  • Successfully completes a query.
  • Fails to complete a query.

For more information, see Event Generation for the SQL Server Query Consumer origin.

Amazon S3 destination
  • Completes writing to an object.
  • Completes streaming a whole file.

For more information, see Event Generation for the Amazon S3 destination.

Azure Data Lake Storage Gen1 destination
  • Closes a file.
  • Completes streaming a whole file.

For more information, see Event Generation for the Azure Data Lake Storage Gen1 destination.

Azure Data Lake Storage Gen2 destination
  • Closes a file.
  • Completes streaming a whole file.

For more information, see Event Generation for the Azure Data Lake Storage Gen2 destination.

Google Cloud Storage destination
  • Completes writing to an object.
  • Completes streaming a whole file.

For more information, see Event Generation for the Google Cloud Storage destination.

ADLS Gen1 File Metadata executor
  • Changes file metadata, such as the file name, location, or permissions.
  • Creates an empty file.
  • Removes a file or directory.

For more information, see Event Generation for the ADLS Gen1 File Metadata executor.

ADLS Gen2 File Metadata executor
  • Changes file metadata, such as the file name, location, or permissions.
  • Creates an empty file.
  • Removes a file or directory.

For more information, see Event Generation for the ADLS Gen2 File Metadata executor.

Amazon S3 executor
  • Creates a new Amazon S3 object.
  • Copies an object to another location.
  • Adds tags to an existing object.

For more information, see Event Generation for the Amazon S3 executor.

Databricks Delta Lake executor
  • Determines that the submitted query completed successfully.
  • Determines that the submitted query failed to complete.

For more information, see Event Generation for the Databricks Delta Lake executor.

Using Stage Events

You can use stage-related events in any way that suits your needs. When configuring the event stream for stage events, you can add more stages to the stream. For example, you might use a Stream Selector to route different types of events to different executors. But you cannot merge the event stream with a data stream.
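
For example, a Stream Selector condition like the following routes events based on the sdc.event.type record header attribute, which every event record includes. This is only a sketch: the 'no-more-data' value shown here applies to stages that generate that event type, so substitute the event type values that your stage actually produces:
${record:attribute('sdc.event.type') == 'no-more-data'}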

There are two general types of event streams that you might create:
  • Task execution streams that route events to an executor to perform a task.
  • Event storage streams that route events to a destination to store event information.
You can, of course, configure an event stream that performs both tasks by routing event records to both an executor and a destination. You can also configure event streams to route data to multiple executors and destinations, as needed.

Task Execution Streams

A task execution stream routes event records from the event-generating stage to an executor stage. The executor performs a task each time it receives an event record.

For example, you have a pipeline that reads from a MySQL database and writes files to Azure Data Lake Storage Gen2:

When the Azure Data Lake Storage Gen2 destination closes an output file, you would like the file moved to a different directory and the file permissions changed to read-only.

Leaving the rest of the pipeline as is, you can enable event handling in the Azure Data Lake Storage Gen2 destination, connect it to the ADLS Gen2 File Metadata executor, and configure the ADLS Gen2 File Metadata executor to move files and change permissions. The resulting pipeline looks like this:

If you needed to set permissions differently based on the file name or location, you could use a Stream Selector processor to route the event records accordingly, then use two ADLS Gen2 File Metadata executors to alter file permissions, as follows:
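
In such a pipeline, one of the Stream Selector conditions might look like the following sketch. It assumes that the file-closure event record includes a /filepath field and that files written under a hypothetical /sales/archive directory are the ones that need different permissions:
${str:startsWith(record:value('/filepath'), '/sales/archive')}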

Event Storage Streams

An event storage stream routes event records from the event-generating stage to a destination. The destination writes the event record to a destination system.

Event records include information about the event in record header attributes and record fields. You can add processors to the event stream to enrich the event record before writing it to the destination.

For example, you have a pipeline that uses the MySQL Query Consumer origin to read data:

The MySQL Query Consumer origin generates event records each time it successfully completes a query or fails to complete a query. For auditing purposes, you'd like to write this information to a database table.

Leaving the rest of the pipeline as is, you can enable event handling for the MySQL Query Consumer origin and simply connect it to the MySQL Producer destination as follows:

But you want to know when events occur. The MySQL Query Consumer event record stores the event creation time in the sdc.event.creation_timestamp record header attribute. So you can use an Expression Evaluator processor with the following expression to add the creation date and time to the record:
${record:attribute('sdc.event.creation_timestamp')}
And if you have multiple pipelines writing events to the same location, you can use the following expression to include the pipeline name in the event record as well:
${pipeline:name()}
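
Put together, the Expression Evaluator field expressions might be configured along these lines, where the /event_time and /pipeline output field names are placeholders that you would match to the columns in your audit table:
  • Output field /event_time with expression ${record:attribute('sdc.event.creation_timestamp')}
  • Output field /pipeline with expression ${pipeline:name()}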

The Expression Evaluator processor and the final pipeline look like this: