Getting Started Videos for Building Data Pipelines
Watch how to build a data pipeline that ingests streaming tweets using the HTTP Client origin, performs transformations, and sends the transformed Twitter data to Kafka.
StreamSets data pipelines are “smart,” with built-in data drift detection. This goes beyond schema evolution: you stay in control of when to apply changes and when to disregard them.
Design change data capture (CDC) jobs that automatically update table structures and columns when new information is added mid-stream while replicating from Oracle to Snowflake.
Migrate a database to multiple cloud data warehouses with a single data pipeline. Learn how to move an Oracle database to Snowflake and Delta Lake on Databricks at the same time.
What is a Data Pipeline?
A data pipeline is the series of steps that allows data to move from one system to another and become useful there. A smart data pipeline abstracts away technical implementation details and automates as much as possible, so it can be set up easily and operate continuously with very little manual intervention.
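To make the "series of steps" idea concrete, here is a minimal sketch in plain Python. It is not StreamSets code: the names run_pipeline, drop_empty_text, and lowercase_text are hypothetical, and in-memory lists stand in for a real origin and destination.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
# A stage takes a stream of records and returns a (possibly changed) stream.
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def run_pipeline(origin: Iterable[Record], stages: List[Stage],
                 destination: List[Record]) -> int:
    """Read from the origin, apply each stage in order, write to the destination."""
    stream: Iterator[Record] = iter(origin)
    for stage in stages:
        stream = stage(stream)  # chain stages lazily, one after another
    written = 0
    for record in stream:
        destination.append(record)
        written += 1
    return written


def drop_empty_text(records: Iterator[Record]) -> Iterator[Record]:
    """Hypothetical transformation: filter out records with no text."""
    return (r for r in records if r.get("text"))


def lowercase_text(records: Iterator[Record]) -> Iterator[Record]:
    """Hypothetical transformation: normalize the text field to lowercase."""
    return ({**r, "text": r["text"].lower()} for r in records)


# Illustrative data only; a real origin might be an HTTP endpoint or a database.
source = [
    {"id": 1, "text": "Hello"},
    {"id": 2, "text": ""},
    {"id": 3, "text": "WORLD"},
]
sink: List[Record] = []
written = run_pipeline(source, [drop_empty_text, lowercase_text], sink)
```

Because each stage only consumes and yields records, the same stages work whether the origin is a finite batch or a stream that keeps producing records.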
A batch data pipeline moves data periodically and is often used for bulk replication and ETL processing.
A streaming data pipeline flows data continuously from origin to destination as it is created. Think of web clicks on a shopping site being used for real-time product recommendations or in a banking app for fraud detection.
A change data capture (CDC) pipeline is used to keep multiple systems in sync. For example, your on-prem inventory data may be needed in a cloud-based web app to generate online catalog results in real time.
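As a rough illustration of how a CDC pipeline keeps two systems in sync, the sketch below replays a list of change events against a replica. It assumes a simplified, hypothetical event shape (op, key, row) and uses a dictionary in place of a real target table; actual CDC tools define their own event formats.

```python
from typing import Dict, List


def apply_change(replica: Dict[str, dict], event: dict) -> None:
    """Apply one change event to the replica (hypothetical event shape)."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        # Merge so that columns absent from the event keep their old values
        # and columns that are new to the replica are added.
        replica[key] = {**replica.get(key, {}), **event["row"]}
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# Illustrative change log from an inventory system (made-up data).
changes: List[dict] = [
    {"op": "insert", "key": "A", "row": {"name": "widget", "qty": 5}},
    {"op": "insert", "key": "B", "row": {"name": "gadget", "qty": 2}},
    {"op": "update", "key": "A", "row": {"qty": 3}},
    {"op": "delete", "key": "B"},
]

catalog: Dict[str, dict] = {}
for change in changes:
    apply_change(catalog, change)
```

Only the changes travel between systems, so the replica stays current without the source being copied in full each time.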