
Building Data Pipelines with StreamSets Data Collector

Build batch, streaming, and change data capture pipelines in minutes

Getting Started Videos for Building Data Pipelines

Watch how to build a data pipeline that ingests streaming tweets using the HTTP Client origin, performs transformations, and sends the transformed Twitter data to Kafka.
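The demo builds this pipeline in the Data Collector UI. Purely as an illustration of the same origin → processor → destination flow, the sketch below is plain Python rather than the StreamSets API; the streaming endpoint URL and Kafka topic name are hypothetical, and it uses the requests and kafka-python libraries.

```python
import json

import requests
from kafka import KafkaProducer  # pip install kafka-python

STREAM_URL = "https://example.com/tweets/stream"   # hypothetical streaming endpoint
KAFKA_TOPIC = "tweets-transformed"                 # hypothetical topic name

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Origin: read the HTTP stream line by line.
with requests.get(STREAM_URL, stream=True, timeout=30) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        tweet = json.loads(line)

        # Processor: keep only the fields downstream consumers need.
        record = {
            "id": tweet.get("id"),
            "text": tweet.get("text"),
            "lang": tweet.get("lang"),
        }

        # Destination: publish the transformed record to Kafka.
        producer.send(KAFKA_TOPIC, value=record)

producer.flush()
```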

StreamSets data pipelines are “smart,” with built-in data drift detection. Going beyond simple schema evolution, they keep you in control of when to apply changes and when to disregard them.
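As a toy illustration of what drift handling means (this is not StreamSets' implementation), the sketch below compares the fields of an incoming record against the columns the target table is known to have; an auto_evolve flag stands in for the choice between applying and disregarding changes. All names are hypothetical.

```python
# Columns the target table is currently known to have (illustrative).
known_columns = {"order_id", "amount"}

def detect_drift(record: dict) -> set[str]:
    """Return any fields in the record that the target table doesn't have yet."""
    return set(record) - known_columns

def handle_record(record: dict, auto_evolve: bool = True) -> None:
    new_cols = detect_drift(record)
    if new_cols and auto_evolve:
        # In a real pipeline this is where DDL (e.g. ALTER TABLE) would be
        # pushed to the destination before loading the record.
        known_columns.update(new_cols)
        print(f"added columns: {sorted(new_cols)}")
    elif new_cols:
        print(f"ignored new columns: {sorted(new_cols)}")
```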

Design change data capture (CDC) jobs that automatically update table structures and columns when new information is added mid-stream from Oracle to Snowflake.

Migrate a database to multiple cloud data warehouses with a single data pipeline. Learn how to move an Oracle database to Snowflake and Delta Lake on Databricks at the same time.

Looking for more demos? Subscribe to Demos with Dash! In these monthly 45-minute sessions, you'll see live demos of the StreamSets DataOps Platform.

What is a Data Pipeline?

A data pipeline is the series of steps that moves data from one system to another and makes it useful in the new system. A smart data pipeline abstracts away technical implementation details and automates as much as possible, so it can be set up easily and operate continuously with very little manual intervention.
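To make the "series of steps" concrete, here is a minimal, purely conceptual Python sketch of the origin → processor → destination pattern. It is not the StreamSets API, and the record fields are invented.

```python
from typing import Callable, Iterable

def run_pipeline(origin: Iterable[dict],
                 processors: list[Callable[[dict], dict]],
                 destination: Callable[[dict], None]) -> None:
    """Pull records from the origin, apply each processor in order,
    and hand the result to the destination."""
    for record in origin:
        for process in processors:
            record = process(record)
        destination(record)

# Example: uppercase a field and print the result.
records = [{"name": "ada"}, {"name": "grace"}]
run_pipeline(
    origin=records,
    processors=[lambda r: {**r, "name": r["name"].upper()}],
    destination=print,
)
```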

A batch data pipeline moves data periodically and is often used for bulk replication and ETL processing. 
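A minimal batch-style sketch using only the Python standard library: read a full CSV export and bulk-load it into a local SQLite table in one transaction. The file, table, and column names are hypothetical, and a scheduler such as cron would invoke the job on whatever period the use case needs.

```python
import csv
import sqlite3

def run_batch_load(csv_path: str = "daily_orders.csv") -> None:
    conn = sqlite3.connect("warehouse.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)"
    )
    # Read the entire export, then bulk-insert it as one batch.
    with open(csv_path, newline="") as f:
        rows = [(r["order_id"], float(r["amount"])) for r in csv.DictReader(f)]
    with conn:  # commits the whole batch in a single transaction
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.close()

# A scheduler (cron, an orchestrator, etc.) would call run_batch_load() periodically.
```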

A streaming data pipeline flows data continuously from origin to destination as it is created. Think of web clicks on a shopping site being used for real-time product recommendations or in a banking app for fraud detection. 
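As a toy example of the fraud-detection use case (again, not StreamSets code), the sketch below flags a user whose click rate exceeds a made-up threshold inside a sliding time window; in a real streaming pipeline this logic would sit in a processor stage between origin and destination.

```python
import time
from collections import defaultdict, deque

MAX_CLICKS = 20        # illustrative threshold
WINDOW_SECONDS = 10    # illustrative sliding window
recent_clicks = defaultdict(deque)  # user_id -> timestamps of recent clicks

def handle_click(user_id, now=None):
    """Return True if this click looks suspicious (possible fraud)."""
    now = time.time() if now is None else now
    clicks = recent_clicks[user_id]
    clicks.append(now)
    # Drop clicks that have fallen out of the sliding window.
    while clicks and now - clicks[0] > WINDOW_SECONDS:
        clicks.popleft()
    return len(clicks) > MAX_CLICKS
```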

A change data capture (CDC) pipeline is used to keep multiple systems in sync. For example, your on-prem inventory data may be needed in a cloud-based web app to generate online catalog results in real time.
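A toy sketch of the CDC idea (not Data Collector's implementation): replay insert/update/delete change events from a source system against an in-memory target that stands in for the cloud copy. The event format here is invented.

```python
from typing import Any

target: dict[str, dict[str, Any]] = {}  # key -> row, standing in for the cloud copy

def apply_change(event: dict) -> None:
    """Apply a single change event (insert/update/delete) to the target."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        target[key] = event["row"]
    elif op == "delete":
        target.pop(key, None)

# Replaying the source's change log keeps the target current in near real time.
for change in [
    {"op": "insert", "key": "sku-1", "row": {"qty": 5}},
    {"op": "update", "key": "sku-1", "row": {"qty": 3}},
    {"op": "delete", "key": "sku-1"},
]:
    apply_change(change)
```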
