We connect your data warehouses on-premises and across public clouds.

StreamSets helps you build, orchestrate, deploy, and monitor data pipelines across multiple form factors, giving you full DataOps control over your hybrid cloud analytics. You can sync data warehouses through bulk loading and real-time syncing.

A Platform Built for Hybrid Cloud

  • Connections for relational databases, EDWs, Hadoop, and common cloud services
  • Support for the major cloud platforms, giving you flexibility
  • Kubernetes-enabled for simple deployment and elastic scale

Batch and Streaming

  • In-stream transformations
  • Construct any-to-any batch and streaming data pipelines in minutes
  • Use the same UI and pipeline repository for batch and streaming
  • Perform initial bulk loads and set up incremental updates
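
The bulk-load-then-incremental pattern in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not StreamSets code: the function names, the in-memory "tables", and the `offset` field used as a high-water mark are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def bulk_load(source: List[Row], target: Dict[int, Row]) -> int:
    """Copy every source row into the target; return the highest offset seen."""
    for row in source:
        target[row["id"]] = dict(row)
    return max((row["offset"] for row in source), default=0)


def incremental_update(source: List[Row], target: Dict[int, Row], last_offset: int) -> int:
    """Copy only rows changed after last_offset; return the new high-water mark."""
    new_offset = last_offset
    for row in source:
        if row["offset"] > last_offset:
            target[row["id"]] = dict(row)
            new_offset = max(new_offset, row["offset"])
    return new_offset


# Hypothetical source table; "offset" stands in for a change sequence number.
source = [
    {"id": 1, "name": "a", "offset": 1},
    {"id": 2, "name": "b", "offset": 2},
]
target: Dict[int, Row] = {}

offset = bulk_load(source, target)  # initial full copy

# Later, the source gains a new row and an existing row changes.
source.append({"id": 3, "name": "c", "offset": 3})
source[0] = {"id": 1, "name": "a2", "offset": 4}

offset = incremental_update(source, target, offset)  # only rows 3 and 1 are copied
```

Tracking a single high-water mark keeps the incremental step cheap, but it assumes every change bumps the offset; deletes would need a separate mechanism such as CDC.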

Heterogeneous Sources

  • Ingest and move data regardless of structure or semantics
  • Complex file structures (JSON blobs and semi-structured data) ingest faster and are not hindered by formatting checks
  • Performance with StreamSets is more consistent, and pipelines can run on a smaller architecture footprint

CDC

  • Dynamic tools for designing CDC (Change Data Capture) to keep your source systems in sync with your cloud analytics environments
  • CDC sources include popular EDWs (Oracle, Teradata), relational databases (SQL Server, MySQL), and big data stores (HDFS, HBase, Kudu)
  • Users can specify their level of CDC capture rather than managing to pre-set windows

On-Prem

Teradata

Teradata provides leading database and data warehousing solutions that are foundational to almost every existing business. StreamSets helps augment and extend the capabilities of your Teradata solutions by managing connectivity and replication across your data environments.

Oracle

The Oracle EDW is a packaged data warehouse solution with enterprise-wide data collection capabilities. StreamSets helps users move data from multiple systems into Oracle EDW. StreamSets is available on the Oracle Cloud.

Greenplum Database

Greenplum is an open-source massively parallel data platform for analytics. StreamSets supports Greenplum as a source and destination. StreamSets helps bridge Greenplum data users to advanced analytics and streaming data products.

PostgreSQL

PostgreSQL is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance. The StreamSets “drift synchronization solution” for PostgreSQL detects drift in incoming data and automatically creates or alters the corresponding PostgreSQL tables as needed before the data is written.
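
To make the idea of drift handling concrete, here is a minimal Python sketch of the general technique: compare an incoming record with the columns a table is known to have, and plan the DDL needed before writing. It is not the StreamSets implementation; the `plan_drift_ddl` function, the type mapping, and the table and field names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from Python value types to PostgreSQL column types.
PG_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE PRECISION", str: "TEXT"}


def pg_type(value: Any) -> str:
    """Pick a column type for a value, falling back to TEXT."""
    return PG_TYPES.get(type(value), "TEXT")


def plan_drift_ddl(
    table: str,
    columns: Optional[Dict[str, str]],
    record: Dict[str, Any],
) -> List[str]:
    """Return the DDL statements needed before `record` can be written.

    `columns` is None when the table does not exist yet. Identifiers are
    not quoted or validated here; a real pipeline would have to do both.
    """
    if columns is None:
        cols = ", ".join(f"{name} {pg_type(v)}" for name, v in record.items())
        return [f"CREATE TABLE {table} ({cols})"]
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {pg_type(v)}"
        for name, v in record.items()
        if name not in columns
    ]


# First record: the (hypothetical) orders table does not exist yet.
create = plan_drift_ddl("orders", None, {"id": 1, "total": 9.5})

# Later record: a new "coupon" field appears in the incoming data.
known = {"id": "BIGINT", "total": "DOUBLE PRECISION"}
alter = plan_drift_ddl("orders", known, {"id": 2, "total": 3.0, "coupon": "SPRING"})
```

In this sketch `create` holds a single `CREATE TABLE` statement and `alter` holds a single `ALTER TABLE ... ADD COLUMN coupon TEXT` statement; records that match the known columns produce no DDL at all.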

MySQL

MySQL is one of the most common open source relational database management systems. The MySQL Binary Log origin from StreamSets acts as a MySQL replication slave. MySQL replication allows you to maintain multiple copies of MySQL data by copying the data from a master to a slave server. The origin uses the replication process to capture changes from the MySQL master database and then pass the changed data to a Data Collector pipeline.
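
The replication-based capture described above boils down to replaying an ordered stream of change events against a copy of the data. The Python sketch below shows that replay step only. The event shape (`op`, `key`, `data`) is a hypothetical simplification, not the MySQL binary log format or the record format Data Collector emits.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def apply_changes(replica: Dict[int, Dict[str, Any]], events: List[Event]) -> Dict[int, Dict[str, Any]]:
    """Replay change events, in order, against an in-memory replica."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("INSERT", "UPDATE"):
            # Merge changed columns into the existing row (or start a new one).
            replica[key] = {**replica.get(key, {}), **event["data"]}
        elif op == "DELETE":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical change stream captured from a source table.
events = [
    {"op": "INSERT", "key": 1, "data": {"name": "Ada"}},
    {"op": "INSERT", "key": 2, "data": {"name": "Bob"}},
    {"op": "UPDATE", "key": 1, "data": {"name": "Ada L."}},
    {"op": "DELETE", "key": 2},
]

replica = apply_changes({}, events)
```

Because events are applied in order, the replica ends up matching the source after the last change: row 1 carries the updated name and row 2 is gone.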

SQL Server

SQL Server is a foundational database technology powering millions of businesses today. It is commonly used for transactional data capture, and StreamSets helps SQL Server users migrate data between data platforms and into analytics projects.

Cloud

Snowflake

StreamSets on AWS amplifies the power of cloud data warehouses by simplifying and automating the process of getting both structured and unstructured data into the cloud, so analytics experts, data engineers, SQL developers, enterprise architects, and other users can concentrate on how to best use that data.

Azure SQL

Azure SQL Data Warehouse is a fast, flexible, and secure cloud data warehouse that uses Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. Use SQL Data Warehouse and StreamSets to ingest data into relational tables with columnar storage. Once data is stored in SQL Data Warehouse, you can run analytics at massive scale.

AWS Redshift

Redshift is a fast, scalable, cloud-based data warehouse. Use StreamSets to automate and operationalize data pipelines into Redshift, and to mask, encrypt, or remove sensitive information such as PII before it lands in Redshift.
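
As a rough illustration of protecting sensitive fields before data lands in a warehouse, the Python sketch below drops some fields and replaces others with a one-way hash. It is a generic example, not a StreamSets processor: the `protect_record` function and the field names are hypothetical, and hashing stands in for whatever masking or encryption policy applies in practice.

```python
import hashlib
from typing import Any, Dict, Set


def protect_record(record: Dict[str, Any], mask_fields: Set[str], drop_fields: Set[str]) -> Dict[str, Any]:
    """Return a copy of `record` with drop_fields removed and mask_fields hashed."""
    protected: Dict[str, Any] = {}
    for name, value in record.items():
        if name in drop_fields:
            continue  # remove the field entirely
        if name in mask_fields:
            # One-way SHA-256 digest. Unsalted hashes of low-entropy values
            # can be guessed, so a real policy may need salting or encryption.
            protected[name] = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        else:
            protected[name] = value
    return protected


# Hypothetical record on its way to the warehouse.
raw = {"order_id": 42, "email": "ada@example.com", "ssn": "000-00-0000"}
safe = protect_record(raw, mask_fields={"email"}, drop_fields={"ssn"})
```

Hashing keeps masked values joinable (the same email always yields the same digest) while removing the readable original; dropped fields never reach the destination at all.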

AWS RDS

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. As adoption of PostgreSQL-based managed services grows, we have added a new origin to address these common cloud offerings. StreamSets helps users set up CDC in minutes from their relational data sources into Amazon RDS.

Google BigQuery

BigQuery is Google's serverless, highly scalable, enterprise data warehouse designed to make data analysts productive using familiar SQL without the need for a database administrator. StreamSets helps companies quickly load and begin extracting value out of the BigQuery service.

Let your data flow
