
StreamSets Data Integration Blog

Where change is welcome.

Delta Lake Architecture: A Bridge Between Data Lakes & Data Warehouses


Data warehouses and data lakes are the most common central data repositories in data-driven organizations today, each with its own strengths and tradeoffs. For one, while data warehouses allow businesses to organize historical datasets for use in business intelligence (BI) and analytics, they quickly become more cost-intensive as datasets grow because of the combined use of compute and…

By Jesse Summan, February 7, 2023

Using the StreamSets Python SDK To Create Reusable StreamSets Pipelines (S3 to Redshift Example)

Use Cases

Many StreamSets Data Collector customers are now migrating their Hadoop ingestion pipelines to cloud platforms like AWS, and they want to take full advantage of AWS native services such as S3, EMR, and Redshift. Landing data in S3 is very straightforward. From there, customers often take data into EMR using Transformer for Spark for rich data processing and then directly…

By Kavya Nagarajan, February 2, 2023

Extend StreamSets Integration With Source Systems Using Groovy

Use Cases

StreamSets Data Collector (SDC) supports 69 sources, including relational and NoSQL databases, on-prem and cloud file systems, and a handful of messaging applications (documentation). Yet, occasionally, customers ask if Data Collector can integrate with an app or system not explicitly called out in the documentation. If there is a Java library providing an API, the answer is YES; StreamSets can use…

By Roman Bukarev, StreamSets, February 1, 2023

3 Ways To Keep Up With Constant Change


The business climate today feels a bit like a battleground, and everyone’s feeling the pressure. A recession looms, competition is fierce, ongoing supply chain issues wreak havoc, and customer expectations are higher than they’ve ever been. Dodge left, dodge right… your business users are under pressure to keep pace, no matter where they turn. Organizations are making a number of…

January 26, 2023

4 Ways Data Federation Tools Will Let You Down


Data federation tools are often touted for their ability to unify and query data in a variety of sources and formats using virtualization. The technology, the theory goes, provides a single, unified view of data without requiring you to manage a variety of data sources. This allows analysts to avoid waiting on backlogged development resources to access data. But the…

By Brenna Buuck, January 25, 2023