StreamSets Data Integration Blog
Where change is welcome.
AWS Reference Architecture Guide for StreamSets
Using StreamSets DataOps Platform To Integrate Data from PostgreSQL to AWS S3 and Redshift: A Reference Architecture. This document describes…
Data Warehouse Architecture: Explanation, Examples, Best Practices, and Alternatives
Data warehouses have become essential for organizations dealing with massive amounts of data, and while alternatives continue to come to market, data warehouse adoption remains on the rise. This increased adoption accounts for a projected compound annual growth rate (CAGR) of 10.7% from 2020, reaching over $51B in 2028. The growth rate comes as no surprise, as there is increased demand for…
Difference Between Slowly Changing Dimensions and Change Data Capture
While the difference between slowly changing dimensions (SCD) and change data capture (CDC) might seem subtle, there is in fact a technical difference between the two processes. Both processes detect changes in a source database and deliver the changed data to a target database. The difference between the two is almost entirely about what happens in…
Data Integration Architecture
An organization rarely has a single data source. Instead, it aggregates data from various sources like websites, applications, social media, and databases, and uses data integration to make that data easily accessible. In addition, this data needs to be transformed before being transported to its target location. All these ingestion and transformation processes involve data of various sizes, structures, and types, thereby…
Getting to Success with Data Integration and ETL
Data is everywhere, in different formats and databases. Being able to integrate multiple, highly varied data sources is essential to running a business today. You have to be able to extract, transform, and load (ETL) the data from each source into a database suitable for data analysis, like a data warehouse. In this article, PeerSpot’s real users of StreamSets discuss how the platform…
5 Best Practices for Building Data Pipelines
Whether you are building your very first pipeline or you’re an old pro, these best practices for building data pipelines can help you make pipelines that are easy to understand and therefore easy to maintain and extend. Design Data Pipelines for Simplicity: Reduce complexity in the design wherever possible. This is a concept borrowed from software development. When reviewing your…