StreamSets Data Integration Blog
Where change is welcome.
AWS Reference Architecture Guide for StreamSets
Using StreamSets DataOps Platform To Integrate Data from PostgreSQL to AWS S3 and Redshift: A Reference Architecture. This document describes…
The Basics of Data Pipeline Architecture for Machine Learning
Machine learning has become an integral part of organizations looking to do everything from improving customer experience to making product recommendations to targeting advertisements. The machine learning (ML) pipeline defines the steps that help create the models used for ML predictions. Each step involved in an ML pipeline is distinct and can be broken into modules to increase reusability for…
8 Data Governance Principles To Live By
Data governance is essential for all businesses, but especially for enterprise companies with their petabytes of data. Properly governing your data can ensure it is accurate, consistent, and secure. This helps to protect your company from data breaches and other security threats. This blog post will discuss eight data governance principles that you should live by. Data Governance Principles: Let’s…
Schema on Write vs. Schema on Read
In the simplest terms, schema is the structure of data inside a database. The structure of data can include things like field and table names, views, indexes, and snapshots. The definition of schema will often expand to include the relationships between data, for example, primary and foreign keys that logically connect separate tables. Analytics systems and legacy data management systems…
How To Formulate Your Data Governance Strategy in 5 Steps
Data governance refers to the policies and procedures governing how data is created, processed, and distributed. It’s used throughout the data lifecycle to ensure organizations have access to trustworthy data and comply with privacy and data safety laws. In this article, we’ll share actionable steps to help you and your organization build a data governance strategy that best serves your…
Data Federation vs. Data Virtualization
Data federation and data virtualization are so similar that the terms are often used interchangeably. And in practice, you’re unlikely to run into trouble if you conflate them. Even so, in the academic sense, virtualization and federation of data are not the same. And because the terms are so often conflated, you’ll find many definitions of each term. So before we…