
StreamSets Data Integration Blog

Where change is welcome.

The Basics of Data Pipeline Architecture for Machine Learning


Machine learning has become integral to organizations that want to do everything from improving customer experience to making product recommendations to targeting advertisements. The machine learning (ML) pipeline defines the steps that help create the models used for ML predictions. Each step involved in an ML pipeline is distinct and can be broken into modules to increase reusability for…

By Brenna Buuck, January 11, 2023

8 Data Governance Principles To Live By


Data governance is essential for all businesses, but especially for enterprise companies with their petabytes of data. Properly governing your data can ensure it is accurate, consistent, and secure. This helps to protect your company from data breaches and other security threats. This blog post will discuss eight data governance principles that you should live by.…

By January 4, 2023

Schema on Write vs. Schema on Read


In the simplest terms, schema is the structure of data inside a database. The structure of data can include things like field and table names, views, indexes, and snapshots. The definition of schema will often expand to include the relationships between data, for example, primary and foreign keys that logically connect separate tables.  Analytics systems and legacy data management systems…

By Brenna Buuck, January 3, 2023

How To Formulate Your Data Governance Strategy in 5 Steps


Data governance refers to the policies and procedures governing how data is created, processed, and distributed. It’s used throughout the data lifecycle to ensure organizations have access to trustworthy data and comply with privacy and data safety laws.  In this article, we’ll share actionable steps to help you and your organization build a data governance strategy that best serves your…

By December 22, 2022

Data Federation vs. Data Virtualization


Data federation and data virtualization are so similar that the terms are often used interchangeably. And in practice, you’re unlikely to run into trouble if you conflate them. Even so, in the academic sense, virtualization and federation of data are not the same. And because the terms are so often conflated, you’ll find many definitions of each term. So before we…

By Jesse Summan, December 20, 2022