
StreamSets Data Integration Blog

Where change is welcome.

Schema on Write vs. Schema on Read


In the simplest terms, schema is the structure of data inside a database. The structure of data can include things like field and table names, views, indexes, and snapshots. The definition of schema will often expand to include the relationships between data, for example, primary and foreign keys that logically connect separate tables.  Analytics systems and legacy data management systems…

By Brenna Buuck, January 3, 2023

How To Formulate Your Data Governance Strategy in 5 Steps


Data governance refers to the policies and procedures governing how data is created, processed, and distributed. It’s used throughout the data lifecycle to ensure organizations have access to trustworthy data and comply with privacy and data safety laws.  In this article, we’ll share actionable steps to help you and your organization build a data governance strategy that best serves your…

By December 22, 2022

Data Federation vs. Data Virtualization


Data federation and data virtualization are so similar that the terms are often used interchangeably. And in practice, you’re unlikely to run into trouble if you conflate them. Even so, in the academic sense, virtualization and federation of data are not the same. And because the terms are so often conflated, you’ll find many definitions of each term. So before we…

By Jesse Summan, December 20, 2022

The Nuts and Bolts of the Databricks Lakehouse Platform


Exploding data growth has led to a search for a robust, scalable, high-performance data solution that can accommodate growing data demand. There are many solutions available, but the data warehouse and data lake are two of the most popular. While a data warehouse collects and stores processed data for business intelligence and data analytics, the data lake offers a…

By Brenna Buuck, December 19, 2022

Four Machine Learning Deployment Methods + How To Choose the Best One


The primary goal of machine learning (ML) is to perform a task more efficiently using models, which only becomes possible if the ML models are available for end users. Most view ML deployment as an art, requiring careful collaboration between the data science, software engineering, and DevOps teams to deploy a model successfully. Also, because teams focus on different aspects…

By Brenna Buuck, December 15, 2022