Schema on Write vs. Schema on Read

By Brenna Buuck and Leslie. Posted in Data Integration, January 3, 2023.

In the simplest terms, schema is the structure of data inside a database. The structure of data can include things like field and table names, views, indexes, and snapshots. The definition of schema will often expand to include the relationships between data, for example, primary and foreign keys that logically connect separate tables.

Analytics systems and legacy data management systems require a schema, which can be generated either on write or on read. When schema is generated on write, the schema comes before the data. A very common schema on write scenario: a data engineer creates several tables in a relational database with a rigid schema, connects them through primary and foreign keys, and only then populates the tables with data. In a schema on read scenario, different types of data, potentially both structured and unstructured, are loaded into the destination, and the schema is generated when queries against the data are executed. This means the data engineer can spend more time crafting queries to gain better insights rather than spending all of their time carefully defining fields.
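To make the contrast concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the table names (`customers`, `orders`), the sample records, and the helper `read_with_schema` are all hypothetical, and an in-memory SQLite database stands in for the relational database.

```python
import json
import sqlite3

# Schema on write: the table structure is declared before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 42.5)")

written = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()

# Schema on read: raw records are stored as-is; structure is applied at query time.
raw_store = [
    '{"id": 10, "customer": "Ada", "total": 42.5}',
    '{"id": 11, "customer": "Grace", "total": 17.0, "coupon": "NEW10"}',
]


def read_with_schema(lines, fields):
    """Parse each raw line and project only the fields the query asks for."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


# Hypothetical query: the "coupon" field was never declared anywhere up front.
read = list(read_with_schema(raw_store, ["customer", "coupon"]))
```

In the first half, nothing can be inserted until both tables exist. In the second half, the list of fields is chosen at query time, so a field that appears in only some records (`coupon` here) can still be read, with missing values coming back as `None`.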

Schema Past and Future

Schema on write was the default method for decades. Data engineers would spend a significant portion of their time defining schema and relationships before ever starting to analyze their data. Today, more modern data tools tend towards schema on read. The trend is towards automation of time-consuming and manual processes that don’t need human intervention. Defining schema falls squarely into this bucket.

At a Glance: Schema on Read vs. Schema on Write

We’ve already gone over the main differentiator between schema on read and schema on write, but there are other more subtle differences. Let’s explore them.

	Schema on Write	Schema on Read
Schema	User has to define a schema	Schema is inferred from the data
Data	Structured and relational	Unstructured and Structured
End User Experience	The only queryable data is pre-selected	Allows richer data exploration
Positive Features	Lightweight	Adaptable

When you have to define your data before it arrives at your destination, as you do with schema on write, it most often has to be structured and relational. Schema on read, on the other hand, can handle all kinds of data, including unstructured and structured.
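One way to picture "schema is inferred from the data" is a small Python sketch that scans records and reports which fields and value types it found. The function name `infer_schema` and the sample records are assumptions made for illustration; real tools infer schema in more sophisticated ways.

```python
from collections import defaultdict


def infer_schema(records):
    """Return a mapping of field name -> sorted list of type names observed."""
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            seen[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in seen.items()}


# Hypothetical records with inconsistent shapes, as might land in a data lake.
records = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "tags": ["vip"]},
    {"id": "3", "note": None},
]
schema = infer_schema(records)
```

Here `schema` reports that `id` was seen as both `int` and `str`, and that `tags` and `note` exist only in some records. None of that had to be declared before the data was loaded.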

Regarding the end user experience, schema on write forces data architects and engineers to be explicit about what data goes to their warehouse before they can analyze it. This can pose a problem: if a later question depends on a field nobody selected up front, that field is simply not there to query. Schema on read allows for more flexibility and a richer data exploration experience because analysts can pull in fields as needed.

Finally, while there is no right or wrong way to apply schema, both approaches have positive features. Schema on read benefits from the adaptability inherent in its design, while schema on write is a lightweight solution that can deliver lightning-fast query performance.

StreamSets and Schema

StreamSets aligns with the more modern way of handling schema, taking a schema on read approach. This design choice means pipelines don’t need to be re-written when new fields are created in the origin. Instead, the schema is inferred and passed to the transforms and destination without the need for human intervention. This makes for robust pipelines that can adapt to change. In other words, StreamSets pipelines respond automatically to data drift, a critical function for a modern data strategy.
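The idea of a pipeline that tolerates new fields can be sketched in plain Python. This is not the StreamSets API; the functions `mask_email` and `run_pipeline` and the sample records are hypothetical, and the sketch only illustrates the general pattern of transforming the fields you care about while passing everything else through.

```python
def mask_email(record):
    """Transform one field if present; pass every other field through untouched."""
    out = dict(record)
    if "email" in out:
        user, _, domain = out["email"].partition("@")
        out["email"] = user[:1] + "***@" + domain
    return out


def run_pipeline(origin_records):
    """Apply the transform and note when previously unseen fields appear."""
    known_fields = set()
    drift_events = []
    output = []
    for i, record in enumerate(origin_records):
        new_fields = set(record) - known_fields
        # Fields in the very first record are the baseline, not drift.
        if known_fields and new_fields:
            drift_events.append((i, sorted(new_fields)))
        known_fields |= set(record)
        output.append(mask_email(record))
    return output, drift_events


# Hypothetical origin data: the second record gains a "plan" field.
origin = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "grace@example.com", "plan": "pro"},
]
output, drift = run_pipeline(origin)
```

Because the transform never hard-codes the full list of fields, the new `plan` field flows through to the output without any change to the pipeline code, and the drift is recorded rather than causing a failure.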
