skip to Main Content

The DataOps Blog

Where Change Is Welcome

Schemas Used in Data Warehouses: Star, Galaxy, and Snowflake

Brenna Buuck
By Posted in Engineering May 31, 2022

Much like how a relational database has a database schema that defines the relationships and structure rules within, there are schemas in data warehouses. Also known as multi-dimension schemas, these schemas define rules for how these data warehouses manage the names, descriptions, associated data items, and aggregates within a data warehouse. 

What is Schema in a Data Warehouse?

We can think of a data warehouse schema as a blueprint or an architecture of how data will be stored and managed. A data warehouse schema isn’t the data itself, but the organization of how data is stored and how it relates to other data within the data warehouse.

In the past, data warehouse schemas were often strictly enforced across an enterprise, but in modern implementations where storage is increasingly inexpensive, schemas have become less constrained. Despite this loosening or sometimes total abandonment of data warehouse schemas, knowledge of the foundational schema designs can be important to both maintaining legacy resources and for creating modern data warehouse design that learns from the past. 

The basic components of all data warehouse schemas are fact and dimension tables. The different combination of these two central elements compose almost the entirety of all data warehouse schema designs.

Fact Table

A fact table aggregates metrics, measurements, or facts about business processes. In this example, fact tables are connected to dimension tables to form a schema architecture representing how data relates within the data warehouse. Fact tables store primary keys of dimension tables as foreign keys within the fact table. 

fact table

Dimension Table

Dimension tables are non-denormalized tables used to store data attributes or dimensions. As mentioned above, the primary key of a dimension table is stored as a foreign key in the fact table. Dimension tables are not joined together. Instead, they are joined via association through the central fact table.

Dimension Table

3 Types of Schema Used in Data Warehouses

History presents us with three prominent types of data warehouse schema known as Star Schema, Snowflake Schema, and Galaxy Schema. Each of these data warehouse schemas has unique design constraints. 

What is a Star Schema in a Data Warehouse?

The star schema in a data warehouse is historically one of the most straightforward designs. This schema follows some distinct design parameters, such as only permitting one central table and a handful of single-dimension tables joined to the table. In following these design constraints, star schema can resemble a star with one central table, and five dimension tables joined (thus where the star schema got its name). 

Star Schema is known to create denormalized dimension tables – a database structuring strategy that organizes tables to introduce redundancy for improved performance. Denormalization intends to introduce redundancy in additional dimensions so long as it improves query performance.

 

Star Schema Visual Example

Characteristics of the Star Schema: 

  • Star data warehouse schemas create a denormalized database that enables quick querying responses
  • The primary key in the dimension table is joined to the fact table by the foreign key
  • Each dimension in the star schema maps to one dimension table
  • Dimension tables within a star scheme are not to be connected directly
  • Star schema creates denormalized dimension tables

What is a Snowflake Schema?

The Snowflake Schema is a data warehouse schema that encompasses a logical arrangement of dimension tables. This data warehouse schema builds on the star schema by adding additional sub-dimension tables that relate to first-order dimension tables joined to the fact table. 

Just like the relationship between the foreign key in the fact table and the primary key in the dimension table, with the snowflake schema approach, a primary key in a sub-dimension table will relate to a foreign key within the higher order dimension table. 

Snowflake schema creates normalized dimension tables – a database structuring strategy that organizes tables to reduce redundancy. The purpose of normalization is to eliminate any redundant data to reduce overhead. 

Snowflake Schema Visual Example

Characteristics of the Snowflake Schema:  

  • Snowflake Schema are permitted to have dimension tables joined to other dimension tables
  • Snowflake Schema are to have one fact table only
  • Snowflake Schema create normalized dimension tables
  • The normalized schema reduces required disk space for running and managing this data warehouse
  • Snowflake Scheme offer an easier way to implement a dimension

What is a Galaxy Schema?

The Galaxy Data Warehouse Schema, also known as a Fact Constellation Schema, acts as the next iteration of the data warehouse schema. Unlike the Star Schema and Snowflake Schema, the Galaxy Schema uses multiple fact tables connected with shared normalized dimension tables. Galaxy Schema can be thought of as star schema interlinked and completely normalized, avoiding any kind of redundancy or inconsistency of data.

galaxy schema visual example

Characteristics of the Galaxy Schema:  

  • Galaxy Schema is multidimensional acting as a strong design consideration for complex database systems
  • Galaxy Schema reduces redundancy to near zero redundancy as a result of normalization
  • Galaxy Schema is known for high data quality and accuracy and lends to effective reporting and analytics

 

Key Differences Between Star, Snowflake, and Galaxy schema

 

Star Schema Snowflake Schema Galaxy Schema
Elements Single Fact Table connected to multiple dimension tables with no sub-dimension tables Single Fact Table connects to multiple dimension tables that connects to multiple sub-dimension tables Multiple Fact Tables connects to multiple dimension tables that connects to multiple sub-dimension tables
Normalization Denormalized  Normalized Normalized
Number of Dimensions Multiple dimension tables map to a single Fact Table Multiple dimension tables map to multiple dimension tables Multiple dimension tables map to multiple Fact Tables
Data Redundancy High Low Low
Performance Fewer foreign keys resulting in increased performance Decreased performance compared to Star Schema from higher number of foreign keys Decreased performance compared to Star and Snowflake. Used for complex data aggregation.
Complexity Simple, designed to be easy to understand More complicated compared to Star Schema – can be more challenging to understand  Most complicated to understand. Reserved for highly complex data structures
Storage Usage Higher disk space due to data redundancy Lower disk space due to limited data redundancy Low disk space usage compared to the level of sophistication due to the limited data redundancy 
Design Limitations One Fact Table only, no sub-dimensions One Fact Table only, multiple sub-dimensions are permitted Multiple Fact Tables permitted, only first level dimensions are permitted

 

How StreamSets’ Schema-agnostic Approach Makes Schemas  Easy

Our agnostic approach to schema management means that StreamSets smart data pipelines can manage any kind of schema – simple, complex or non-existent. Meaning, with StreamSets you don’t have to spend hours matching the schema from a legacy origin into your destination, instead StreamSets can infer any kind of schema without you having to lift a finger. If however, you want to enforce a schema and create hard and fast validation rules, StreamSets can help you with that as well. Our flexibility in how we manage schemas means your data teams have less to figure out on their own and more time to spend on what really matters: your data.

Ready to combat data drift automatically? Get started with the StreamSets DataOps Platform.

Try StreamSets Free

Back To Top

We use cookies to improve your experience with our website. Click Allow All to consent and continue to our site. Privacy Policy

[class^="wpforms-"]
[class^="wpforms-"]