The Simplest Way to Understand SCD in Data Warehouse

Posted by

What is SCD?

SCD stands for Slowly Changing Dimensions.

It is very important in Data Warehouse.

As we know, ETL (Extract, Transform, Load) is between data sources and data warehouse.

When ETL runs, it will pick up all records and update them in the Dimension tables.

Why we need SCD?

Because we have some problems in updating data in Data Warehouse when data in data sources are changing.

In the dimension tables, if we want to keep some old records, how we can do this?

Using SCD.

Types of SCD:

Note: The types of SCD are defined on Column level, not on the table level.

There are two popular types in SCD.

Type 1: Overwrite

An old record is updated by the new record. It means covering the old records.

Type 2: Store history by crrating another row

Type 2 is to add new records rather than covering the old records.

As long as we have a type 2 in tables, we must have two extra values: ‘StartDate’ and ‘EndDate’. The ‘EndDate’ is when the changes happen, as the end date of historical data.

We can have a third one, ‘IsCurrent’, to identify or mark the current record.

We don’t need to update the records, and we just need to update ‘StartDate’ and ‘EndDate’.

If you are interested in some high income skills or have any problems with fact tables and dimension tables, feel free to contact me .

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s