Guide to Slowly Changing Dimensions [Intro + Type 1]
Firstly what is a dimension?
A dimension is a structure that categorizes facts and measures which can be used to understand business requirements.
What is a Slowly Changing Dimension?
A Slowly Changing Dimension are dimensions that change slowly over time.
Why is it needed?
In Data Warehousing which deals with historical data, tracking of changes is important as it helps to give a better understanding of the Business.
Imagine having a customer dimension in a taxation department which holds records of address and so on,
and a customer changes his/her address it is important to track the address changes ie from old to new.
Types of Slowly Changing Dimension.
1. Type 1
2. Type 2
3. Type 3
4. Type 4
5. Type 5
1. Type 1:
This follows the technique of replacing the previous value, here no Historic data is/are kept, commonly used in correction or updating of records.
In this blog i would be showing you how to create a Type 1 Slowly Changing Dimension using Pentaho Data Integration:
Now Open a New transformation
3. Select a input component for your source in my case its a CSV input
4. Select a Get System Info input Component [Optional]
Double Click on it and
For Type: Select System Date (fixed)
For Name: load_data (Any Name)
5. If you are using a Get System info then this step is necessary
Select a Input Component Join Rows (Cartesian Product)
What this does is it makes a cross join of the new Load Date column to the Table or source table.
6. Select a Select Values Input component,
this is used to add or remove or alter fields in the source before going to the target if required.
Click on Get fields to select
Click on Meta-data and again select Get Fields to change
7. Select a insert/update Input component
Select your target Schema
Select your target Table
Now Select get fields
and remove the fields you don’t require while leaving the main key or primary key to look up values.
in my case client Id
Select Get Update Fields
And in the update column i will want Select N for the column i don’t want to update.
in my case again “client_id”
Now if selected the Get System info component to have a load date column then the below step is important
Remember we added load_date we want add that column
so Select SQL
look through the pre-defined query available and you will see a Alter Table to add column load_date is present
so all we need to do is Select Execute and run a query to see if your column has been added as shown below
8. Now Go to your Source and edit a data and add a data also as shown below
I have added a new row and also editied Sohail’s region from NY to NJ
Now your all transformation should something like this with all the conncetions pointing to the right component
Now Run the Transformation
Now after running successfully
Check your data via query
As you can see data has being replaced/overwrite for Sohail [NY to NJ]
and also added 1 more new row Data.
In My next blog we will discuss Type 2 of Slowly Changing Dimension
Izebhijie Sohail Ehizogie