Guide to Slowly Changing Dimensions [Intro + Type 1]

Posted on by By Sohail, in Business Intelligence, ETL, Helical Insight | 0

Guide to Slowly Changing Dimensions [Intro + Type 1]

Firstly what is a dimension?
A dimension is a structure that categorizes facts and measures which can be used to understand business requirements.

What is a Slowly Changing Dimension?
A Slowly Changing Dimension are dimensions that change slowly over time.

Why is it needed?
In Data Warehousing which deals with historical data, tracking of changes is important as it helps to give a better understanding of the Business.

An Example
Imagine having a customer dimension in a taxation department which holds records of address and so on,
and a customer changes his/her address it is important to track the address changes ie from old to new.

Types of Slowly Changing Dimension.
1. Type 1
2. Type 2
3. Type 3
4. Type 4
5. Type 5

1. Type 1:
This follows the technique of replacing the previous value, here no Historic data is/are kept, commonly used in correction or updating of records.

In this blog i would be showing you how to create a Type 1 Slowly Changing Dimension using Pentaho Data Integration:

Steps
1. Identify your source or create one if you have to.
CSV input

2. Load Data into the Database table
Load Data

Now Open a New transformation
3. Select a input component for your source in my case its a CSV input
4. Select a Get System Info input Component [Optional]
Double Click on it and
For Type: Select System Date (fixed)
For Name: load_data (Any Name)

5. If you are using a Get System info then this step is necessary
Select a Input Component Join Rows (Cartesian Product)
What this does is it makes a cross join of the new Load Date column to the Table or source table.
Select OK

6. Select a Select Values Input component,
this is used to add or remove or alter fields in the source before going to the target if required.

Note Make Sure all the components are connected as Shown below:
Before insert
Double Click on Select Values Component

Click on Get fields to select
Click on Meta-data and again select Get Fields to change
Click OK!

7. Select a insert/update Input component
Select your target Schema
Select your target Table

Now Select get fields
and remove the fields you don’t require while leaving the main key or primary key to look up values.
in my case client Id

Then

Select Get Update Fields
And in the update column i will want Select N for the column i don’t want to update.
in my case again “client_id”

Now if selected the Get System info component to have a load date column then the below step is important

Remember we added load_date we want add that column

so Select SQL
and

look through the pre-defined query available and you will see a Alter Table to add column load_date is present
so all we need to do is Select Execute and run a query to see if your column has been added as shown below

load_date added

Click OK!

8. Now Go to your Source and edit a data and add a data also as shown below

added_data
and then Save it

I have added a new row and also editied Sohail’s region from NY to NJ

Now your all transformation should something like this with all the conncetions pointing to the right component

after all

Now Run the Transformation

Now after running successfully

Check your data via query

final data

As you can see data has being replaced/overwrite for Sohail [NY to NJ]
and also added 1 more new row Data.

In My next blog we will discuss Type 2 of Slowly Changing Dimension

logo

Best Open Source Business Intelligence Software Helical Insight is Here

logo

A Business Intelligence Framework

Thanks
Izebhijie Sohail Ehizogie

0 0 votes
Article Rating
Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments