Guide to Slowly Changing Dimensions [Intro + Type 1]

Guide to Slowly Changing Dimensions [Intro + Type 1]

Firstly what is a dimension?
A dimension is a structure that categorizes facts and measures which can be used to understand business requirements.

What is a Slowly Changing Dimension?
A Slowly Changing Dimension are dimensions that change slowly over time.

Why is it needed?
In Data Warehousing which deals with historical data, tracking of changes is important as it helps to give a better understanding of the Business.

An Example
Imagine having a customer dimension in a taxation department which holds records of address and so on,
and a customer changes his/her address it is important to track the address changes ie from old to new.

Types of Slowly Changing Dimension.
1. Type 1
2. Type 2
3. Type 3
4. Type 4
5. Type 5

1. Type 1:
This follows the technique of replacing the previous value, here no Historic data is/are kept, commonly used in correction or updating of records.

In this blog i would be showing you how to create a Type 1 Slowly Changing Dimension using Pentaho Data Integration:

Steps
1. Identify your source or create one if you have to.
CSV input

2. Load Data into the Database table
Load Data

Now Open a New transformation
3. Select a input component for your source in my case its a CSV input
4. Select a Get System Info input Component [Optional]
Double Click on it and
For Type: Select System Date (fixed)
For Name: load_data (Any Name)

5. If you are using a Get System info then this step is necessary
Select a Input Component Join Rows (Cartesian Product)
What this does is it makes a cross join of the new Load Date column to the Table or source table.
Select OK

6. Select a Select Values Input component,
this is used to add or remove or alter fields in the source before going to the target if required.

Note Make Sure all the components are connected as Shown below:
Before insert
Double Click on Select Values Component

Click on Get fields to select
Click on Meta-data and again select Get Fields to change
Click OK!

7. Select a insert/update Input component
Select your target Schema
Select your target Table

Now Select get fields
and remove the fields you don’t require while leaving the main key or primary key to look up values.
in my case client Id

Then

Select Get Update Fields
And in the update column i will want Select N for the column i don’t want to update.
in my case again “client_id”

Now if selected the Get System info component to have a load date column then the below step is important

Remember we added load_date we want add that column

so Select SQL
and

look through the pre-defined query available and you will see a Alter Table to add column load_date is present
so all we need to do is Select Execute and run a query to see if your column has been added as shown below

load_date added

Click OK!

8. Now Go to your Source and edit a data and add a data also as shown below

added_data
and then Save it

I have added a new row and also editied Sohail’s region from NY to NJ

Now your all transformation should something like this with all the conncetions pointing to the right component

after all

Now Run the Transformation

Now after running successfully

Check your data via query

final data

As you can see data has being replaced/overwrite for Sohail [NY to NJ]
and also added 1 more new row Data.

In My next blog we will discuss Type 2 of Slowly Changing Dimension

Thanks
Izebhijie Sohail Ehizogie

Beginner’s Guide to E.T.L (Extract, Transform and Load) – Connection Set-Up

Connection Setup for connection type: PostgreSQL

[We are now setting up a connection to A database i.e if your source is a database]
There are 3 ways of access provided for Connections using PostgreSQL
a. Native (JBDC)
b. ODBC
c. JNDI

a. Native (JBDC)
I. Enter the Connection Name: Anyname
II. Select the connection type: PostgreSQL
III. Host Name: localhost [This can be an ip address]
Database Name: LearnETL[Name of the database you are using]
Port Number: 5432 or your required Port Number
User Name: Your database user name
Password: Your database password
IV. Test Connections and OK.

b. JNDI
Here we need to go to the data-integration folder and open up the sub-folder “simple-jndi” and edit jdbc.properties

Here we need to write the following code:

ETL_Connection/type=javax.sql.DataSource
ETL_Connection/driver=org.postgresql.Driver
ETL_Connection/url=jdbc:postgresql://localhost:5432/LearnETL
ETL_Connection/user=postgres
ETL_Connection/password=postgres

ETL_Connection: name of the connection
localhost:5432/LearnETL: localhost is the host name, 5432 is the port number and LearnETL is the Database name.
user: username
password: Password

Save and back to the Database connection
Restart your PDI.

and in the Setup, select JNDI and for
JNDI Name : name of your connection [ETL_Connection]

c. ODBC

This is not commonly used but what will be needed

are as follows:

1.Install the PostgreSQL ODBC driver which can be downloaded.
2.Select the PostgreSQL ODBC Unicode and then
3. Setup
enter Data source Source, Name, Server, Username, Password and Port. Test and Save if Connection is OK.!
4. a bundle of JAR files to be copied in your Java folder as this ODBC bundle files has been discontinued in JAVA 8 Bridge

Thank You
Sohail Izebhijie

Beginner’s Guide to E.T.L (Extract, Transform and Load) – A Basic Process

Loading Data from Source to Target

Before we proceed it’s important to identify the tool you might need to accomplish the process of ETL, in my case i would be using the Pentaho Data Integration Application (keep in mind irrespective of the tool you use the steps or procedures are similar but the approach might be differ).

The following Steps can be followed:

1. Identify your Source, it could be the following:
a. C.S.V file
b. Text file
c. A database
d. and so on

in my scenario a C.S.V (comma separated file)file.

2. Open Up your Spoon.bat
Select a new transformation and select a “Input” then select what type of input you require
we have Text File Input, C.S.V file input, SAS Input, Table Input and so on. In My case since i’ll be using C.S.V file as a Source i’ll select C.S.V file Input Component.

3. Set Up your Connection Based on your preferred connection type in my case i’ll be using the postgreSQL.

[Read my next Blog on setting Up a Connection using Connection type: PostgreSQL]

4. Once Connection has been Established you can Right-Click on the Connection and Select Share if that’s a common Connection to all your transformations will be using this will share the Transformation Connection to other transformation.

5. So we will be Sending data from Source to a Target to we need to to have a “Input” as the source
and an “Output” as the Target.

6. Input

download a C.S.V file from the internet

or even Create a TXT/C.S.V input file
as shown below.

Create a source if required
Text_Source (comma delimited)

Employee_Number,First_Name, Last_Name, Middle_Initial, Age, Gender, Title
101,Ehizogie,Izebhijie,Sohail,24,Male,Developer
102,Fahad,Anjum,WithClause,23,Male,Developer
103,Gayatri,Sharma,A,24,Female,Accountant

Save as txt or csv and this can be your input.

Here since our input is from a csv file
we open Up or C.S.V File Input component
Step Name: Anyname
File Name: Browse the selected path
Delimiter: , (comma)
Enclosure: ”

and then Select Get Fields and Select OK
Preview your Data

7. Output
Open Up Table output component
Select Target Schema
Select The Target Table

[Keeping in mind a Table exist in the DB]

Select OK!

Right-Click on the table output to MAP the columns from Source to Target

Now this is important as the to get the right data from the source to the column in the target
and Then Run

As a Beginner keep in mind that
Errors are a bound to occur
Such as Type of data from
Source Does not Match your Target Table format.
and so on.

Here a we can Have some little transformation step to convert and take care of such format errors
[In my next blog we can look into handling that]

Now Go to your Target Database and Run the SELECT * FROM table_name

there you go!

Thanks
Sohail Izebhijie

Beginner’s Guide to E.T.L (Extract, Transform and Load) – Introduction

Introduction into E.T.L (Extract, Transform and Load)
This a process related to data warehousing which involves the extracting of data out of the source system/Systems
and placing it into a repository or Target.

Extraction
Extracting the data from source systems (Flat Files or other operational systems) and converted into one consolidated data warehouse format which is ready for transformation.

Transformation
Transforming the data may involve the following tasks:

  • Cleaning: One of many very important task in the transforming stage because the Source data would always have data that the target system doesn’t support or understand hence cleaning is required.
    In some cases the Source can be from many source inputs so Lookup are important to avoid duplication.
  • Filtering: Now the Source Data would have so many rows but it’s important to send relevant data to your target and  filter out the unnecessary data.
  • Business Rules: Calculations or Derivations can be performed Here so we can have Correct and readable data at the target.

and many more.

Loading

After proper transformation and data matches the Business Rules loading the data into a target or repository is the final step in the E.T.L (Extract, Transform and Load)

in my next blog we will look into the basic Loading Data from Source to Target

Thanks
Sohail Izebhijie

The Print When Expression

The Print When Expression

The print when expression in Jasper Studio or Jasper iReport is very useful in so many requirements and it is defined as the name itself implies
i.e Print /Show a String, Column and so on Based on a condition passed.
Now here are some requirements on how/when we can implement The Print when expression.

Requiremnt: Show a particular column when a parameter is selected
Solution:
Here we made a parameter Called ShowLocation with values either ‘Y’ or ‘N’/blank space
now when ‘Y’ is selected then a Location column should show and when ‘N/blank space then it doesn’t show.

2 ways i did approach this:

1. I made a 2 Table Components one with the Location Column added and the other without the Location Column added.
one on top the other.
so when my parameter calls a ‘Y’ value it picks the Table with the Location Column and vice versa.

This approach is good but can be time and performance consuming

the 2nd approach was:

2. Only one Table Component was used, Opened the Table Component and selected the Column i want to restrict/show based on the parameter passed, in my scenario or requirement it is the Location column as
shown in the image below:

TableLayout

now i would select the cell button

Cellin the properties Tab and then Select the
Print When Expression.

Now here in the print when expression box i would write :
PrintExperssion
$P{showlocation}.equals(“Y”) as shown above

 

and you are good to go for a preview.
Now Let’s have to put a N or blank space in the parameter showlocation we get the following output
before
and now let’s pass ‘Y’ in the showlocation parameter

before-Y
So Now with this condition it will show the column when the parameter is checked in with a ‘Y’ Value.
and voila the output:

after

Thanks
Sohail Izebhijie.

Let’s Have Fun with Triggers in SQL

Let’s Have Fun with Triggers in SQL

What are Triggers?

Triggers are a Block of Structure that executes when a DML Statement is performed or Executed. Triggers are fun to use but from a business point of view not advisable but that a different story entirely ie the Pro’s and Con’s. but for gaining an idea about Triggers here is what the Video Covers

  1. Basic Creation of a Trigger
  2. and a simple real-time scenario on how i used triggers

The Video here illustrates or covers the above points.

Beginner’s Guide to OLTP and OLAP

Beginner’s Guide to OLTP and OLAP

In this blog we will be looking into the introduction of OLTP and OLAP. How it is defined and how we can relate it to real-time

Firstly, OLTP stands for On-line Transaction Processing and OLAP stands for On-line Analytical Processing

But to define OLTP and OLAP in the Simpliest of language is
OLTP is defined as “What is Happening” ie it deals with various transactions as it happens
and OLAP is defined as “What has happened” ie it deals with Stored Data.

OLTP is more of INSERT, DELETE, UPDATE, MODIFY i.e perform Day to Day Operations
But OLAP on the other hand is more of Fetching Data i.e Perform Analysis.

Take for Example a Supermarket Counter : the Counter receives payment for a particular product thus the event happening is a transaction which should be inputted into the Database
Hence it’s a OLTP.

But the OLAP is once the Data has been stored and the only Operation performed is Fetching the Data then it is said to be OLAP which can be used for
Business decision making and planning.

OLTP is a Relational Database and it’s 2-Dimensional in nature i.e Mathematically it is in the form of x and y axis
but OLAP on the other hand is a Multi-dimensional Database i.e Mathematically it is in the form of x, y, z, nth axis.

Now a question arises how is OLAP and OLTP represented? OLTP is represented in a Normalized Form and OLAP in a Star Schema or even at times in a Snow Flake Schema

OLAP Data- Structure is Designed in a Tree Format making it easy to Traverse.

Thanks,

Sohail

How to Replace Various Unwanted Characters during Reporting Using Jasper Studio

How to Replace Various Unwanted Characters during Reporting Using Jasper Studio

Reporting in Business Intelligence is the representation of Data in a more meaningful manner
so at times we need to follow some few formats to make our data to the End-User or Clients more meaningful to their Business

Now in some cases
characters like

commas (,), asterisk (*) and periods (.) are not permitted
and permitted characters can be Ampersand (&) , hyphens (-)

so what do we do here?

since jasper has various text functions one in particular that can be used here will be

the replaceAll() method

so this can be done on the particular columns you want make the required changes.

Now Let’s go to the field column you want and edit the Expression.

In my case:
I would be editing the First Name Column to the required format
so now we use the replaceAll() method

My requirement is to remove all commas and replace it with a [space]
so a simple

$F{firstname}.replaceAll(“,”,” “).trim()

can be used
Note: if you are wondering and new to Jasper why i used trim(), I used a trim() here because at times a name can be in the form

Helical,IT Solutions,

Now with the $F{firstname}.replaceAll(“,”,” “) it gives us the result Helical IT Solutions [space] this hereby create an extra space at the End so
a trim() method is used to remove unwanted spaces at the End.

OK now you have an idea on how to replace unwanted characters with the required characters
Now go ahead and replace * with –

and . with [space]

Did you get and Error?

Indeed yes you will get an Error if you
wrote your expression as

 $F{firstname}.replaceAll(“,”,” “).replaceAll(“*”,” “).replaceAll(“.”,” “).trim()

and if you didn’t well skip this part 😉

ok now if “.” is kept it actually treats it as a . and not as “.” the character same goes with the “*”

so the appropiate expression would be

$F{firstname}.replaceAll(“,”,” “).replaceAll(“[*]”,” “).replaceAll(“[.]”,” “).trim()

 

well there are various way also to express the . as character and [.] is one of them
and now you are good to go

– Izebhijie Sohail Ehizogie

Note: Quotations are in the form of quotation open and quotation close

Beginners Guide to Understanding Cross Join

Beginners Guide to Understanding Cross Join

In my previous Beginners Guide we talked about Types of joins i.e.

Inner Join, Left and Right Join but today we will be talking about cross join which happens to be more interesting to me than any other joins and happens to be helpful in real time for me.

What is a CROSS JOIN?

A CROSS JOIN is a JOIN operation that produces the Cartesian product of two tables.

Why CROSS JOIN?

Let’s say we have two table employee table “emp1” and department table “dept” now lets see how many records do we have in the emp1 table:

emp1
and also how many records do we have in the dept table:

dept

and for the emp1 table you want to “associate” the location column from the
dept table to all the records in the emp1 table so we use CROSS JOIN

but really why ?
In realtime Sometimes you might want to associate dates with a particular table and its records this happens to be helpful

Now we can do that with the Keyword CROSS JOIN as shown below.
cross

And there you go hope you will have a better understanding now.

-Sohail I.E

Business Intelligence Vs Business Analytics

BIvsBA

 

What’s the difference between Business Analytics and Business Intelligence? The correct answer is: everybody has an opinion, but nobody knows.

For example, when SAP says “business analytics” instead of “business intelligence”, it’s intended to indicate that business analytics is an umbrella term including data warehousing, business intelligence, enterprise information management, enterprise performance management, analytic applications, and governance, risk, and compliance.

But other vendors (such as SAS) use “business analytics” to indicate some level of vertical/horizontal domain knowledge tied with statistical or predictive analytics.

At the end of the day, there are two things worth differentiating:

  1. The first is the business aspect of BI — the need to get the most value out of information. This need hasn’t really changed in over fifty years (although the increasing complexity of the world economy means it’s ever harder to deliver). And the majority of real issues that stop us from getting value out of information (information culture, politics, lack of analytic competence, etc.) haven’t changed in decades either.
  2. The second is the IT aspect of BI — what technology is used to help provide the business need. This obviously does change over time —sometimes radically.

 

Quick Comparison

BIvsBA

We know that Analytics refers to the skills, technologies, applications and practices for continuous iterative exploration and investigation of data to gain insight and drive business planning. Analytics consists of two major areas: Business Intelligence and Business Analytics.

What is often overlooked is how the two differ based on the questions they answer:

  1. Business Intelligence : traditionally focuses on using a consistent set of metrics to measure past performance and guide business planning. Business Intelligence consists of querying, reporting, OLAP (online analytical processing), and can answer questions including “what happened,” “how many,” and “how often.”
  2. Business Analytics : goes beyond Business Intelligence by using sophisticated modelling techniques to predict future events or discover patterns which cannot be detected otherwise. Advanced Analytics can answer questions including “why is this happening,” “what if these trends continue,” “what will happen next” (prediction), “what is the best that can happen” (optimization).

 

The Evolution of Business Intelligence vs Business Analytics

In the past, BI has been used to talk about the people, processes and applications used to access and extrapolate meaning from data, for the sake of improving decisions and understanding the effectiveness of targeted decisions. But this is where BI as a baseline failed; something that runs entirely off of static, historic data severely limits a user’s ability to make predictive decisions and forecast for the future market. When an emergent situation arises on a Friday afternoon, the user doesn’t greatly benefit from looking at metrics collected prior to the introduction of that situation.

The rapid growth and demand for BA comes from this failing, and is in a way the evolved form of BI solutions. In a business world whose speed is ever-increasing, the user needs to be able to interact with information at the speed of business, not looking back over his or her shoulder at what happened in the past. BI setups alone do not support the occurrence of users asking and answering questions in the face of marketplace events as they happen. A company that is data-driven sees their data as a resource, and uses it to hedge out competition. The more current the data the user has, the better jump he or she has on the competitor, who may or may not have become a threat in a time so recent that traditional BI data reporting wouldn’t even take them into consideration.

Many companies are commonly implementing advanced analytics on top of their data warehouses, to bridge the gap between BI and current day needs. Perhaps this is the origin of the confusion between terms, as organizations pick and choose from different combinations of services and have no real understanding of what to call these mashups.

Equally relevant is the fact that more and more people are being asked to interpret data in roles that are not strictly analytical. Product managers, marketers and researchers are moving towards data as a way to formulate strategies, and traditional BI platforms make it difficult to push data into real-time situations and what-if scenarios.  With the importance of data-driven decisions increasingly becoming a realization for less tech-savvy branches of company teams, the need for more user-friendly and faster producing platforms also grows. Moreover, delivering the data that supports these decisions to a broader company team demands a more visual form of modeling tool, to improve understanding across all departments. Charts and graphs showing BA findings are quicker and more impacting than written out statistics and excel sheets full of data.

Data interpretation and the manipulation method of choice change as the market demands. While having a set of established methods is important to the effectiveness of a company’s strategy, it’s understanding the need for flexibility in the face of these changes that can be a company’s most valuable asset.

To summarize,

WHAT is happening to your business = Business Intelligence (For Visibility)
WHY it is happening, WHAT WILL likely happen in future = Business Analytics (For Investigation, Prediction & Prescription).

References:

  1. https://www.quora.com/What-is-the-difference-between-business-intelligence-and-business-analytics-1
  2. https://blog.rjmetrics.com
  3. https://rapidminer.com