Beginner’s Guide to ETL (Extract, Transform and Load) – Introduction

Introduction to ETL (Extract, Transform and Load)
This is a process related to data warehousing which involves extracting data out of one or more source systems
and placing it into a repository, or target.

Extraction
Extraction pulls the data from the source systems (flat files or other operational systems) and converts it into one consolidated format that is ready for transformation.
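To make this concrete, here is a minimal extraction sketch in Python using pandas. The file names and columns are hypothetical stand-ins for real operational exports, not part of the original example:

    import pandas as pd

    # Hypothetical flat-file exports from two operational source systems.
    orders = pd.read_csv("orders_export.csv")       # order_id, customer_id, status, amounts, ...
    customers = pd.read_csv("crm_customers.csv")    # customer_id, region, ...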

Transformation
Transforming the data may involve the following tasks:

  • Cleaning: One of the most important tasks in the transformation stage, because the source data will often contain values that the target system doesn’t support or understand, so cleaning is required.
    In some cases the data comes from many source inputs, so lookups are important to avoid duplication.
  • Filtering: The source data may contain a great many rows, but it’s important to send only relevant data to your target and filter out the unnecessary rows.
  • Business Rules: Calculations or derivations can be performed here so we have correct, readable data at the target.

and many more.
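Continuing the extraction sketch above, here is a hedged Python illustration of these three tasks; every column name is hypothetical:

    import pandas as pd

    # Cleaning: normalise values the target would reject or misread.
    orders["order_date"] = pd.to_datetime(orders["order_date"], errors="coerce")
    orders = orders.dropna(subset=["order_date", "customer_id"])

    # Lookup: enrich orders from the second source, de-duplicating the lookup side first.
    customers = customers.drop_duplicates(subset="customer_id")
    orders = orders.merge(customers[["customer_id", "region"]], on="customer_id", how="left")

    # Filtering: keep only the rows the target actually needs.
    orders = orders[orders["status"] == "COMPLETED"]

    # Business rule: derive a column the target reports on.
    orders["net_amount"] = orders["gross_amount"] - orders["discount"]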

Loading

After proper transformation, once the data matches the business rules, loading it into a target or repository is the final step in the ETL (Extract, Transform and Load) process.
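To round off the sketch from the previous sections, here is a minimal load step, with SQLite standing in for whatever repository your warehouse actually uses (database and table names are hypothetical):

    import sqlite3

    # Append the transformed rows to a hypothetical warehouse fact table.
    with sqlite3.connect("warehouse.db") as conn:
        orders.to_sql("fact_orders", conn, if_exists="append", index=False)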

In my next blog we will look at the basics of loading data from source to target.

Thanks
Sohail Izebhijie

Passing parameters from parent job to sub-job/transformation in Pentaho Data Integration (Kettle) – Part 1

Part 1: Using setter and getter methods:

1. Set variables in the parent job/transformation.
2. Get variables in the sub-job/transformation.

1. Set variables:

Identify the field names that you are going to set using the Set Variables step and assign each one a proper variable name.

Also define the scope of each variable, choosing from the following options.

Variable scope types:

1. Valid in the virtual machine: the complete virtual machine will know about this variable.
Warning: this makes your transformation fit only to run in a stand-alone fashion.
Running on an application server such as the Pentaho framework can become a problem,
because other transformations running on the server will also see the changes this step makes.

2. Valid in the parent job: the variable is only valid in the parent job.
3. Valid in the grand-parent job: the variable is valid in the grand-parent job and all the child jobs and transformations.
4. Valid in the root job: the variable is valid in the root job and all the child jobs and transformations.

Configure the Set Variables step as shown below.

[Image: set_variables_pdi]

But please note: IT IS NOT POSSIBLE TO SET AND USE A VARIABLE IN THE SAME TRANSFORMATION. This is because all steps in a transformation run in parallel.

2. Get variables:

Use the Get Variables step in the sub-job/transformation to retrieve the same information from the parent job/transformation.

Make sure that you specify each variable name in the correct format, ${variable}
or %%variable%% (as described in Variables). That means you can also enter complete strings in the variable column, not just a single variable reference.
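To make the two reference formats concrete, here is a small illustrative Python sketch of that kind of substitution. This is only an approximation of the idea, not PDI's actual implementation:

    import re

    def expand(text, variables):
        """Replace ${name} and %%name%% references, leaving unknown names intact."""
        def lookup(match):
            name = match.group(1) or match.group(2)
            return str(variables.get(name, match.group(0)))
        return re.sub(r"\$\{(\w+)\}|%%(\w+)%%", lookup, text)

    # A complete string mixing literals and both reference styles:
    print(expand("/output/${REGION}/sales_%%YEAR%%.csv", {"REGION": "emea", "YEAR": 2017}))
    # -> /output/emea/sales_2017.csv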

Configure the Get Variables step as shown below.

[Image: get_variables_pdi]

So now you can utilize the same variables in your sub-job/transformation wherever required.

Apart from this, we can also pass all parameters down to a sub-job/transformation using the Job Executor and Transformation Executor steps.

Please follow my next blog for Part 2:
Passing parameters from parent job to sub-job/transformation in Pentaho Data Integration (Kettle) – Part 2.

Thanks,

Sayagoud

Passing parameters from parent job to sub-job/transformation in Pentaho Data Integration (Kettle) – Part 2

In Part 1, I described passing parameter values down to a sub-job/transformation using setter/getter methods.

Please refer to my previous post for Part 1: Passing parameters from parent job to sub-job/transformation in Pentaho Data Integration (Kettle) – Part 1.

In this part, we will use executor steps to do the same thing.

Part 2: Using Job/Transformation Executor steps:

To pass parameters from the main job to a sub-job/transformation, we use the Job Executor or Transformation Executor step, depending on the requirement.

These are the steps:

1. Define variables in the job properties section
2. Define variables in the transformation properties section
3. Configure the job/transformation executor step

1. Define variables in the job properties section:

Right-click anywhere on your job, select Settings, and go to the Parameters section. Define all the variables and assign a default value to each.

[Image: job_properties_pdi]
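As a side note, the defaults you define here can be overridden when the parent job is launched with PDI's Kitchen command-line runner. Below is a hedged sketch driven from Python's subprocess module; the installation path, job file, and REGION parameter are all hypothetical:

    import subprocess

    # Hypothetical paths and parameter name; adjust for your installation.
    subprocess.run(
        [
            "/opt/pentaho/data-integration/kitchen.sh",  # PDI job runner
            "-file=/jobs/parent_job.kjb",                # the parent job
            "-param:REGION=EMEA",                        # overrides the default from job properties
            "-level=Basic",                              # logging level
        ],
        check=True,  # raise if the job exits with an error
    )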

2. Define variables in the transformation properties section:

Right-click anywhere on your sub-job/transformation, select Settings, and go to the Parameters section.
Use the same variables that you defined in your parent job (i.e. Step 1) and assign a default value to each.

[Image: transform_proeprties_pdi]
3. Configure the job/transformation executor step:

In this step, we configure the sub-job/transformation path details and pass the same parameters in the job entry details section, as shown below.

Double-click the job/transformation executor step and provide the transformation file path details.

[Image: transform_executor_config]

Go to the Parameters section and make sure you have checked the “Pass all parameter values down to the sub-transformation” check box.

[Image: transform_executor_params]

So now you can utilize the same variables in your sub-transformation.


Thanks,

Sayagoud