Loops in Pentaho Data Integration

Posted on by By Sohail, in Business Intelligence, Open Source Business Intelligence, Pentaho | 3

Loops in PDI

 

If only there was a Loop Component in PDI *sigh*.

But we can achieve Looping Easily with the Help of few PDI Components. Let’s take a requirement of having to send mails. Now if the mails id and files to be sent where fixed we could just use a property file and send the mail id’s as a parameter.

but here mail ids will be coming in from a Table source and then mail id’s can never be the same.
So how did i approach this?

Make data easy with Helical Insight.
Helical Insight is world's best open source business intelligence tool.
Click Here to Free Download

let’s say mail id’s are from Table: mail_address so now my aim here is to first add a row number to each record line and a Maxcount of all the records at each line, *Keep In Mind of the Maxcount*

and then pass it as a filter clause

where row_number = 1

now my assumption is the query should execute and return one row and then the column values like sender mail, receiver mail will be set into a variable and then passed to a Mail Transformation Component;

But how can we increase the value of the Where clause i.e how can we make

where row_number = 1 to

where row_number = 2

where row_number = 3 and so on.

 

So here is were Looping is required tp we take a variable and increment the value.

resultop0
1. Take a Set Variable at the Job Level [Initialize Loop] and then set a variable loop and assign the value to your initial value as shown below: In My case loop value = 1

resultop

 

Now next take a Transformation to get the variables and set the variables as shown below:

resultop1Now in my scenario I have a Modified JS because few conversions were required hence after that set the variable.

Next We need Transformation where we pass our variable of loop into our Query
as shown below

resultop3

 

SELECT * FROM (
SELECT *,ROW_NUMBER() OVER (Order by createfolderpath) as row_num,COUNT(dummy)           OVER (partition by dummy) as maxcount FROM (SELECT * FROM
(
SELECT DISTINCT replace(createfolderpath ,’/’,’\’) as createfolderpath,1 as dummy
FROM reportpaths
)base
)base2
Cross JOIN [TEST_SLSDMBKP].[dbo].[job_mailing_address]
WHERE
scheduled_flag = ‘Y’
)base3
WHERE
row_num = ${loop}
ORDER BY row_num,receivers_email,createfolderpath

Now here in my table input, 1 row will be the output so hence I pass the required columns into the Modified JS and then Set them as Variables.

resultop5

Next at our Job Level we use a JavaScript Component to increment the loop value to 2 as shown above.

Next at our Job Level we add a new transformation which does our requirement of Sending Mails as shown below:

resultop6

And we pass all the required variables into the Mail Component as shown below:resultop8

RECEVIERSEMAIL and all are values from our DB we Set as Variables.

After that we go to our JOB Level and we use a Simple Evaluation Component

Make data easy with Helical Insight.
Helical Insight is world's best open source business intelligence tool.
Click Here to Free Download

To stop the Loop after a particular condition

resultop10

Now ${MAXCNT} is the  Maxcount of all the records at each line from our Query we sent as variable

Now once loop number is 6 and Maxcount is  5 then it stops the loop and ends else

It continues and goes to the next Component which is Wait Component which in a previous blog it was specified the importance of a WAIT Component.

So that’s it we went through looping in PDI.

logo

Best Open Source Business Intelligence Software Helical Insight is Here

logo

A Business Intelligence Framework

Hence if we have any other approach, please feel free to drop it in the comment below

Thanks
Sohail Izebhijie

5 1 vote
Article Rating
Subscribe
Notify of
3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Im not sure I follow as there technically is a loop ability in PDI. You can create a sub transformation within a job that will return a resultset (the records you intend to use for the looping variables) and then you can copy that resultset from the sub transformation to the parent job. Once those rows are floating about the parent job you can tell the subsequent transformation (and any other sun transformations and jobs) to execute for each input row it receives thus creating a loop. Maybe im missing something here?

You’re absolutely right Harris.

hello, if i get rest client for get API and API limits how to looping rest client for get all record API?