Data Cloning Through Pentaho

Posted by admin, in Pentaho

In this article we are going to see how to clone input data in Pentaho. Pentaho is a popular open-source platform for extracting, transforming, and loading (ETL) data.

What is the Clone Component?

The Clone component in Pentaho (the “Clone row” step) is a transformation step that creates one or more copies, or clones, of each input row. Each clone is an independent copy of the input row, which can be processed or modified separately.
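
As a rough illustration of what "independent copy" means, the sketch below mimics the behaviour in plain JavaScript, outside Pentaho. The `cloneRow` helper and the sample row are hypothetical and exist only for this illustration. It assumes that the clones are emitted directly after the original row.

```javascript
// Hypothetical helper: emit the original row followed by `nrClones` copies.
function cloneRow(row, nrClones) {
  const out = [row];
  for (let i = 0; i < nrClones; i++) {
    out.push({ ...row }); // shallow copy, so each clone is a separate object
  }
  return out;
}

// Made-up sample row, not taken from the article's data grid.
const original = { name: "A", salary: 12000 };
const rows = cloneRow(original, 2);

// Changing one clone leaves the original and the other clone untouched.
rows[1].salary = 15000;

console.log(rows.length);     // 3
console.log(original.salary); // 12000
console.log(rows[2].salary);  // 12000
```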

Here are the steps:

STEP1:
Open the Pentaho Data Integration tool and create a new transformation (go to File, then New, and click Transformation).

STEP2:
Create sample data using the Data Grid component, as shown in the screenshot below.

[Screenshot: sample data in the Data Grid component]

In this example, we are going to duplicate by 3 the rows whose salary is greater than 10000.

STEP3:
Add a Modified Java Script Value component to set the number of clones for each row based on a condition. Connect the Data Grid component to the JavaScript component, then open the JavaScript component and write simple “if else” logic to define that number.
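
The “if else” logic could look something like the sketch below. It assumes the incoming field is named `salary` and uses a hypothetical output variable `clone_count`; the value 3 follows the example in this article. Inside the step the field is already available as a script variable, so the `var salary = ...` line is only a stand-in that lets the snippet run outside Pentaho.

```javascript
// Stand-in for the incoming field; in the step this value comes from the row.
var salary = 12000;

// Hypothetical output variable that the Clone row step will read later.
var clone_count;

if (salary > 10000) {
  clone_count = 3; // rows above the threshold get cloned
} else {
  clone_count = 0; // all other rows pass through without clones
}
```

In the step itself, `clone_count` would most likely also need to be listed in the Fields grid at the bottom of the dialog so that it is passed on as a field to the next step.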

[Screenshot: JavaScript component with the “if else” logic]

STEP4:
Search for the “Clone row” component in the Design tab, drag it onto the canvas in the PDI workspace, and connect the JavaScript component to it, as shown in the screenshot below. Then add a Text file output component at the end to see the result.

[Screenshot: transformation with the Clone row and Text file output components connected]

STEP5:
Double-click the Clone row component and set the number of clones to add, based on the condition.

[Screenshot: Clone row component settings]

Check the “Nr clone in field” box and select the field, since we are supplying the number through a field, as shown in the screenshot above.
Check the two boxes under “Output fields” if you want the output to include the clone number and a flag that identifies the original row.
Give the output filename in the Text file output component and click Get Fields.
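
To make the effect of these options concrete, here is a small sketch in plain JavaScript that imitates the step outside Pentaho. It rests on a few assumptions about the step's behaviour: the number in the field counts the copies added after the original row, the flag is false for the original and true for clones, and the clone number is 0 for the original. The field names `clone_count`, `is_clone` and `clone_num`, the `cloneRows` helper, and the sample rows are all hypothetical.

```javascript
// Hypothetical imitation of a Clone row step that reads the count from a field.
function cloneRows(rows, countField) {
  const out = [];
  for (const row of rows) {
    // Original row: flagged as not a clone, clone number 0 (assumed).
    out.push({ ...row, is_clone: false, clone_num: 0 });
    // Clones follow the original and are numbered from 1.
    for (let n = 1; n <= row[countField]; n++) {
      out.push({ ...row, is_clone: true, clone_num: n });
    }
  }
  return out;
}

// Made-up rows; clone_count is what the JavaScript step would have set.
const input = [
  { name: "A", salary: 8000, clone_count: 0 },
  { name: "B", salary: 12000, clone_count: 3 },
];

const output = cloneRows(input, "clone_count");
for (const r of output) {
  console.log(r.name, r.salary, r.is_clone, r.clone_num);
}
```

Under these assumptions, row A passes through once, while row B comes out as the original followed by three clones.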

[Screenshot: Text file output component settings]

STEP6:
Run the transformation and check the output file.

[Screenshot: output file]

As expected, the rows with a salary greater than 10000 are repeated 3 times.

Conclusion:
The Clone component in Pentaho is a versatile tool that lets users duplicate data rows, with the number of copies either set directly in the component or taken from a field on each row, as in this example. Used this way, it keeps row duplication inside a single step instead of spreading it across the rest of the transformation.

Thank You
Vani Bolle
Helical IT Solutions
