Remove Duplicate Values using Pentaho Kettle PDI ETL Tool

Posted by Nikhilesh, in Business Intelligence, Data Visualization, ETL, Open Source Business Intelligence, Pentaho

How do you avoid duplicated values in PDI columns when the column names are the same?

Sometimes a requirement calls for the same column name to appear more than once in a PDI transformation. In that case, the value of one field overwrites the values of the other fields with the same name.

For example, suppose a company wants three pairs of Loan Id and Loan Amount in the output. If we give the columns the same names, as the requirement asks, and do not handle the scenario properly, the data will be duplicated.

Suppose: We have 3 Loan Amounts.

Loan Amount 1: 10$

Loan Amount 2: 20$

Loan Amount 3: 30$

The column names are the same, i.e. Loan Amount, Loan Amount, Loan Amount. If we pass these names directly to the Text file output step, the values are replicated: all three amounts come out as 10$.
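The replication can be pictured with a small Python sketch. This is only an analogy, not PDI's internal code: it assumes the output resolves each field by name and takes the first match, which reproduces the behavior described above. The helper `resolve_by_name` and the variable names are hypothetical.

```python
# Hypothetical row: field names and values kept as parallel lists,
# with the same name used for all three loan amounts.
field_names = ["Loan Amount", "Loan Amount", "Loan Amount"]
row_values = ["10$", "20$", "30$"]


def resolve_by_name(names, values, wanted):
    """Return the value of the first field whose name matches `wanted`."""
    return values[names.index(wanted)]


# The output asks for "Loan Amount" three times; each lookup stops at
# the first match, so the second and third amounts are never reached.
output_fields = ["Loan Amount", "Loan Amount", "Loan Amount"]
written = [resolve_by_name(field_names, row_values, name) for name in output_fields]

print(written)  # ['10$', '10$', '10$']
```

Because every lookup lands on the first "Loan Amount", the fields need distinct names inside the transformation, even if the file header has to repeat the same name.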

In this case, renaming the fields with the Select Values step will not solve the replication issue.

To get around this, we can create a separate header transformation and check Append in the Text file output step of the next transformation.

This way the header can contain the column name “Loan Amount” as many times as needed, while in the next transformation the fields are kept distinct as Loan Amount 1, Loan Amount 2, Loan Amount 3.

Note: To enable Append in the Text file output step, follow these steps:

  • Double-click the Text file output step
  • Go to the Content tab
  • Check Append and uncheck Header
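The header-then-append idea can also be sketched outside PDI. The Python below is a minimal illustration, not the PDI implementation: the first function plays the role of the header transformation, and the second plays the role of the data transformation with Append checked and Header unchecked. The function names, the file name `loans.csv`, and the sample row are assumptions made for the example.

```python
import csv
import os
import tempfile


def write_header(path, header_names):
    """Stage 1 (header transformation): create the file with only the header line."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(header_names)


def append_rows(path, field_names, rows):
    """Stage 2 (data transformation): append rows without writing a header."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            # Fields are looked up by their distinct internal names.
            writer.writerow([row[name] for name in field_names])


out_path = os.path.join(tempfile.mkdtemp(), "loans.csv")

# The header repeats the same column name three times.
write_header(out_path, ["Loan Amount"] * 3)

# The data stage uses distinct names, so no value overwrites another.
append_rows(
    out_path,
    ["Loan Amount 1", "Loan Amount 2", "Loan Amount 3"],
    [{"Loan Amount 1": "10$", "Loan Amount 2": "20$", "Loan Amount 3": "30$"}],
)

with open(out_path, newline="") as f:
    lines = f.read().splitlines()

print(lines)
```

The file ends up with the repeated header followed by the three distinct amounts, which is the result the two-transformation setup is meant to produce.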

Thanks
