Comparing Tables Schemas with Pentaho Data Integration

Posted on by By Sohail, in Pentaho | 1

Most times our source Tables differs from our Destination Table, but in other times it doesn’t.In a scenario where our source table is the same as our destination, then we might come into a situation where our source table could be changed in terms of data type or length or even a new column added.

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.

So hence this blogs shows how we can make life easy by having a simple Job that runs in a loop on all the similar tables in both source and destination, to check if there is a change in the schema

Here we are using simple Job to perform the comparison of the Source Table with the destination table.

simple Job to perform

In the GetSourceSchema:

This uses a table input to run a Query Limit 1 i.e

SELECT  * from person_employment,

Note: Here person_employment can be parameterized such that we can run the tables comparison in Loop and the name person_employment can be used as the Schema File Name (as we go further you will get a clear picture).

GetSourceSchema

And then we use a Metadata structure of stream which takes the metadata of the Table and prints it in a text file which we will need for comparison

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.

Metadata structure of stream

So from the Metadata Stream, we get a lot of Fields but the one I needed to compare is just the field name because I assume the source and destination field name are the same, other fields are Type and Length.

The same goes for GetDestinationSchema

GetDestinationSchema

Parametrizing the tables is a good idea so we can save the Text files with a unique name each time. And then pass/use the parameterized text name to compare.

Text files with a unique name

If they match you can change the direction on what you want to do next if it fails you can send a mail to saying hey this table doesn’t match.

Final Result

So a lot we can do, to loop you can reference: Loops in Pentaho Data Integration or Loops Pentaho Data Integration 2.0. This whole Pentaho Job is just a simple raw Job, which can be tailored to how you want it.

Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.

Claim Your 30 Days Free Trail

logo

Best Open Source Business Intelligence Software Helical Insight is Here

logo

A Business Intelligence Framework

In case if you have any queries please get us at support@helicaltech.com

Thank You
Sohail Izebhijie
Helical IT Solutions Pvt Ltd

0 0 votes
Article Rating
Subscribe
Notify of
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Hi. Thanks for sharing the valuable information. I found this article very helpful. Would love to read more from you. Keep up the good work.
I have read a similar article on Pentaho vs. Talend: How the Two Data Integration Tools Compare?. Please do check it out.