Metadata Injection in Pentaho Data Integration
The ETL Metadata Injection step inserts metadata into a template transformation. To Explain further, Let’s have a simple scenario
of Loading CSV Data into a Table,
Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.
So when using PDI, we use a CSV input and then Statically we get the fields of the CSV and then we connect the CSV input to a Metadata
now this is ideally the step we follow now imagine we had 10 different CSV files
that will mean 10 different CSV inputs and 10 different Tables Output, now this is time consuming, Hence Metadata Injection is here to save us by
passing ETL metadata at run-time. This step enables you to solve repetitive ETL workloads like loading of text files, data migration, and so on.
Now I will show you how this is done.
First lets keep in mind that the table we are loading the csv data has already been created. this is a crucial step for us to achieve our goal.
So we are creating a table as follows:
CREATE TABLE employees
(
id integer NOT NULL,
name character varying(100),
date character varying(100),
departmentid character varying(100)
)
Now we need to create a Transformation with a csv input and table output as shown below: fetchcsvandload.ktr
with values as shown below
now once that is covered we need to create a Transformation that will have our ETL metadata injection
and few other components which will all a forward connection to the ETL metadata Injection
these components are as follows:
1. Table input and Metadata structure of stream
now this table input is required to fetch data from the table employees and then pass it through the Metadata structure of stream which catches the metadata of the output row
and uses that for our injection process.
Now within our Table input we need to have the following Query so that we can get a row which we will pass through into the next component
SELECT * FROM (
SELECT e.* FROM (SELECT 1 as one) dummy
LEFT JOIN employees e -- employees table can be parametrized In the Scenario where you want to load multiple and different CSV using MetaData Injection
on 1=1
)base
2. Get File Names
This is required to fetch the file path of the CSV
Our Transformation now looks something like this
Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.
3. Modified JS
I’m using this component here to fetch the following:
1. tableName
2. fileName
3. delimiter
4. schemaValue
as shown in the image below
4. ETL Metadata Injection
We give the path of our Transformation which has a csv input and table output: ${Internal.Transformation.Filename.Directory}/fetchcsvandload.ktr
and then we click on get Validate and Refresh
then we click on Inject Metadata Tab
Now you will get all the options from our fetchcsvandload.ktr so we fill it up with the Metadata.
now from the image above you can see we filled up the values of from the incoming transformations and its values.
Make data easy with Helical Insight.
Helical Insight is the world’s best open source business intelligence tool.
now before we run our file to see how Metadata Injection works here is how our Csv input & Table output looks like:
Best Open Source Business Intelligence Software Helical Insight is Here
A Business Intelligence Framework
Thanks
Sohail Izebhijie