Loading Data from Source to Target
Before we proceed it’s important to identify the tool you might need to accomplish the process of ETL, in my case i would be using the Pentaho Data Integration Application (keep in mind irrespective of the tool you use the steps or procedures are similar but the approach might be differ).
The following Steps can be followed:
1. Identify your Source, it could be the following:
a. C.S.V file
b. Text file
c. A database
d. and so on
in my scenario a C.S.V (comma separated file)file.
2. Open Up your Spoon.bat
Select a new transformation and select a “Input” then select what type of input you require
we have Text File Input, C.S.V file input, SAS Input, Table Input and so on. In My case since i’ll be using C.S.V file as a Source i’ll select C.S.V file Input Component.
3. Set Up your Connection Based on your preferred connection type in my case i’ll be using the postgreSQL.
[Read my next Blog on setting Up a Connection using Connection type: PostgreSQL]
4. Once Connection has been Established you can Right-Click on the Connection and Select Share if that’s a common Connection to all your transformations will be using this will share the Transformation Connection to other transformation.
5. So we will be Sending data from Source to a Target to we need to to have a “Input” as the source
and an “Output” as the Target.
6. Input
download a C.S.V file from the internet
or even Create a TXT/C.S.V input file
as shown below.
Create a source if required
Text_Source (comma delimited)
Employee_Number,First_Name, Last_Name, Middle_Initial, Age, Gender, Title
101,Ehizogie,Izebhijie,Sohail,24,Male,Developer
102,Fahad,Anjum,WithClause,23,Male,Developer
103,Gayatri,Sharma,A,24,Female,Accountant
Save as txt or csv and this can be your input.
Here since our input is from a csv file
we open Up or C.S.V File Input component
Step Name: Anyname
File Name: Browse the selected path
Delimiter: , (comma)
Enclosure: ”
and then Select Get Fields and Select OK
Preview your Data
7. Output
Open Up Table output component
Select Target Schema
Select The Target Table
[Keeping in mind a Table exist in the DB]
Select OK!
Right-Click on the table output to MAP the columns from Source to Target
Now this is important as the to get the right data from the source to the column in the target
and Then Run
As a Beginner keep in mind that
Errors are a bound to occur
Such as Type of data from
Source Does not Match your Target Table format.
and so on.
Here a we can Have some little transformation step to convert and take care of such format errors
[In my next blog we can look into handling that]
Now Go to your Target Database and Run the SELECT * FROM table_name
there you go!
Best Open Source Business Intelligence Software Helical Insight is Here
A Business Intelligence Framework
Thanks
Sohail Izebhijie