In this article, we will explore the Merge Join Transformation in SSIS package. It is useful to perform SQL Joins using the SSIS package. We can achieve Inner Join, Left Outer Join and Full Outer Join using this transformation.
Sample data – Flat file source
I prepared two flat files (*.txt), as shown below. These files contain user information and school details. We have school_id field common in both files.
Create an ETL package for Merge Join Transformation in SSIS:
Launch Visual Studio, create a new integration service project.On the next page, specify a project name and directory to save the package related files.It opens the new project and solution, as shown below.
Now, drag a data flow task from the SSIS toolbox and rename it. For rename, right-click on it and choose Rename.
Double-click on this task and add two flat file sources in the data flow area. We use the flat file source for connecting with text or CSV files.Rename both flat-file sources to reflect the flat file source.
We see a red cross on both flat files because we have not created connections to source files. Let’s create connections for flat files.
Double-click on this flat file source, and it opens the following flat file source editor.
Click on New and fill out the following information’s in the General tab.
• Connection Manager name: SSIS package automatically takes a name for it, but it is recommended to give a proper name so that you can quickly identify connections in the package
• File Name: Specify the complete path of the source file. You can browse the file here or paste the complete file path
SSIS package automatically checks the format and text delimiter for the flat file. You can review and make changes if required.
Navigate to Columns, verify the columns present in the flat file in preview.
Navigate to Advanced and verify the properties of the column. Here, you can make changes in column delimiter, data type, and output column width.
Navigate to Preview and verify records appearing correctly.
Click OK, and we are done with the configuration for the first flat file source.
Before we add a Merge Join Transformation in SSIS, we require a sort transformation with both flat file source.The Merge Join Transformation requires sorted data for its inputs. For more information about this important requirement, see Sort Data for the Merge and Merge Join Transformations.
Drag two sort transformations and join them with the flat file source shown below.
We need to configure the sort transformation as well. Let’s open the sort connected with [user_details] flat file source.It shows the input columns from the flat file. Select the column to join with another flat file. Here, I selected school_id because we have school_id column in both flat files.
Click Ok, and the configuration is done for the first sort operator. Similarly, we need to do for second sort operator.
Now, it’s time to add Merge Join transformation in SSIS package. Darg and place Merge Join below sort components.
Once you drag an output from sort operator to Merge Join, it opens the following pop-up for input-output selection. Here, we select the input either Merge Join Left Input or Merge Join Right Input.
Let’s select the Merge Join Left Input and click Ok. Similarly, drag the output of the second sort to Merge Join transformation. It does not show an input-output selection window this time because we have already given input in the first sort transformation.
Now, we need to configure Merge Join as well. Double-click on it and open Merge Join Transformation Editor. It shows both input files data, and join type as Inner Join in this case.
Now, select the columns we require in the output. We have a common field [school_id] in both data set. We can skip one column to avoid duplicate columns.
Click Ok, and we can see that configuration is complete for the Merge Join Transformation in SSIS.
For this demo purpose, we do not need to insert data in a SQL table. My destination is database in SQL Server for that I have add OLE DB Destination component
To view data, right-click on the connection line between both Merge Join and OLE DB Destination and click on Enable Data Viewer. You can see a data viewer symbol on the connection line.Let’s execute the SSIS package now and view the results.
Conclusion:
We explored the usage of Merge Join Transformation in SSIS packages to perform JOINS. You can also use it in ETL’s where you get data from multiple sources, and you need to perform the Join on them. It eliminates the need for importing data first in SQL tables, perform joins using SQL query and then do further data processing.
Thank You
Prashanth Kanna
Helical IT Solutions
Best Open Source Business Intelligence Software Helical Insight is Here