Merge Join Transformation in SSIS packages

Posted on by By admin, in AWS, Big Data, Business Intelligence | 0

In this article, we will explore the Merge Join Transformation in SSIS package. It is useful to perform SQL Joins using the SSIS package. We can achieve Inner Join, Left Outer Join and Full Outer Join using this transformation.

Sample data – Flat file source

I prepared two flat files (*.txt), as shown below. These files contain user information and school details. We have school_id field common in both files.

Merge Join Transformation in SSIS packages

Create an ETL package for Merge Join Transformation in SSIS:

Launch Visual Studio, create a new integration service project.On the next page, specify a project name and directory to save the package related files.It opens the new project and solution, as shown below.

Merge Join Transformation in SSIS packages

Now, drag a data flow task from the SSIS toolbox and rename it. For rename, right-click on it and choose Rename.

Merge Join Transformation in SSIS packages

Double-click on this task and add two flat file sources in the data flow area. We use the flat file source for connecting with text or CSV files.Rename both flat-file sources to reflect the flat file source.

Merge Join Transformation in SSIS packages

We see a red cross on both flat files because we have not created connections to source files. Let’s create connections for flat files.

Double-click on this flat file source, and it opens the following flat file source editor.

Merge Join Transformation in SSIS packages

Click on New and fill out the following information’s in the General tab.

• Connection Manager name: SSIS package automatically takes a name for it, but it is recommended to give a proper name so that you can quickly identify connections in the package
• File Name: Specify the complete path of the source file. You can browse the file here or paste the complete file path
SSIS package automatically checks the format and text delimiter for the flat file. You can review and make changes if required.

Merge Join Transformation in SSIS packages

Navigate to Columns, verify the columns present in the flat file in preview.

Navigate to Advanced and verify the properties of the column. Here, you can make changes in column delimiter, data type, and output column width.

Navigate to Preview and verify records appearing correctly.

Click OK, and we are done with the configuration for the first flat file source.

Before we add a Merge Join Transformation in SSIS, we require a sort transformation with both flat file source.The Merge Join Transformation requires sorted data for its inputs. For more information about this important requirement, see Sort Data for the Merge and Merge Join Transformations.

Drag two sort transformations and join them with the flat file source shown below.

Merge Join Transformation in SSIS packages

We need to configure the sort transformation as well. Let’s open the sort connected with [user_details] flat file source.It shows the input columns from the flat file. Select the column to join with another flat file. Here, I selected school_id because we have school_id column in both flat files.

Merge Join Transformation in SSIS packages

Click Ok, and the configuration is done for the first sort operator. Similarly, we need to do for second sort operator.

Now, it’s time to add Merge Join transformation in SSIS package. Darg and place Merge Join below sort components.

Once you drag an output from sort operator to Merge Join, it opens the following pop-up for input-output selection. Here, we select the input either Merge Join Left Input or Merge Join Right Input.

Merge Join Transformation in SSIS packages

Let’s select the Merge Join Left Input and click Ok. Similarly, drag the output of the second sort to Merge Join transformation. It does not show an input-output selection window this time because we have already given input in the first sort transformation.

Now, we need to configure Merge Join as well. Double-click on it and open Merge Join Transformation Editor. It shows both input files data, and join type as Inner Join in this case.

Merge Join Transformation in SSIS packages

Now, select the columns we require in the output. We have a common field [school_id] in both data set. We can skip one column to avoid duplicate columns.

Merge Join Transformation in SSIS packages

Click Ok, and we can see that configuration is complete for the Merge Join Transformation in SSIS.
For this demo purpose, we do not need to insert data in a SQL table. My destination is database in SQL Server for that I have add OLE DB Destination component

To view data, right-click on the connection line between both Merge Join and OLE DB Destination and click on Enable Data Viewer. You can see a data viewer symbol on the connection line.Let’s execute the SSIS package now and view the results.

Merge Join Transformation in SSIS packages

Conclusion:

We explored the usage of Merge Join Transformation in SSIS packages to perform JOINS. You can also use it in ETL’s where you get data from multiple sources, and you need to perform the Join on them. It eliminates the need for importing data first in SQL tables, perform joins using SQL query and then do further data processing.

Thank You
Prashanth Kanna
Helical IT Solutions

logo

Best Open Source Business Intelligence Software Helical Insight is Here

logo

A Business Intelligence Framework

0 0 votes
Article Rating
Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments