MERGE, MERGE JOIN and UNION ALL

The blog is moved to

  1. www.aboutsql.in
  2. OR
  3. Lets Learn SSIS

Well its looks I really getting fond of writing blogs. I think everyone should at least give a try writing blogs once 🙂

Today I am going to study some of the Dataflow transformations, MERGE, MERGE JOIN, and UNION ALL.

Yes the first question any SQL SERVER developer can ask is, these operations can very well be done in T-SQL query, Why using it in SSIS?, well the answer is, T-SQL can be used only when you are working with data which resides in SQL SEVER Tables, but what if the data is coming from different database itself like ORACLE or MYSQL or Flat File or XML or EXCEL.

Let’s take it one by one each transformation, the post also covers differences, which are actually common questions, in any SSIS interviews.

Rather than reinventing the wheel, I am taking the text from MSDN for providing the description of each transformation which I found covering everything J

MERGE:

The Merge transformation combines two sorted datasets into a single dataset. The rows from each dataset are inserted into the output based on values in their key columns.

By including the Merge transformation in a data flow, you can perform the following tasks:

  • Merge data from two data sources, such as tables and files.
  • Create complex datasets by nesting Merge transformations.
  • Remerge rows after correcting errors in the data.

The Merge transformation is similar to the Union All transformations. Use the Union All transformation instead of the Merge transformation in the following situations:

  • The transformation inputs are not sorted.
  • The combined output does not need to be sorted.
  • The transformation has more than two inputs.

Well here is the example..here we are taking input from two different sql server databases and merging them into a flatfile.
MergeExample
And here is what we need to set in Merge Transformation properties
MergeExample
This transformation is pretty simple and straight forward.
//
//
MERGE JOIN: The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. For example, you can use a LEFT join to join a table that includes product information with a table that lists the country/region in which a product was manufactured. The result is a table that lists all products and their country/region of origin. For more information, see Using Joins.

You can configure the Merge Join transformation in the following ways:

  • Specify the join is a FULL, LEFT, or INNER join.
  • Specify the columns the join uses.
  • Specify whether the transformation handles null values as equal to other nulls.
Note
If null values are not treated as equal values, the transformation handles null values like the SQL Server Database Engine does.

This transformation has two inputs and one output. It does not support an error output.

Here is the same example with merge join..
MergeJoinExample
And the properties window looks like this, here we need to set which type of Join we need to perform the dataset(Left outer, right outer or inner) it’s the same what we use in any RDBMS.
MergeJoinExample
UNION ALL:

The Union All transformation combines multiple inputs into one output. For example, the outputs from five different Flat File sources can be inputs to the Union All transformation and combined into one output.

The transformation inputs are added to the transformation output one after the other; no reordering of rows occurs. If the package requires a sorted output, you should use the Merge transformation instead of the Union All transformation.

The first input that you connect to the Union All transformation is the input from which the transformation creates the transformation output. The columns in the inputs you subsequently connect to the transformation are mapped to the columns in the transformation output.

To merge inputs, you map columns in the inputs to columns in the output. A column from at least one input must be mapped to each output column. The mapping between two columns requires that the metadata of the columns match. For example, the mapped columns must have the same data type.

If the mapped columns contain string data and the output column is shorter in length than the input column, the output column is automatically increased in length to contain the input column. Input columns that are not mapped to output columns are set to null values in the output columns.

This transformation has multiple inputs and one output. It does not support an error output.
UnionALLExample
UnionALLExample
Even though I have taken all sources as OLEBD, you can take any type of source and output data to any type of destination.

Well guys that’s it for today, hope you guys have learned something.

–The Learner ..!!

Advertisements

Posted on July 9, 2011, in Dataflow and tagged , , , . Bookmark the permalink. 2 Comments.

  1. Nice article. Very helpful.
    Thank you.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: