Author Archives: Kutub
Please catch me on
sorry for inconvience..
The blog is moved to
Well its looks I really getting fond of writing blogs. I think everyone should at least give a try writing blogs once 🙂
Today I am going to study some of the Dataflow transformations, MERGE, MERGE JOIN, and UNION ALL.
Yes the first question any SQL SERVER developer can ask is, these operations can very well be done in T-SQL query, Why using it in SSIS?, well the answer is, T-SQL can be used only when you are working with data which resides in SQL SEVER Tables, but what if the data is coming from different database itself like ORACLE or MYSQL or Flat File or XML or EXCEL.
Let’s take it one by one each transformation, the post also covers differences, which are actually common questions, in any SSIS interviews.
Rather than reinventing the wheel, I am taking the text from MSDN for providing the description of each transformation which I found covering everything J
The Merge transformation combines two sorted datasets into a single dataset. The rows from each dataset are inserted into the output based on values in their key columns.
By including the Merge transformation in a data flow, you can perform the following tasks:
- Merge data from two data sources, such as tables and files.
- Create complex datasets by nesting Merge transformations.
- Remerge rows after correcting errors in the data.
The Merge transformation is similar to the Union All transformations. Use the Union All transformation instead of the Merge transformation in the following situations:
- The transformation inputs are not sorted.
- The combined output does not need to be sorted.
- The transformation has more than two inputs.
Well here is the example..here we are taking input from two different sql server databases and merging them into a flatfile.
And here is what we need to set in Merge Transformation properties
This transformation is pretty simple and straight forward.
MERGE JOIN: The Merge Join transformation provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER join. For example, you can use a LEFT join to join a table that includes product information with a table that lists the country/region in which a product was manufactured. The result is a table that lists all products and their country/region of origin. For more information, see Using Joins.
You can configure the Merge Join transformation in the following ways:
- Specify the join is a FULL, LEFT, or INNER join.
- Specify the columns the join uses.
- Specify whether the transformation handles null values as equal to other nulls.
|If null values are not treated as equal values, the transformation handles null values like the SQL Server Database Engine does.|
This transformation has two inputs and one output. It does not support an error output.
Here is the same example with merge join..
And the properties window looks like this, here we need to set which type of Join we need to perform the dataset(Left outer, right outer or inner) it’s the same what we use in any RDBMS.
The Union All transformation combines multiple inputs into one output. For example, the outputs from five different Flat File sources can be inputs to the Union All transformation and combined into one output.
The transformation inputs are added to the transformation output one after the other; no reordering of rows occurs. If the package requires a sorted output, you should use the Merge transformation instead of the Union All transformation.
The first input that you connect to the Union All transformation is the input from which the transformation creates the transformation output. The columns in the inputs you subsequently connect to the transformation are mapped to the columns in the transformation output.
To merge inputs, you map columns in the inputs to columns in the output. A column from at least one input must be mapped to each output column. The mapping between two columns requires that the metadata of the columns match. For example, the mapped columns must have the same data type.
If the mapped columns contain string data and the output column is shorter in length than the input column, the output column is automatically increased in length to contain the input column. Input columns that are not mapped to output columns are set to null values in the output columns.
This transformation has multiple inputs and one output. It does not support an error output.
Even though I have taken all sources as OLEBD, you can take any type of source and output data to any type of destination.
Well guys that’s it for today, hope you guys have learned something.
–The Learner ..!!
After lots of failure attempts finally I am confident enough to write my first post. It took like 3 years for me to start writing this first post. And you know what, its not that simple!!.
Anyways feelings aside, and lets starts something which is of our interest.
Yeah we gona start with Loops in SSIS, well there are two types of loops available in SSIS 2005/2008
- FOR Loop : Use when you know the count of iteration of the loop
- FOREACH Loop : Use when you don’t know the count of Iteration of loop
Now I am not going to put much text to read, or give you detailed information on architecture of loops or SSIS, rather we will be learning here by examples.
So lets start with For Loops: <I will cover ForEach loop in my next article>
Lets say the requirement is everytime I execute the loop in my ssis package, three records will be inserted into a table named LoopsExample. So we will start with
1. Create a table named LoopsExample as shown
Create table LoopsExample( Pkey int identity(1,1), IterationCount int, DateLooped datetime)
Whenever the SSIS loop executes, it inserts three new records into the LoopsExample table with PKey as primary key, IterationCount as iteration number just to know how the iteration values gets change, and DateLoooped will show the date the record is inserted into the table
2. So here what we do in SSIS designer
- Create a package variable named @i
- Drop a For Loop container into the SSIS Control flow and double click to open For loop Editor.
- Drop Execute SQL Task into For Loop container.
- Configure Execute SQL task.
- Set Parameter Mapping tab.
- Here is the query to use in SQL Statement in Execute SQL task
Insert into LoopsExample (IterationCount,DateLooped)
. That’s it, on executing the package we get following output ..
As Expected 🙂 and I hope you have learned some thing out of this..
Here are some suggestions for your first post.
- You can find new ideas for what to blog about by reading the Daily Post.
- Add PressThis to your browser. It creates a new blog post for you about any interesting page you read on the web.
- Make some changes to this page, and then hit preview on the right. You can alway preview any post or edit you before you share it to the world.