Wednesday, August 8, 2012

SSIS: How to read nth Row from Flat File?

Reading a specific or nth row from a source file is very common task that a SSIS developer come across. When it comes to reading nth row from a SQL or Oracle or any DBMS or RDMS database we can use many methods such as applying Row_Number function or creating stored procedure which will return specific rows.

With Ms-Excel sources file SSIS provides OpenRowSet which makes the reading the nth row from Excel simple. SSIS: Read and Export Excel data from nth Row

In case you are dealing with Flat file and want to read specific row or nth row, you do not have a direct solution available into SSIS. You have to customize a solution for this. I came across this question in many SSIS forums and this post describles one of the simple way to read nth row from Flat File.

Consider an example of a Flat file with followign data into it. We want to read 2nd row data only from flat file.

Let us create a SQL table which will store the data which we read from Flat file. The SQL table tblProducts have two columsn ProductName and Quantity.

CREATE TABLE [dbo].[tblProducts](
    [ProductName] [varchar](50) NULL,
    [Quantity] [int] NULL
) ON [PRIMARY]

Let us design the SSIS package for this. We have one Data Flow task in our SSIS package. On Data Flow tab we have three components.

Flat File Source: This is the flat file source which will read the entire data from flat file.

Script Component: The script component contains the code logic to differentiate the nth row data from the total data read from the Flat file.

OLE DB Destination: The component stores the data into SQL table tblProducts.

We have created two data flow level variables. StartRow and EndRow. StartRow contains the row number from where you want to start reading data. EndRow contains the last row number you want to read data.

In the Script Component we have created an output called MyOutPut. This output has two columns outProductName and outQuantity. We want to add nth data to our output. We have setup the SynchronousInputID property value to None.

We are reading the two variables we created inside the Script Component.

We have following code logic inside Script Component which will check if the rows number falls between StartRow and EndRow, the row data will be added into output MyOutPut.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    int intRowCount=1;

    public override void PreExecute()
    {
        base.PreExecute();      
    }

    public override void PostExecute()
    {
        base.PostExecute();        
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        
        if ((intRowCount >= Variables.StartRow) && (intRowCount <= Variables.EndRow))
        {
            MyOutputBuffer.AddRow();
            MyOutputBuffer.outProductName= Row.productname;
            MyOutputBuffer.outQuantity = Convert.ToInt32(Row.quantity);            
        }

        intRowCount = intRowCount + 1;
        
        
    }

}

After setting up the SSIS package, I run the package and as expected only one row i.e. 2nd row was added into table tblProducts.

Now let us read data from flat file starting from 3rd row to 5th row. I have changed the StartRow to 3 and EndRow to 5. The resutls were expected. The SSIS package has read the nth row that we specify.

So our objective to read nth row from flat file is achieved. Thanks for reading till this point.

Popular Posts

Real Time Web Analytics