Thursday, July 5, 2012

SSIS: How to load data into SQL table from Google Spreadsheets?

SSIS (SQL Server Integration Services) is well known for extracting data from a variety of sources. It can extract data from online sources as well. Google Spreadsheet is one of the online source which stores data online. In this post, we will learn to extract data from Google Spreadsheet using our favourite tool – SSIS.

Pre-requiste:

Before we talk about SSIS package, there are few concepts that we need understand.

#01. To use Google Spreadsheet data you must have a Google account. With a Google account, you can access Google Drive – the place where Gooogle stores the Google Spreadsheets for you.

#02. To connet and interact with Google Spreadsheet data you need, Google Data API SDK. The Google Data API are available in Java, .NET, Python and PHP. The Google Data API are interface between Google online services (ex. Google Spreadsheet, Picasa, Googel Analytics services etc.) and their calling program (client program written in Java, php, python and .NET ). SSIS being a Microsot products supoorts .NET, so you need to download .NET version of Google Data API. This is available at http://code.google.com/p/google-gdata/downloads/list

#03. Once you have downloaded and installed Google Data API SDK, you need to add Google API DLLs into Global Assembly Cache (GAC). To do this, you need to run Visual Studio Command Prompt as adminstrator. You can use gacutil, a commnd line utility to add the DLLS into GAC. Examples are following in the screen shot.

Now after completing the three pre-requisite, you are all set to start with Google Spreadsheet and SSIS. For the demonstration purpose, I have following Google Spreadsheet available into my Google Drive

I have createad following table in my local SQL Database.?

CREATE TABLE [dbo].[tblProducts](
    [ProductName] [varchar](50) NULL,
    [Quantity] [numeric](18, 0) NULL
) ON [PRIMARY]

This table will store the product details that we extract from Google Spreadsheet “productDetails”.

The next thing is to launch the SSIS design Tool — Microsoft Business Intelligence or SQL Server Data Tool (SSDT), whatever you are using at present.In my case I used MSBI. I have created a SSIS package with an OLEDB Connection Manager. The OLEDB Connection Manager is pointing to my local SQL database, which is hosting table "tblProducts". On the Control Flow Tab, I have a Data Flow task. On Data Flow tab, I have one script component and one OLE DB Destination control.

I have defined following two data flow level variables – GmailUserID and GmailPassword. To connect to Google Spreadsheets which are located in Google Drive, we need to identify our self with our Google account credentials.

The script component we are using in data flow tab is a Source Type. We are using two read only variables in Script component.

We have added two string type Output columns "strProduct" and "intQuantity" into Script component.

Under Script component the very first thing, we need to do is add references of Google Data API DLLs. These DLLs are extracted and stored in our system when we install the Google Data API SDK. On my Windows 7 Home edition system, these DLLs were stored under C:\Program Files (x86)\Google\Google Data API SDK\Redist. I added following DLLs into our script component.

Google.GData.Client
Google.GData.Extensions
Google.GData.Spreadsheets

The Script component contains following code logic to connect and read data from Google Spreadsheet.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using Google.GData;
using Google.GData.Client;
using Google.GData.Extensions;
using Google.GData.Spreadsheets;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

  public override void PreExecute()
  {
    base.PreExecute();
  }

  public override void PostExecute()
  {
    base.PostExecute();
   
  }

  public override void CreateNewOutputRows()
  {
//Create new Spreadsheet service

    SpreadsheetsService GoogleExcelService;
    GoogleExcelService = new SpreadsheetsService("Spreadsheet-Vikash-Test-App");
//Connect with Google Spreadsheet services using Google account credentials
    GoogleExcelService.setUserCredentials(Variables.GmailUserID, Variables.GmailPassword);

//Create a new Spreadsheet Query
    SpreadsheetQuery query = new SpreadsheetQuery();
//Pass the SpreadsheetQuery to Spreadsheet Feed
    SpreadsheetFeed myFeed = GoogleExcelService.Query(query);

//The Spreadsheet Feed connects with Googel Spreadsheet services and returns feed. Each Feed represents one Google Spreadsheet.

    foreach (SpreadsheetEntry mySpread in myFeed.Entries)
    {
//Run the code logic if the Spreadsheet name is ProductDetails
      if (mySpread.Title.Text == "ProductDetails")
      {
        
        WorksheetFeed wfeed = mySpread.Worksheets;
        foreach (WorksheetEntry wsheet in wfeed.Entries)
        {
//Run the code logic if worksheet name is Sheet1
          if (wsheet.Title.Text == "Sheet1")
          {
            
            AtomLink atm = wsheet.Links.FindService(GDataSpreadsheetsNameTable.ListRel, null);

            ListQuery Lquery = new ListQuery(atm.HRef.ToString());

            ListFeed LFeed = GoogleExcelService.Query(Lquery);

//Each ListFeed represents a row so naviage through each listfeed and add them into Output Buffer.
            foreach (ListEntry LmySpread in LFeed.Entries)
            {
              Output0Buffer.AddRow();
              Output0Buffer.strProduct = LmySpread.Elements[0].Value;
              Output0Buffer.intQuantity = LmySpread.Elements[1].Value;              

              
            }
            
          }
        }
      }
    }
   
  }

}

After setting up the SSIS design components, I connected with the internet and then run the SSIS package. Since we are pulling data from an online source, we need to be connected with internet. The results were expected. The SSIS package executed successfully and pulls all the data from Google Spreadsheet “productDetails” and added them to SQL table "tblProducts".

I quickly checked the SQL table and records were in the table.

So this is how we can extract data from an online source like Google Spreadsheet. Thansk for reading this post.

If you are interested to have a hands-on on this package, you can download this SSIS pacakge from Download Zone.

Related Article:

Popular Posts

Real Time Web Analytics