ETL in the Cloud with Informatica: Part 3 – Sending Dynamics CRM Online Data to Local Database

In Part 1 and Part 2 of this series, I’ve taken a look at doing Extract-Transform-Load (ETL) operations using the Informatica Cloud. This platform looks like a great choice for bulk movement of data between cloud or on-premises systems. So far we’ve seen how to move data from on-premises to the cloud, and then between clouds. In this post, I’ll show you how you can transfer data from a cloud application (Dynamics CRM Online) to a SQL Server database running onsite.

As a reminder, in this four-part blog series, I am walking through the following scenarios:

Scenario Summary

For this demo, I’ll be building a solution that looks like this:

2012.03.26informatica29

For this case, I (1) build the ETL package using the Informatica Cloud’s web-based designer, (2) the Cloud Secure Agent retrieves the ETL details when the task is triggered, (3) the data is retrieved from Dynamics CRM Online, and (4) the data is loaded into a SQL Server database.

You can probably think of many scenarios where this situation will apply. For example, good practices for cloud applications often state that you keep onsite backups of your data. This is one way to do that on a daily schedule. In another case, you may have very complex reporting needs and cannot accomplish them using Dynamic CRM Online’s built in reporting capability, so a local, transformed replica makes sense.

Let’s see how to make this happen.

Setting up the Target Database

First up, I created a database table in my SQL Server 2008 R2 instance. This table, called CrmAccount holds a few of the attributes that reside in the Dynamics CRM Online “Account” entity.

2012.03.26informatica30

Next, I added a new Login to my Instance and switched my server to accept both Windows Authentication *and* SQL Server authentication. Why? During some trial runs with this, I couldn’t seem to get integrated authentication to work in the Informatica Cloud designer. When I switched to a local DB account, the connection worked fine.

After this, I confirmed that I had TCP/IP enabled since the Cloud Secure Agent uses this port for connecting to my server.

2012.03.26informatica31

Building the ETL Package

With all that set up, now we can build our ETL task in the Informatica Cloud environment. The first step in the Data Synchronization wizard is to provide a name for my task and choose the type of operation (e.g. Insert, Update, Upsert, Delete).

2012.03.26informatica32

Next, I’ll chose my Source. In this step, I reused the Dynamics CRM Online connection that I created in the first post of the series. After choosing that connection, I selected the Account entity as my Source Object. A preview of the data was then automatically shown.

2012.03.26informatica33

With my source in place, I moved on to define my target. In this case, my target is going to involve a new SQL Server connection. To create this connection, I supplied the name of my server, instance (if applicable), database, credentials (for the SQL Server login account) and port number.

2012.03.26informatica34

Once I defined the connection, the drop down list (Target Object) was auto-populated with the tables in my database. I selected CrmAccount and saw a preview of my (empty) table.

2012.03.26informatica35

On the next wizard page, I decided to not apply any filters on the Dynamics CRM Online data. So, ALL accounts should be copied over to my database table. I was now ready for the data mapping exercise. The following wizard page let me drag-and-drop fields from the source (Dynamics CRM Online) to the target (SQL Server 2008 R2).

2012.03.26informatica36

On the last page of the wizard, I chose to NOT run this task on a schedule. I could set this run every five minutes, or once a week. There’s lots of flexibility in this.

Testing the ETL

Let’s test this out. In my list of Data Synchronization Tasks I can see the tasks from the last two posts, and a new tasks representing what we created above.

2012.03.26informatica37

By clicking the green Run Now button, I can kick off this ETL. As an aside, the Informatica Cloud exposes a REST API where among other things, you can make a web request that kicks off a task on demand. That’s a neat feature that can come in handy if you have an ETL that runs infrequently, but a need arises for it to run RIGHT NOW. In this case, I’m going with the Run Now button.

To compare results, I have 14 account records in my Dynamics CRM Online organization.

2012.03.26informatica38

I can see in my Informatica Cloud Activity Log that the ETL task completed and 14 records moved over.

2012.03.26informatica39

To be sure, I jumped back to my SQL Server database and checked out my table.

2012.03.26informatica40

As I expected,  I can see 14 new records in my table. Success!

Summary

Sending data from a cloud application to an on-premises database is a realistic use case and hopefully this demo showed how easily it can be accomplished with the Informatica Cloud. The database connection is relatively straightforward and the data mapping tool should satisfy most ETL needs.

In the next post of this series, I’ll show you how to send data, in real-time, from Salesforce.com to a SQL Server database.

Author: Richard Seroter

Richard Seroter is Director of Developer Relations and Outbound Product Management at Google Cloud. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Director of Developer Relations and Outbound Product Management, Richard leads an organization of Google Cloud developer advocates, engineers, platform builders, and outbound product managers that help customers find success in their cloud journey. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.

8 thoughts

  1. This is really interesting. In your experience are there any constraints on the amount of data that can be extracted from (or written to) Dynamics CRM online? e.g. extracting data from dynamics online into a datawarehouse

    1. It’s definitely neat technology. I haven’t seen any specific limits. The point of the software is to move a lot of data, so the only constraints may be bandwidth!

      1. 14 records is one thing, millions are quite another. I’ve seen too many projects get into trouble over the years with assumptions.

        Your bottleneck will be crm online as it throttles connections so they can’t utilize significant resources. Remember you’re in shared environment. My tests show it will take literally days just to do the initial customer extract. And you CANNOT rely on timestamps for incremental extracts (Microsoft said this at conference), instead you should be using CRM CDC (you turn that on at the entity level within CRM) but this requires an adapter that can utilize the CDC feature and there’s only one adapter I know of on the market today that supports this and that’s Kingswaysoft CRM adapter.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.