We recently implemented a BizTalk design pattern where on schedule (or demand), records are retrieved from a database, debatched, returned to the MessageBox, and subscribed to by various systems.
Normally, “datastore to datastore” synchronization is a job for an ETL tool, but in our case, using our ETL platform (Informatica) wasn’t a good fit for the use case. Specifically, handling web service destinations and exceptions wasn’t robust enough, and we’d have to modify the existing ETL jobs (or create new ones) for each system who wanted the same data. We also wanted the capability for users to make “on demand” request for historical data to be targeted to their system. A message broker made sense for us.
Here are the steps I followed to create a simple prototype of our solution.
Step #1. Create trigger message/process. A control message is needed to feed into the Bus and kick off the process that retrieves data from the database. We could do straight database polling via an adapter, but we wanted more control than that. So, I utilized Greg’s great Scheduled Task Adapter which can send a message into BizTalk on a defined interval. We also have a manual channel to receive this trigger message if we wish to run an off-cycle data push.
Step #2. Create database and database schemas. I’ve got a simple test table with 30 columns of data.
I then used the Add Generated Items wizard to build a schema for that database table.
Now, because my goal is to retrieve the dataset from the database, and then debatch it, I need a representation of the *single* record. So, I created a new schema, imported the auto-generated schema, set the root node’s “type” to be of the query response record type, and set the Root Reference property.
Step #3. Build workflow (first take). For the orchestration component, I decided to start with the “simple” debatching solution, XPath. My orchestration takes in the “trigger” message, queries the database, gets the batched results, loops through and extracts each individual record, transforms the individual record to a canonical schema, and sends the message to the MessageBox using a direct-bound port. Got all that?
When debatching via XPath, I use the schema I created by importing the auto-generated SQL Server schema.
Step #4. Test “first take” workflow. After adding 1000 records to the table (remember, 30 columns each), this orchestration took about 1.5 – 2 minutes to debatch the records from the database and send each individual record to the MessageBox. Not terrible on my virtual machine. However, I was fairly confident that a pipeline-based debatching would be much more efficient.
So, to modify the artifacts above to support pipeline-based debatching, I did the following steps.
Step #1. Modify schemas. Automatic debatching requires the pipeline to process an envelope schema. So, I took my auto-generated SQL Server schema, set its Envelope property to true, and picked the response node as the body. If everything is set up right, then the result message of the pipeline debatching is that schema we built that imports the auto-generated schema.
Step #2. Modify SQL send port and orchestration message type. This is a good one. I mentioned above that you need to use the XmlReceive pipeline for the response channel in the SQL Server request-response send port. However, if I pass the response message through an XmlReceive pipeline with the chosen schema set as an “envelope”, the message will debatch BEFORE it reaches the orchestration. Then I get all sorts of type mismatch exceptions. So, what I did, was change the type of the message coming back from the request-response port to XmlDocument and switched the physical send port to to a passthrough pipeline. Using XmlDocument, any message coming back from the SQL Server send port will get routed back to the orchestration, and using the passthrough pipeline, no debatching will occur.
Step #3. Switch looping to use pipeline debatching. In BizTalk Server 2006, you can call pipelines from orchestrations. I have a variable of type Microsoft.XLANGs.Pipeline.ReceivePipelineOutputMessages, and then (within an Atomic Scope), I called the *default* XmlReceive pipeline using the following code:
Then, my loop condition is simply rcvPipeOutputMsgs.MoveNext(), and within a Construct shape, I can extract the individual, debatched message with this code:
WorkforceSingle_Output = null;
Step #4. Test “final” workflow. Using the same batch size as before (30 columns, 1000 records), it took between 29-36 seconds to debatch and return each individual message to the MessageBox. Compared to nearly 2 minutes for the XPath way, pipeline debatching is significantly more efficient.
So, using this pattern, we can easily add subscribers to these database-only entities with very little impact. One thing I didn’t show here, but in our case, I also stamp each outbound message (from the orchestration) with the “target system.” The trigger message sent from the Scheduled Task Adapter will have this field empty, but if a particular system wants a historical batch of records, we can now send an off-cycle request, and have those records only go to the Send Port owned by that “target system”. Neat stuff.
Technorati Tags: BizTalk
Pretty cool Richard. One question, though: How did biztalk behave (and how was performance) on your prototype? I haven’t had a chance to implement something like this before; but it’s something I’ve seen some of my clients ask about…
As you would expect (as during any pipeline debatching), the CPU was pegged on my box. But, the pipeline-based solution only took 30 seconds to debatch, while the xpath way took nearly 120 seconds. I watched a few of the perf counters (persistance points, etc) and the box behaved quite well.
Hi, cool. Did something similar a while ago but I never took the time write a thorough blog post as this!
However, I’d like to know why you use a orchestration at all? Couldn’t you just debatch the messages in the receive port (as you actually do firstn)? Is it because you need to set the target system on messages where the command message has such a value? If so one could imagine a solution where one only the command messages containing a target system is routed through an orchestration and where command messages without (that should be the most common case – right?) just use a send-response port and the messages are debatched in the port. I’d be cool to see what that do to the performance.
Just a thought. Nice post (as always). Thanks!
The other Richard
Hey “other Richard”,
You know, the orchestration could indeed be optional in some cases. Stamping the target system was one reason I did the orchestration, but the other reason was to prevent flooding downstream systems. This way, I do a more measured drop into the MessageBox instead of dropping hundreds/thousands of messages into the MessageBox simultaneously. True I could turn ordered-delivery on for the subscribers if they couldn’t handle the load, but this way felt less obtrusive.
Thanks for the comment.
Great post… Just a question: Since the debatching and looping through the debatched messages is done within the atomic scope, that means that all is done in memory… don’t we risk to have memory problem for large interchanges?
To my understanding, using the pipeline (vs. xpath debatching) means you’re working with streams, rather than entire DOM-loaded Xml documents in memory. So, should be less of a footprint. That said, some characters here were trying this with gigantic result sets and caused the Oracle adapter to give up!
Thanks for the reply Richard.
My thoughts are about memory problem inside the atomic scope: you have the rcvPipeOutputMsgs that receives the collection of debatched messages (at this point I don’t think they have been persisted yet in the MessageBox). Then you loop across this collection (rcvPipeOutputMsgs.MoveNext) and I assume use a Send shape do post to the MessageBox each individual (debatched) message. Now since this is an atomic scope, it is my understanding that everything is keept in memory till the atomic scope ends, as only then we have a persistent point and therefore the (batched) Send operations are indeed commited to the MessageBox and all the scope level variables (like rcvPipeOutputMsgs) get’s disposed. So during the whole loop inside the atomic scope, all the interchange is kept in memory. Do I miss something here? this is a very interesting topic.
To your point, a large result set may still cause issues with the MessageBox (due to atomic scope) and memory footprint could grow, but it still should be easier on system resources than XPath based debatching. However, I haven’t run a test of this while watching the perf counters, so I can’t say for sure. But dealing with message streams vs. the XML DOM has to help resource allocation a bit!
Would you mind explaning how I could write to the message box in sql using biztalk 2006. I have tried using the add generating wizard with stored procedures, but to no end. If you could help that would be great!! Thanks Richard.
How do you mean “write to the messagebox”? To store a message, or to send it for processing? If you want to send a message (from an orchestration) to the messagebox for others to subscribe to, then you use a “direct” binding in your orchestration port. You would never interact with the messagebox as an actual database (via SQL adapter or generated schemas). That help?
This is very interesting, but why do you have to do the debatching inside the orchestration? I’m currently facing a similar situation but I would rather want my orchestration to call a stored proc (with certain parameters), and when the response arrives, I would like it to debatch inside the receive pipeline and sent directly to the messagebox (without going back to the initial orchestration) so that another orchestration subscribing to these debatched messages can process them directly. Do you think this is possible? If it is, how would the initial orchestration look like? Specifically, how would the request-response port handle this?
Seems to be as simple as sending the sproc request message through a direct port to the message box. And then have a port subscribing to these and send the request to the SQL Server. The response is debatched inside the receive pipeline and all individual debatched messages are published to the message box for anyone to subscribe to.
Thanks for this great post! Helped me solve my problem!
Glad to help! You responded before I could.
Can you please send me the Source code of this Pattern. It would be great help for me to understand and work on this sample.
Is there something specific you are looking for that I can answer?
can you please send me the source code for the solution explained above.
Actually, I have to parse an XML.
In my case, I will receive an XML which can have multiple set of records. I have to map those records and store it in database one by one. Means, if XML is having 10 records then 10 rows has to be created in database and mapping should be done for the same.
Please help me as I am pretty much new in Biztalk.
Thanks in advance.
I don’t have the source in a packaged form, but let me know if any part of this post is unclear as you try to reconstruct the scenario.
hi, i have to debatch records from SQL and apply Mapping to each individual records to MSCRM 4.0 entity schemas. i am facing problem in the mapping part.
my problem is : i am unable to map as it is saying either no rot element or namespace mismatch. but i have checked the namespace is proper. plz let me know anything you can provide with.
sorry guys, i am disturbing you all again.
i have solved the issue : used the logic same as mentioned above in the example.
but i am concerned about performance : i have to process about 700,000 rcords from SQL and thinking to make batches of 1000 records at a time(looping for 1000 records at time since total records count is high). due to atomic scope, is it going to create any problem ?? memory problem ?? what would be the ideal case : to go with the XPath solution or the Default pipeline with atomic scope ??? please keep in mind that destination system is Microsoft CRM system.
Hey Joon, given that size, might be safer to be looking at ETL operations instead of messaging. Dynamics CRM should allow SSIS data load, right?
Very nice article as always.
I have a similar ‘fanout’ situation but with a slightly different requirement. I have to fanout the messages to hundreds of endpoints. The endpoints have request/response messaging pattern. I have used the dynamic send port to achieve that. This works fine in normal circumstances. But the issue arises when an endpoint is down. At this stage (as expected) messaging engine retries to resend the message but the whole fanout process is halted until that retry count is reached. (I have the deliver notification set to none but the request/response pattern is killing me).
One solution is to change the service endpoint to one-way. In this case we lose the custom fault messages. The other solution is to create a sub-orchestration and ‘start’ instances of this orchestration from main orchestration for each service endpoint. This looks ugly to me as hundreds of orchestration instances can bog down system performance.
Any thoughts on that?
Interesting scenario. So you have an orchestration doing the fan out. What would happen if you send the message to the MessageBox from the orchestration and then have a bunch of subscribing endpoints? Then you don’t have serial processing. You could swallow the response messages in a pipeline or spin up an orchestration if an exception occurs.
i am doing the simular thing like joon. I am debatching message in about 60,000 records
and update single records through MS Dynamics CRM 4.0 Adaptor.
I will try your pipe message debatch solution. If it works, i will give you an update.
Through your guidance I can put together a typed polling application that consumes SQL result sets pretty quickly. I’m facing a situation now that has me wondering how to solve it, and so far I have not had success. A stored procedure returns the emp1 records in one result set, and the sales records in a second result set. The generated schema has a
TypePolling node with two children (TypedPollingResultSet0 and TypePollingResultSet1). I can only set one body xpath for the TypedPollingNode and the result of that is that even though the incoming XML contains both result sets, only the one with the body xpath specified gets their messages posted to BizTalk.
Can I have both result sets debatched into BizTalk using typed polling, or should I have to write pipeline component to detatched?