Category: BizTalk

  • Here’s where you’ll find me over the Summer

    I mean, you’ll mainly find me in Seattle, where I actually live. But, I’m also speaking on a variety of topics at a few shows over the next few months, and thought I’d point those out.

“Architecting Highly Available Cloud Integrations” at Integrate 2018 (June 4-6, London)

    If the thing that connects your other things together isn’t resilient, you’re in trouble. In this talk, I’ll take a look at some core availability patterns for application/data integration, and then review how to configure Azure’s integration services for high availability. This is always a terrific show with compelling speakers, and the hosts at BizTalk360 always do a bang-up job putting it on.

“Product Ownership, Explained” at Agile 2018 (August 6-10, San Diego)

    I know a few things about product ownership, such as how to be a good product owner, and a bad one. Primarily because I’ve been both. In this talk at the “big Agile” show, I’ll look at the role and what a product owner should do. I’m starting to work on this presentation now, and will likely list 10+ things that you should do, and a few things to avoid. This will be my second time speaking at this conference, and I enjoy hearing from so many folks focused on software teams and getting code to production.

“What NASA’s Voyager Mission Teaches Us About Building Distributed Systems” at Pluralsight LIVE (August 28-30, Salt Lake City)

    I’ve been geeking out on space exploration books and movies lately, and thought it’d be fun to translate the lessons from an iconic NASA mission to the everyday challenges faced by software engineers. Here, I’ll show how some of the key ideas applied by NASA engineers reinforce some of the best practices when designing complex software systems. I didn’t attend the inaugural edition of this conference last year, but I’m jazzed to be part of it this time around.

    I hope I’ll see you at some of these! If you’ll be at any one of them (or all three, if you’re my stalker!), do let me know.

     

  • Surprisingly simple messaging with Spring Cloud Stream


You’ve got a lot of options when connecting microservices together. You could use service discovery and make direct calls. Or you might use a shared database to transfer work. But message brokers continue to be a popular choice. These range from single-purpose engines like Amazon SQS or RabbitMQ, to event stream processors like Azure Event Hubs or Apache Kafka, all the way to sophisticated service buses like Microsoft BizTalk Server. When developers choose any one of those, they need critical knowledge to use them effectively. How can you shrink the time to value and help developers be productive, faster? For Java developers, Spring Cloud Stream offers a valuable abstraction.

    Spring Cloud Stream offers an interface for developers that requires no knowledge of the underlying broker. That broker, either Apache Kafka or RabbitMQ, gets configured by Spring Cloud Stream. Communication to and from the broker is also done via the Stream library.

    What’s exciting to me is that all brokers are treated the same. Spring Cloud Stream normalizes behavior, even if it’s not native to the broker. For example, want a competing consumer model for your clients, or partitioned processing? Those concepts behave differently in RabbitMQ and Kafka. No problem. Spring Cloud Stream makes it work the same, transparently. Let’s actually try both of those scenarios.

    Competing consumers through “consumer groups”

    By default, Spring Cloud Stream sets up everything as a publish-subscribe relationship. This makes it easy to share data among many different subscribers. But what if you want multiple instances of one subscriber (for scale out processing)? One solution is consumer groups. These don’t behave the same in both messaging brokers. Spring Cloud Stream don’t care! Let’s build an example app using RabbitMQ.

    Before writing code, we need an instance of RabbitMQ running. The most dead-simple option? A Docker container for it. If you’ve got Docker installed, the only thing you need to do is run the following command:

docker run -d --hostname local-rabbit --name demo-rmq -p 15672:15672 -p 5672:5672 rabbitmq:3.6.11-management

    2017.09.05-stream-02

    After running that, I have a local cache of the image, and a running container with port mapping that makes the container accessible from my host.

    How do we get messages into RabbitMQ? Spring Cloud Stream supports a handful of patterns. We could publish on a schedule, or on-demand. Here, let’s build a web app that publishes to the bus when the user issues a POST command to a REST endpoint.

    Publisher app

    First, build a Spring Boot application that leverages spring-cloud-starter-stream-rabbit (and spring-boot-starter-web). This brings in everything I need to use Spring Cloud Stream, and RabbitMQ as a destination.

    2017.09.05-stream-01

    Add a new class that acts as our REST controller. A simple @EnableBinding annotation lights this app up as a Spring Cloud Stream project. Here, I’m using the built-in “Source” interface that defines a single communication channel, but you can also build your own.

    @EnableBinding(Source.class)
    @RestController
    public class BriefController {
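
Want more than the single channel that Source gives you? A custom binding interface is just a plain Java interface with annotated channel methods. Here’s a minimal sketch; the interface and its “briefs” channel name are made up for illustration:

import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;

//hypothetical custom binding interface; the "briefs" channel name is made up
public interface BriefChannels {

  String BRIEFS = "briefs";

  //Spring Cloud Stream creates and binds this output channel at startup
  @Output(BriefChannels.BRIEFS)
  MessageChannel briefs();
}

You’d then reference BriefChannels.class in @EnableBinding instead of Source.class, and point the channel at a destination with spring.cloud.stream.bindings.briefs.destination in your configuration.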
    

    In this controller class, add an @Autowired variable that references the bean that Spring Cloud Stream adds for the Source interface. We can then use this variable to directly publish to the bound channel! Same code whether talking to RabbitMQ or Kafka. Simple stuff.

    @EnableBinding(Source.class)
    @RestController
    public class BriefController {
      //refer to instance of bean that Stream adds to container
      @Autowired
      Source mysource;
    
      //take in a message via HTTP, publish to broker
      @RequestMapping(path="/brief", method=RequestMethod.POST)
      public String publishMessage(@RequestBody String payload) {
    
        System.out.println(payload);
    
        //send message to channel
        mysource.output().send(MessageBuilder.withPayload(payload).build());
    
        return "success";
  }
}
    

    Our publisher app is done, so all that’s left is some basic configuration. This configuration tells Spring Cloud Stream how to connect to the right broker. Note that we don’t have to tell Spring Cloud Stream to use RabbitMQ; it happens automatically by having that dependency in our classpath. No, all we need is connection info to our broker, an explicit reference to a destination (without it, the RabbitMQ exchange would be called “output”), and a command to send JSON.

    server.port=8080
    
    #rabbitmq settings for Spring Cloud Stream to use
    spring.rabbitmq.host=127.0.0.1
    spring.rabbitmq.port=5672
    spring.rabbitmq.username=guest
    spring.rabbitmq.password=guest
    
    spring.cloud.stream.bindings.output.destination=legalbriefs
    spring.cloud.stream.default.contentType=application/json
    

    Consumer app

    This part’s almost too easy. Here, build a new Spring Boot application and only choose the spring-cloud-starter-stream-rabbit dependency.

Decorate the default class with @EnableBinding and use the built-in Sink interface. Then, all that’s left is to create a method to process any messages found in the broker. To do that, we decorate the operation with @StreamListener, and all the content type handling is done for us. Wicked.

    @EnableBinding(Sink.class)
    @SpringBootApplication
    public class BlogStreamSubscriberDemoApplication {
    
      public static void main(String[] args) {
        SpringApplication.run(BlogStreamSubscriberDemoApplication.class, args);
      }
    
      @StreamListener(target=Sink.INPUT)
      public void logfast(String msg) {
        System.out.println(msg);
      }
    }
    

    The configuration for this app is straightforward. Like above, we have connection details for RabbitMQ. Also, note that the binding now references “input”, which was the name of the channel in the default “Sink” interface. Finally, observe that I used the SAME destination as the source, to ensure that Spring Cloud Stream wires up my publisher and subscriber successfully. For kicks, I didn’t yet add the consumer group settings.

    server.port=0
    
    #rabbitmq settings for Spring Cloud Stream to use
    spring.rabbitmq.host=127.0.0.1
    spring.rabbitmq.port=5672
    spring.rabbitmq.username=guest
    spring.rabbitmq.password=guest
    
    spring.cloud.stream.bindings.input.destination=legalbriefs
    

    Test the solution

    Let’s see how this works. First, start up three instances of the subscriber app. I generated a jar file, and started up three instances in the shell.

    2017.09.05-stream-04

    When you start up these apps, Spring Cloud Stream goes to work. Log into the RabbitMQ admin console, and notice that one exchange got generated. This one, named “legalbriefs”, maps to the name we put in our configuration file.

    2017.09.05-stream-12

    We also have three queues that map to each of the three app instances we started up.

    2017.09.05-stream-13

Nice! Finally, start up the publisher, and post a message to the /brief endpoint.

    2017.09.05-stream-05

    What happens? As expected, each subscriber gets a copy of the message, because by default, everything happens in a pub/sub fashion.

    2017.09.05-stream-06

    Add consumer group configuration

    We don’t want each instance to get a copy. Rather, we want these instances to share the processing load. Only one should get each message. In the subscriber app, we add a single line to our configuration file. This tells Spring Cloud Stream that all the instances form a single consumer group that share work.

    #adds consumer group processing
    spring.cloud.stream.bindings.input.group=briefProcessingGroup
    

After regenerating the subscriber jar file, and starting up each instance, we see a different setup in RabbitMQ. What you see is a single, named queue, but three “consumers” of the queue.

    2017.09.05-stream-07

    Send in two different messages, and see that each is only processed by a single subscriber instance. This is a simple way to use a message broker to scale out processing.

    2017.09.05-stream-08

    Doing stateful processing using partitioning

Partitioning feels related to, but different from, consumer groups. Partitions in Kafka introduce a level of parallel processing by writing data to different partitions. Then, each subscriber pulls from a given partition to do work. Here in Spring Cloud Stream, partitioning is useful for parallel processing, but also for stateful processing. When setting it up, you specify a characteristic that steers messages to a given partition. Then, a single app instance processes all the data in that partition. This can be handy for event processing or any scenario where it’s useful for related messages to get processed by the same instance. Think counters, complex event processing, or time-sensitive calculations.
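
To make that stateful idea concrete, here’s a rough sketch of a partition-aware consumer that keeps a running count per attorney. Treat it as illustrative only; the Brief payload class shown here is an assumption, not part of the demo apps in this post.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

//illustrative stateful consumer: because every brief for a given attorney lands
//in the same partition, a single instance can safely keep a local counter
@EnableBinding(Sink.class)
public class AttorneyBriefCounter {

  private final Map<String, Integer> counts = new ConcurrentHashMap<>();

  @StreamListener(target = Sink.INPUT)
  public void count(Brief brief) {
    int total = counts.merge(brief.getAttorney(), 1, Integer::sum);
    System.out.println(brief.getAttorney() + " has submitted " + total + " briefs");
  }

  //minimal payload type assumed for this sketch
  public static class Brief {
    private String attorney;
    public String getAttorney() { return attorney; }
    public void setAttorney(String attorney) { this.attorney = attorney; }
  }
}

Nothing in that class knows about partitions directly; the steering happens entirely in configuration.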

Unlike with consumer groups, partitioning requires configuration changes to both publishers AND subscribers. On the publisher side, all that’s needed is (a) the number of partitions, and (b) the expression that describes how data is partitioned. That’s it. No code changes.

    #adding configuration for partition processing
    spring.cloud.stream.bindings.output.producer.partitionKeyExpression=payload.attorney
    spring.cloud.stream.bindings.output.producer.partitionCount=3
    

On the subscriber side, you set the number of partitions, and set the “partitioned” property equal to “true.” What’s also interesting, but logical, is that as each subscriber starts, you need to give it an “index” so that Spring Cloud Stream knows which partition it should read from.

    #add partition processing
    spring.cloud.stream.bindings.input.consumer.partitioned=true
    #spring.cloud.stream.instanceIndex=0
    spring.cloud.stream.instanceCount=3
    

Let’s start everything up again. The publisher starts up the same as before. Now though, each subscriber instance starts up with a “spring.cloud.stream.instanceIndex=X” flag that specifies which index applies.

    2017.09.05-stream-09

    In RabbitMQ, the setup is different than before. Now, we have three queues, each with a different “routing key” that corresponds to its partition.

    2017.09.05-stream-10

    Send in a message, and notice that all messages with the same attorney name go to one instance. Change the case number, and see that all messages still go to the same place. Switch the attorney ID, and observe that a different partition (likely) gets it. If you have more data varieties than you do partitions, you’ll see a partition handle more than one set of data. No problem, just know that happens.

    2017.09.05-stream-11

    Summary

    It shouldn’t have to be hard to deal with message brokers. Of course there are plenty of scenarios where you want to flex the advanced options of a broker, but there are also many cases where you just want a reliable intermediary. In those cases, Spring Cloud Stream makes it super easy to abstract away knowledge of the broker, while still normalizing behavior across the unique engines.

    In my latest Pluralsight course, I spent over an hour digging into Spring Cloud Stream, and another ninety minutes working with Spring Cloud Data Flow. That project helps you quickly string together Stream applications. Check it out for a deeper dive!

  • Introducing cloud-native integration (and why you should care!)


    I’ve got three kids now. Trying to get anywhere on time involves heroics. My son is almost ten years old and he’s rarely the problem. The bottleneck is elsewhere. It doesn’t matter how much faster my son gets himself ready, it won’t improve my family’s overall speed at getting out the door. The Theory of Constraints says that you improve the throughput of your process by finding and managing the bottleneck, or constraint. Optimizing areas outside the constraint (e.g. my son getting ready even faster) don’t make much of a difference. Does this relate to software, and application integration specifically? You betcha.

    Software delivery goes through a pipeline. Getting from “idea” to “production” requires a series of steps. And then you repeat it over and over for each software update. How fast you get through that process dictates how responsive you can be to customers and business changes. Your development team may operate LIKE A MACHINE and crank out epic amounts of code. But if your dedicated ops team takes forever to deploy it, then it just doesn’t matter how fast your devs are. Inventory builds up, value is lost. My assertion is that the app integration stage of the pipeline is becoming a bottleneck. And without making changes to how you do integration, your cloud-native efforts are going to waste.

    What’s “cloud native” all about? At this month’s Integrate conference, I had the pleasure of talking about it. Cloud-native refers to how software is delivered, not where. Cloud-native systems are built for scale, built for continuous change, and built to tolerate failure. Traditional enterprises can become cloud natives, but only if they make serious adjustments to how they deliver software.

    Even if you’ve adjusted how you deliver code, I’d suspect that your data, security, and integration practices haven’t caught up. In my talk, I explained six characteristics of a cloud-native integration environment, and mixed in a few demos (highlighted below) to prove my points.

    #1 – Cloud-native integration is more composable

By composable, I mean capable of assembling components into something greater. Contrast this to classic integration solutions where all the logic gets embedded into a single artifact. Think ETL workflows where every step of the process is in one deployable piece. Need to change one component? Redeploy the whole thing. One step requires a ton of CPU processing? Find a monster box to host the process in.

    A cloud-native integration gets built by assembling independent components. Upgrade and scale each piece independently. To demonstrate this, I built a series of Microsoft Logic Apps. Two of them take in data. The first takes in a batch file from Microsoft OneDrive, the other takes in real-time HTTP requests. Both drop the results to a queue for later processing.

    2017.07.10-integrate-01

    The “main” Logic App takes in order entries, enriches the order via a REST service I have running in Azure App Service, calls an Azure Function to assign a fraud score, and finally dumps the results to a queue for others to grab.

    2017.07.10-integrate-02

My REST API sitting in Azure App Service is connected to a GitHub repo. This means that I should be able to upgrade that individual service, without touching the data pipeline sitting in Logic Apps. So that’s what I did. I sent in a steady stream of requests, modified my API code, pushed the change to GitHub, and within a few seconds, the Logic App was emitting a slightly different payload.

    2017.07.10-integrate-03

    #2 – Cloud-native integration is more “always on”

    One of the best things about early cloud platforms being less than 100% reliable was that it forced us to build for failure. Instead of assuming the infrastructure was magically infallible, we built systems that ASSUMED failure, and architected accordingly.

    For integration solutions, have we really done the same? Can we tolerate hardware failure, perform software upgrades, or absorb downstream dependency hiccups without stumbling? A cloud-native integration solution can handle a steady load of traffic while staying online under all circumstances.

    #3 – Cloud-native integration is built for scale

    Elasticity is a key attribute of cloud. Don’t build out infrastructure for peak usage; build for easy scale when demand dictates. I haven’t seen too many ESB or ETL solutions that transparently scale, on-demand with no special considerations. No, in most cases, scaling is a carefully designed part of an integration platform’s lifecycle. It shouldn’t be.

If you want cloud-native integration, you’ll look to solutions that support rapid scale (in, or out), and let you scale individual pieces. Event ingestion unexpectedly overwhelmed? Scale that, and that alone. You’ll also want to avoid too much shared capacity, as that creates unexpected coupling and makes scaling the environment more difficult.

    #4 – Cloud-native integration is more self-service

    The future is clear: there will be more “citizen integrators” who don’t need specialized training to connect stuff. IFTTT is popular, as are a whole new set of iPaaS products that make it simple to connect apps. Sure, they aren’t crazy sophisticated integrations; there will always be a need for specialists there. But integration matters more than ever, and we need to democratize the ability to connect our stuff.

One example I gave here was Pivotal Cloud Cache and Pivotal GemFire. Pivotal GemFire is an industry-leading in-memory data grid. Awesome tech, but not trivial to properly set up and use. So, Pivotal created an opinionated slice of GemFire with a subset of features, but an easier on-ramp. Pivotal Cloud Cache supports specific use cases, and an easy self-service provisioning experience. My challenge to the Integrate conference audience? Why couldn’t we create a simple facade for something powerful, but intimidating, like Microsoft BizTalk Server? What if you wanted a self-service way to let devs create simple integrations? I decided to use the brand new Management REST API from BizTalk Server 2016 Feature Pack 1 to build one.

    I used the incomparable Spring Boot to build a Java app that consumed those REST APIs. This app makes it simple to create a “pipe” that uses BizTalk’s durable bus to link endpoints.

    2017.07.10-integrate-05

    I built a bunch of Java classes to represent BizTalk objects, and then created the required API payloads.

    2017.07.10-integrate-04
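
For flavor, here’s roughly what one of those calls looks like from Spring, using a plain RestTemplate. The host, route, and payload fields below are placeholders invented for this sketch, not the documented contract of the BizTalk Management API, so check the API’s own documentation for the real shapes.

import java.util.HashMap;
import java.util.Map;

import org.springframework.web.client.RestTemplate;

//illustrative only: the host, route, and payload fields are placeholders,
//not the actual BizTalk Management API contract
public class PipeCreator {

  private final RestTemplate restTemplate = new RestTemplate();
  private final String baseUrl = "http://biztalk-box/management"; //placeholder host

  public void createHttpReceiveLocation(String pipeName) {
    Map<String, Object> receiveLocation = new HashMap<>();
    receiveLocation.put("name", pipeName + "_ReceiveLocation");
    receiveLocation.put("adapter", "HTTP");

    //POST the JSON payload to the (placeholder) receive location endpoint
    restTemplate.postForEntity(baseUrl + "/ReceiveLocations", receiveLocation, String.class);
  }
}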

    The result? Devs can create a new pipe that takes in data via HTTP and drops the result to two file locations.

    2017.07.10-integrate-06

    When I click the button above, I use those REST APIs to create a new in-process HTTP receive location, two send ports, and the appropriate subscriptions.

    2017.07.10-integrate-07

    Fun stuff. This seems like one way you could unlock new value in your ESB, while giving it a more cloud-native UX.

    #5 – Cloud-native integration supports more endpoints

    There’s no turning back. Your hippest integration offered to enterprise devs CANNOT be SharePoint. Nope. Your teams want to creatively connect to Slack, PagerDuty, Salesforce, Workday, Jira, and yes, enterprisey things like SQL Server and IBM DB2.

These endpoints may punish your integration platform with a constant data stream, or process data irregularly, in bulk. Doing newish patterns like Event Sourcing? Your apps will talk to an integration platform that offers a distributed commit log. Are you ready? Be ready for new endpoints, with new data streams, consumed via new patterns.

    #6 – Cloud-native integration demands complete automation

    Are you lovingly creating hand-crafted production servers? Stop that. And devs should have complete replicas of production environments, on their desktop. That means packaging and automating the integration bus too. Cloud-natives love automation!

    Testing and deploying integration apps must be automated. Without automated tests, you’ll never achieve continuous delivery of your whole system. Additionally, if you have to log into one of your integration servers, you’re doing it wrong. All management (e.g. monitoring, deployments, upgrades) should be done via remote tools and scripts. Think fleets of servers, not long-lived named instances.

To demonstrate this concept, I discussed automating the lifecycle of your integration dependency. Specifically, through the use of a service broker. Initially part of Cloud Foundry, the service broker API has caught on elsewhere. A broad set of companies are now rallying around a single API for advertising services, provisioning, de-provisioning, and more. Microsoft built a Cloud Foundry service broker that handles lifecycle and credential sharing for services like Azure SQL Database, Azure Service Bus, Azure CosmosDB, and more. I installed this broker into my Pivotal Web Services account, and it advertised available services.

    2017.07.10-integrate-08

    Simply by typing in cf create-service azure-servicebus standard integratesb -c service-bus-config.json I kicked off a fast, automated process to generate an Azure Resource Group and create a Service Bus namespace.

    2017.07.10-integrate-09

    Then, my app automatically gets access to environment variables that hold the credentials. No more embedding creds in code or config, no need to go to the Azure Portal. This makes integration easy, developer-friendly, and repeatable.
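
In Cloud Foundry, bound credentials like these surface through the VCAP_SERVICES environment variable (Spring Boot can also consume them through connectors). Here’s a bare-bones sketch of digging a connection string out by hand; the “azure-servicebus” label and “connectionString” field name are assumptions, so verify what your broker actually publishes before relying on them.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

//sketch: pull broker-provided credentials out of VCAP_SERVICES.
//the "azure-servicebus" label and "connectionString" field are assumptions;
//inspect the real variable (cf env <app>) before relying on specific names.
public class ServiceBusCredentials {

  public static String connectionString() throws Exception {
    String vcap = System.getenv("VCAP_SERVICES");
    JsonNode services = new ObjectMapper().readTree(vcap);

    //assumed structure: a single bound instance under the broker's service label
    JsonNode credentials = services.path("azure-servicebus").get(0).path("credentials");
    return credentials.path("connectionString").asText();
  }
}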

    2017.07.10-integrate-10

    Summary

    It’s such an exciting time to be a software developer. We’re solving new problems in new ways, and making life better for so many. The last thing we want is to be held back by a bottleneck in our process. Don’t let integration slow down your ambitions. The technology is there to help you build integration platforms that are more scalable, resilient, and friendly-to-change. Go for it!

  • Using speaking opportunities as learning opportunities

Over this summer, I’ll be speaking at a handful of events. I sign myself up for these opportunities—in addition to teaching courses for Pluralsight—so that I commit time to learning new things. Nothing like a deadline to provide motivation!

    Do you find yourself complaining that you have a stale skill set, or your “brand” is unknown outside your company? You can fix that. Sign up for a local user group presentation. Create a short “course” on a new technology and deliver it to colleagues at lunch. Start a blog and share your musings and tech exploration. Pitch a talk to a few big conferences. Whatever you do, don’t wait for others to carve out time for you to uplevel your skills! For me, I’m using this summer to refresh a few of my own skill areas.

In June, I’m once again speaking in London at Integrate. Application integration is arguably the most important/interesting part of Azure right now. Given Microsoft’s resurgence in this topic area, the conference matters more than ever. My particular session focuses on “cloud-native integration.” What is it all about? How do you do it? What are examples of it in action? I’ve spent a fair amount of time preparing for this, so hopefully it’s a fun talk. The conference is nearly sold out, but I know there are a handful of tickets left. It’s one of my favorite events every year.

    Coming up in July, I’m signed up to speak at PerfGuild. It’s a first-time, online-only conference 100% focused on performance testing. My talk is all about distributed tracing and using it to uncover (and resolve) latency issues. The talk builds on a topic I covered in my Pluralsight course on Spring Cloud, with some extra coverage for .NET and other languages. As of this moment, you can add yourself to the conference waitlist.

    Finally, this August I’ll be hitting balmy Orlando, FL to speak at the Agile Alliance conference. This year’s “big Agile” conference has a track centered on foundational concepts. It introduces attendees to concepts like agile project delivery, product ownership, continuous delivery, and more. My talk, DevOps Explained, builds on things I’ve covered in recent Pluralsight courses, as well as new research.

    Speaking at conferences isn’t something you do to get wealthy. In fact, it’s somewhat expensive. But in exchange for incurring that cost, I get to allocate time for learning interesting things. I then take those things, and share them with others. The result? I feel like I’m investing in myself, and I get to hang out at conferences with smart people.

    If you’re just starting to get out there, use a blog or user groups to get your voice heard. Get to know people on the speaking circuit, and they can often help you get into the big shows! If we connect at any of the shows above, I’m happy to help you however I can.

  • How should you model your event-driven processes?


    During most workdays, I exist in a state of continuous partial attention. I bounce between (planned and unplanned) activities, and accept that I’m often interrupt-driven. While that’s not an ideal state for humans, it’s a great state for our technology systems. Event-driven applications act based on all sorts of triggers: time itself, user-driven actions, system state changes, and much more. Often, these batch-or-realtime, event-driven activities are asynchronous and coordinated in some way. What options do you have for modeling event-driven processes, and what trade-offs do you make with each option?

    Option #1 – Single, Deterministic Process

    In this scenario, the event handler is monolithic in nature, and any embedded components are purpose-built for the process at hand. Arguably, it’s just a visually modeled code class. While initiated via events, the transition between internal components is pre-determined. The process is typically deployed and updated as a single unit.

    What would you use to build it?

You’ve seen (and built?) these before. A traditional ETL job fits the bill. Made up of components for each stage, it’s a single process executed as a linear flow. I’d also put some ESB workflows—like BizTalk orchestrations—in this category. Specifically, those with send/receive ports bound to the specific orchestration, embedded code, or external components built JUST to support that orchestration’s flow.

    2017.05.02-event-01

    In any of these cases, it’s hard (or impossible) to change part of the event handler without re-deploying the entire process.

    What are the benefits?

    Like with most monolithic things, there’s value in (perceived) simplicity and clarity. A few other benefits:

    • Clearer sense of what’s going on. When there’s a single artifact that explains how you handle a given event, it’s fairly simple to grok the flow. What happens when sensor data comes in? Here you go. It’s predictable.
    • Easier to zero-in on production issues. Did a step in the bulk data sync job fail? Look at the flow and see what bombed out. Clean up any side effects, and re-run. This doesn’t necessarily mean things are easy to fix—frankly, it could be harder—but you do know where things went wrong.
    • Changing and testing everything at once. If you’re worried about the side effects of changing a piece of a flow, that risk may lessen when you’re forced to test the whole thing when one tiny thing changes. One asset to version, one asset to track changes for.
    • Accommodates companies with teams of specialists. From what I can tell, many large companies still have centers-of-excellence for integration pros. That means most ETL and ESB workflows come out of here. If you like that org structure, then you’ll prefer more integrated event handlers.

    What are the risks?

    These processes are complicated, not complex. Other risks:

    • Cumbersome to change individual parts of the process. Nowadays, our industry prioritizes quick feedback loops and rapid adjustments to software. That’s difficult to do with monolithic event-driven processes. Is one piece broken? Better prioritize, upgrade, compile, test, and deploy the whole thing!
    • Non-trivial to extend the process to include more steps. When we think of event-driven activities, we often think of fairly dynamic behavior. But when all the event responses are tightly coordinated, it’s tough to add/adjust/remove steps.
    • Typically centralizes work within a single team. While your org may like siloed teams of experts, that mode of working doesn’t lend itself to agility or customer focus. If you build a monolithic event-driven process, expect delivery delays as the work queues up behind constrained developers.
    • Process scales as one single unit. Each stage of an event-driven workflow will have its own resource demands. Some will be CPU intensive. Others produce heavy disk I/O. If you have a single ETL or ESB workflow to handle events, expect to scale that entire thing when any one component gets constrained. That’s pretty inefficient and often leads to over-provisioning.

    Option #2 – Orchestrated Components

    In this scenario, you’ve got a fairly loose wrapper around independent services that respond to events. These are individual components, built and delivered on their own. While still somewhat deterministic—you are still modeling a flow—the events aren’t trapped within that flow.

    What would you use to build it?

    Without a doubt, you can still use traditional ESB tools to construct this model. A BizTalk orchestration that listens to events and calls out to standalone services? That works. Most iPaaS products also fit the bill here. If you build something with Azure Logic Apps, you’re likely going to be orchestrating a set of services in response to an event. Those services could be REST-based APIs backed by API Management, Azure Functions, or a Service Bus queue that may trigger a whole other event-driven process!

    2017.05.02-event-02.png

    You could also use tools like Spring Cloud Data Flow to build orchestrated, event-driven processes. Here, you chain together standalone Spring Boot apps atop a messaging backbone. The services are independent, but with a wrapper that defines a flow.

    What are the benefits?

    The main benefits of this model stem from the decoupling and velocity that comes with it. Others are:

• Distributed development. While you still have someone stitching the process together, the components get developed independently. And hopefully, you get more people in the mix who don’t even need to know the “wrapper” technology. Or in the case of Spring Cloud Data Flow or Logic Apps, the wrapper technology is dev-oriented and easier to understand than traditional integration systems. Either way, this means more parallel development and faster turnaround of the entire workflow.
    • Composable processes. Configure or reconfigure event handlers based on what’s needed. Reuse each step of the event-driven process (e.g. source channel, generic transformation component) in other processes.
• Loose grip on the event itself. There could be many parties interested in a given event. Your flow may be just one. While you could reuse the inbound channel to spawn each event-driven process, you can also wiretap orchestrated processes.

    What are the risks?

    You’ve got some risks with a more complex event-driven flow. Those include:

    • Complexity and complicated-ness. Depending on how you build this, you might not only be complicated but also complex! Many moving parts, many distributed components. This might result in trickier troubleshooting and less certainty about how the system behaves.
    • Hidden dependencies. While the goal may be to loosely orchestrate services in an event-driven flow, it’s easy to have leaky abstractions. “Independent” services may share knowledge between each other, or depend on specific underlying infrastructure. This means that you need good documentation, and services that don’t assume that dependencies exist.
    • Breaking changes and backwards compatibility. Any time you have a looser federation of coordinated pieces, you increase the likelihood of one bad actor causing cascading problems. If you have a bunch of teams that build/run services on their own, and one team combines them into an event-driven workflow, it’s possible to end up with unpredictable behavior. Mitigation options? Strong continuous integration practices to catch breaking changes, and a runtime environment that catches and isolates errors to minimize impact.

    Option #3 – Choreographed Components

In this scenario, your event-driven processes are extremely fluid. Instead of anything dictating the flow, services collaborate by publishing and subscribing to messages. It’s fully decentralized. Any given service has no idea who or what is upstream or downstream of it. Each service does its job, and if another service wants to do subsequent work, great.

    What would you use to build it?

In these cases, you’re often working with low-level code, not high-level abstractions. Makes sense. But there are frameworks out there that make it easier for you if you don’t crave writing to or reading from queues. For .NET developers, you have things like MassTransit or NServiceBus. Those provide helpful abstractions. If you’re a Java developer, you’ve got something like Spring Cloud Stream. I’ve really fallen in love with it. Stream provides an elegant abstraction atop RabbitMQ or Apache Kafka where the developer doesn’t have to know much of anything about the messaging subsystem.
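
To show how thin that abstraction is, here’s a rough sketch of one choreographed participant built on Spring Cloud Stream’s Processor interface. It reacts to whatever arrives on its input channel and publishes a follow-on event, without knowing who produced the original or who will consume the result; the payload handling is made up for illustration.

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Processor;
import org.springframework.messaging.handler.annotation.SendTo;

//one participant in a choreography: no workflow engine, no knowledge of
//upstream or downstream services, just "react to an event, emit an event"
@EnableBinding(Processor.class)
public class FraudScoringService {

  @StreamListener(Processor.INPUT)
  @SendTo(Processor.OUTPUT)
  public String score(String orderEvent) {
    //do this service's piece of work, then publish a new event for whoever cares
    System.out.println("scoring event: " + orderEvent);
    return "{\"scored\":true,\"original\":" + orderEvent + "}";
  }
}

Deploy another service that listens to the same destination, and you’ve extended the process without touching this one.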

    What are the benefits?

    Some of the biggest benefits come from the velocity and creativity that stem from non-deterministic event processing.

    • Encourages adaptable processes. With choreographed event processors, making changes is simple. Deploy another service, and have it listen for a particular event type.
    • Makes everyone an integration developer. Done right, this model lessens the need for a siloed team of experts. Instead, everyone builds apps that care about events. There’s not much explicit “integration” work.
• Reflects changing business dynamics. Speed wins. But not speed for the sake of it; rather, speed of learning from customers and incorporating feedback into experiments. Scrutinize anything that adds friction to your learning process. Fixed workflows owned by a single team? Increasingly, that’s an anti-pattern for today’s software-driven, event-powered businesses. You want to be able to handle the influx of new data and events and quickly turn that into value.

    What are the risks?

    Clearly, there are risks to this sort of “controlled chaos” model of event processing. These include:

    • Loss of cross-step coordination. There’s value in event-driven workflows that manage state between stages, compensate for failed operations, and sequence key steps. Now, there’s nothing that says you can’t have some processes that depend on orchestration, and some on choreography. Don’t adopt an either-or mentality here!
    • Traceability is hairy. When an event can travel any number of possible paths, and those paths are subject to change on a regular basis, auditability can’t be taken for granted! If it takes a long time for an inbound event to reach a particular destination, you’ve got some forensics to do. What part was slow? Did something get dropped? How come this particular step didn’t get triggered? These aren’t impossible challenges, but you’ll want to invest in solid logging and correlation tools.

    You’ve got lots of options for modeling event-driven processes. In reality, you’ll probably use a mix of all three options above. And that’s fine! There’s a use case for each. But increasingly, favor options #2 and #3 to give you the flexibility you need.

    Did I miss any options? Are there benefits or risks I didn’t list? Tell me in the comments!

  • Creating a JSON-Friendly Azure Logic App That Interacts with Functions, DocumentDB and Service Bus


    I like what Microsoft’s doing in the app integration space. They breathed new life into their classic integration bus (BizTalk Server). The family of Azure Service Bus technologies (Queues, Topics, Relay) is super solid. API Management and Event Hubs solve real needs. And Azure Logic Apps is maturing at an impressive rate. That last one is the one I wanted to dig into more. Logic Apps gets updated every few weeks, and I thought it’d be fun to put a bunch of new functionality to the test. Specifically, I’m going to check out the updated JSON support, and invoke a bunch of Azure services.

    Step 1 – Create Azure DocumentDB collection

    In my fictitious example, I’m processing product orders. The Logic App takes in the order, and persists it in a database. In the Azure Portal, I created a database account.

    2017-02-22-logicapps-01

    DocumentDB stores content in “collections”, so I needed one of those. To define a collection you must provide some names, throughput (read/write) capacity, and a partition key. The partition key is used to shard the data, and document IDs have to be unique within that partition.

    2017-02-22-logicapps-02

    Ok, I was all set to store my orders.

    Step 2 – Create Azure Function

    Right now, you can’t add custom code inside a Logic App. Microsoft recommends that you call out to an Azure Function if you want to do any funny business. In this example, I wanted to generate a unique ID per order. So, I needed a snippet of code that generated a GUID.

    First up, I created a new Azure Functions app.

    2017-02-22-logicapps-03

    Next up, I had to create an actual function. I could start from scratch, or use a template. I chose the “generic webhook” template for C#.

    2017-02-22-logicapps-05

    This function is basic. All I do is generate a GUID, and return it back.

    2017-02-22-logicapps-06

    Step 3 – Create Service Bus Queue

    When a big order came in, I wanted to route a message to a queue for further processing. Up front, I created a new Service Bus queue to hold these messages.

    2017-02-22-logicapps-07

    With my namespace created, I added a new queue named “largeorders.”

    That was the final prerequisite for this demo. Next up, building the Logic App!

    Step 4 – Create the Azure Logic App

    First, I defined a new Logic App in the Azure Portal.

    2017-02-22-logicapps-08

    Here’s the first new thing I saw: an updated “getting started” view. I could choose a “trigger” to start off my Logic App, or, choose from a base scenario template.

    2017-02-22-logicapps-09

    I chose the trigger “when an HTTP request is received” and got an initial shape on my Logic App. Now, here’s where I saw the second cool update: instead of manually building a JSON schema, I could paste in a sample and generate one. Rad.

    2017-02-22-logicapps-10

    Step 5 – Call out to Azure Functions from Logic App

After I received a message, I wanted to add it to DocumentDB. But first, I needed my unique order ID. Recall that our Azure Function generated one. I chose to “add an action” and selected “Azure Functions” from the list. As you can see below, once I chose that action, I could browse the Function I already created. Note that a new feature of Logic Apps allows you to build (Node.js) Functions from within the Logic App designer itself. I wanted a C# Function, so that’s why I did it outside this UI.

    2017-02-22-logicapps-11

    Step 6 – Insert record into DocumentDB from Logic App

    Next up, I picked the “DocumentDB” activity, and chose the “create or update document” action.

    2017-02-22-logicapps-12

Unfortunately, Logic Apps doesn’t (yet) look up connection strings for me. I opened another browser tab and navigated back to the DocumentDB “blade” to get my account name and authorization key. Once I did that, the Logic Apps Designer interrogated my account and let me pick my database and collection. After that, I built the payload to store in the database. Notice that I built up a JSON message using values from the inbound HTTP message, and the Azure Function. I also set the partition key to the “category” value from the inbound message.

    2017-02-22-logicapps-13

What I have above won’t work. Why? In the present format, the “id” value is invalid. It would contain the whole JSON result from the Azure Function. There’s no way (yet) to grab a part of the JSON in the Designer, but there is a way in code. After switching to “code view”, I added an [‘orderid’] reference in the right spot …

    2017-02-22-logicapps-14

When I switched back to the Designer view, I saw “orderid” as the mapped value.

    2017-02-22-logicapps-15

    That finished the first part of the flow. In the second part, I wanted to do different things based on the “category” of the purchased product.

    Step 7 – Add conditional flows to Logic App

    Microsoft recently added a “switch” statement condition to the palette, so I chose that. After choosing the data field to “switch” on, I added a pair of paths for different categories of product.

    2017-02-22-logicapps-16

    Inside the “electronics” switch path, I wanted to check and see if this was a big order. If so, I’d drop a message to a Service Bus queue. At the moment, Logic Apps doesn’t let me create variables (coming soon!), so I needed another way to generate the total order amount. Azure Functions to the rescue! From within the Logic Apps Designer, I once again chose the Azure Functions activity, but this time, selected “Create New Function.” Here, I passed in the full body of the initial message.

    2017-02-22-logicapps-18

    Inside the Function, I wrote some code that multiplied the quantity by the unit price.

    2017.02.22-logicapps-19.png

We’re nearly done! After this Function, I added an if/else conditional that checked the Function’s result and, if it was over 100, sent a message to the Azure Service Bus.

    2017-02-22-logicapps-20

    Step 8 – Send a response back to the Logic App caller

    Whew. Last step to do? Send an HTTP response back to the caller, containing the auto-generated order ID. Ok, my entire flow was finished. It took in a message, added it to DocumentDB, and based on a set of conditions, also shipped it over the Azure Service Bus.

    2017-02-22-logicapps-22

    Step 9 – Test this thing!

    I grabbed the URL for the Logic App from the topmost shape, and popped it into Postman. After sending in the JSON payload, I got back a GUID representing the generated order ID.

    2017-02-22-logicapps-23

    That’s great and all, but I needed to confirm everything worked! DocumentDB with a Function-generated ID? Check.

    2017-02-22-logicapps-24

    Service Bus message viewable via the Service Bus Explorer? Check.

    2017-02-22-logicapps-25

    The Logic Apps overview page on the Azure Portal also shows a “run history” and lets you inspect the success/failure of each step. This is new, and very useful.

    2017-02-22-logicapps-26

    Summary

All in all, this was pretty straightforward. The Azure Portal still has some UI quirks, but a decent Azure dev can crank out the above flow in 20 minutes. That’s pretty powerful. Keep an eye on Logic Apps, and consider taking it for a spin!

  • 2016 in Review: Reading and Writing Highlights

    2016 was a wild year for plenty of folks. Me too, I guess. The wildest part was joining Pivotal and signing up for a job I’d never done before. I kept busy in other ways in 2016, including teaching a couple of courses for Pluralsight, traveling around to speak at conferences, writing a bunch for InfoQ.com, and blogging here with semi-regularity. 2017 should be more of the same (minus a job change!), plus another kiddo on the way.

    I tend to read a lot, and write a bit, so each year I like to reflect on my favorites.

    Favorite Blog Posts and Articles I Wrote

    I create stuff in a handful of locations—this blog, InfoQ.com, Pivotal blog—and here were the content pieces I liked the most.

[My Blog] Modern Open Source Messaging: Apache Kafka, RabbitMQ and NATS in Action. This was my most popular blog post this year, by far. Application integration and messaging are experiencing a renaissance in this age of cloud and microservices, and OSS software is leading the way. If you want to watch my conference presentation that sparked this blog post, head to the BizTalk360 site.

    [My Blog] Trying out the “standard” and “enterprise” templates in Azure Logic Apps. Speaking of app integration, Microsoft turned a corner in 2016 and has its first clear direction in years. Logic Apps is a big part of that future, and I gave the new stuff a spin. FYI, since I wrote the original post, the Enterprise Integration Pack shipped with a slightly changed user experience.

    [My Blog] Characteristics of great managers. I often looked at “management” as a necessary evil, but a good manager actually makes a big difference. Upon reflection, I listed some of the characteristics of my best managers.

    [My Blog] Using Concourse to continuously deliver a Service Bus-powered Java app to Pivotal Cloud Foundry on Azure. 15 years. That’s how long it had been since I touched Java. When I joined Pivotal, the company behind the defacto Java framework called Spring, I committed to re-learning it. Blog posts like this, and my new Pluralsight course, demonstrated that I learned SOMETHING.

    [InfoQ] Outside of my regular InfoQ contributions covering industry news, I ran a series on the topic of “cloud lock-in.” I wrote an article called “Everything is Lock-In: Focus on Switching Costs” and facilitated a rowdy expert roundtable.

    [InfoQ] Wolfram Wants to Deliver “Computation Everywhere” with New Private Cloud. I purposely choose to write about things I’m not familiar with. How else am I supposed to learn? In this case, I dug into the Wolfram offerings a bit, and interviewed a delightful chap.

    [Pivotal] Pivotal Conversations Podcast. You never know what may happen when you say “yes” to something. I agreed to be a guest on a podcast earlier this year, and as a result, my delightfully bearded work colleague Coté asked me to restart the Pivotal podcast with him. Every week we talk about the news, and some tech topic. It’s been one of my favorite things this year.

    [Pivotal] Standing on the Shoulders of Giants: Supercharging Your Microservices with NetflixOSS and Spring Cloud. I volunteered to write a whitepaper about microservices scaffolding and Spring Cloud, and here’s the result. It was cool to see thousands of folks check it out.

    [Pivotal blog] 250k Containers In Production: A Real Test For The Real World. Scale matters, and I enjoyed writing up the results of an impressive benchmark by the Cloud Foundry team. While I believe our industry is giving outsized attention to the topic of containers, the people who *should care* about them (i.e. platform builders) want tech they can trust at scale.

    [Pivotal blog] To Avoid Getting Caught In The Developer Skills Gap, Do This. It’s hard to find good help these days. Apparently companies struggle to fill open developer positions, and I offered some advice for closing the skills gap.

    Favorite Books I Read

    I left my trusty Kindle 3 behind on an airplane this year, and replaced it with a new Kindle Paperwhite. Despite this hiccup, I still finished 31 books this year. Here are the best ones I read.

    The Hike. I don’t read much fantasy-type stuff, but I love Drew’s writing and gave this a shot. Not disappointed. Funny, tense, and absurd tale that was one of my favorite books of the year. You’ll never look at crustaceans the same way again.

The Prey Series. I’m a sucker for mystery/thriller books and thought I’d dig into this long-running series. Ended up reading the first six of them this year. Compelling protagonist, downright freaky villains.

    The Last Policeman Trilogy. I’m not sure where I saw the recommendation for these books, but I’m glad I did. Just fantastic. I plowed through Book 1, Book 2, and Book 3 in about 10 days. It starts as a “cop solving a mystery even though the world is about to end” and carries onward with a riveting sense of urgency.

    Rubicon: The Last Years of the Roman Republic. I really enjoyed this. Extremely engaging story about a turning point in human history. It was tough keeping all the characters straight after a while, but I have a new appreciation for the time period and the (literally) cutthroat politics.

    The Great Bridge: The Epic Story of the Building of the Brooklyn Bridge. It’s easy to glamorize significant construction projects, but this story does a masterful job showing you the glory *and* pain. I was inspired reading it, so much so that I wrote up a blog post comparing software engineering to bridge-building.

    The Path Between the Seas: The Creation of the Panama Canal, 1870-1914. You’ve gotta invest some serious time to get through McCullough’s books, but I’ve never regretted it. This one is about the tortured history of building the Panama Canal. Just an unbelievable level of effort and loss of life to make it happen. It’s definitely a lesson on preparedness and perseverance.

The Summer of 1787: The Men Who Invented the Constitution. Instead of only ingesting hot-takes about American history and the Founders’ intent, it’s good to take time to actually read about it! I seem to read an American history book each year, and this one was solid. Good pacing, great details.

    The Liberator: One World War II Soldier’s 500-Day Odyssey from the Beaches of Sicily to the Gates of Dachau. I also seem to read a WWII book every year, and this one really stayed with me. I don’t believe in luck, but it’s hard to attribute this man’s survival to much else. Story of hope, stress, disaster, and bravery.

Navigating Genesis: A Scientist’s Journey through Genesis 1–11. Intriguing investigation into the overlap between the biblical account and scientific research into the origins of the universe. Less conflict than you may think.

    Boys Among Men: How the Prep-to-Pro Generation Redefined the NBA and Sparked a Basketball Revolution. I’m a hoops fan, but it’s easy to look at young basketball players as spoiled millionaires. That may be true, but it’s the result of a system that doesn’t set these athletes up for success. Sobering story that reveals how elusive that success really is.

    Yes, My Accent Is Real: And Some Other Things I Haven’t Told You. This was such a charming set of autobiographical essays from The Big Bang Theory’s Kunal Nayyar. It’s an easy read, and one that provides a fun behind-the-scenes look at “making it” in Hollywood.

    Eleven Rings: The Soul of Success. One of my former colleagues, Jim Newkirk, recommended this book from Phil Jackson. Jim said that Jackson’s philosophy influenced how he thinks about software teams. Part autobiography, part leadership guide, this book includes a lot of advice that’s applicable to managers in any profession.

    Disrupted: My Misadventure in the Start-Up Bubble. I laughed, I cried, and then I panicked when I realized that I had just joined a startup myself. Fortunately, Pivotal bore no resemblance to the living caricature that is/was HubSpot. Read this book from Lyons before you jump ship from a meaningful company to a glossy startup.

Overcomplicated: Technology at the Limits of Comprehension. The thesis of this book is that we’re building systems that cannot be totally understood. The author then goes into depth explaining how to approach complex systems, how to explore them when things go wrong, and how to use caution when unleashing this complexity on customers.

    Pre-Suasion: A Revolutionary Way to Influence and Persuade. If you were completely shocked by the result of the US presidential election, then you might want to read this book. This election was about persuasion, not policy. The “godfather of persuasion” talks about psychological framing and using privileged moments to impact a person’s choice. Great read for anyone in sales and marketing.

Impossible to Ignore: Creating Memorable Content to Influence Decisions. How can you influence people’s memories and have them act on what you think is important? That’s what this book attempts to answer. Lots of practical info grounded in research studies. If you’re trying to land a message in a noisy marketplace, you’ll like this book.

Win Your Case: How to Present, Persuade, and Prevail–Every Place, Every Time. I apparently read a lot about persuasion this year. This one is targeted at trial lawyers, but many of the same components of influence (e.g. trust, credibility) apply to other audiences.

The Challenger Customer: Selling to the Hidden Influencer Who Can Multiply Your Results. Thought-provoking stuff here. The author’s assertion is that the hard part of selling today isn’t the supplier struggling to sell their product, but the customer’s struggle to buy it. An average of 5.4 people are involved in purchasing decisions, and it’s about using “commercial insight” to help them create consensus early on.

    The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. This is the new companion book to the DevOps classic, The Phoenix Project. It contains tons of advice for those trying to change culture and instill a delivery mindset within the organization. It’s full of case studies from companies small and large. Highly recommended.

Starting and Scaling DevOps in the Enterprise. Working at Pivotal, this is one of the questions we hear most from Global 2000 companies: how do I scale agile/DevOps practices to my whole organization? This short book tackles that question with some practical guidance and relevant examples.

    A sincere thanks to all of you for reading my blog, watching my Pluralsight courses, and engaging me on Twitter in 2016. I am such a better technologist and human because of these interactions with so many interesting people!

  • Trying out the “standard” and “enterprise” templates in Azure Logic Apps

    Is the Microsoft integration team “back”? It might be premature to say that Microsoft has finally figured out its app integration story, but the signs are very positive. There’s been a fresh influx of talent like Jon Fancey, Tord Glad Nordahl, and Jim Harrer, some welcome forethought into the overall Microsoft integration story, better community engagement, and a noticeable uptick in the amount of software released by these teams.

One area that’s been getting tons of focus is Azure Logic Apps. Logic Apps are a potential successor to classic on-premises application integration tools, but with a cloud-first bent. Users can visually model flows made up of built-in, or custom, activities. The initial integrations supported by Logic Apps were focused on cloud endpoints, but with the recent beta release of the Enterprise Integration Pack, Microsoft is making its move to more traditional use cases. I haven’t messed around with Logic Apps for a few months, and lots of things have changed, so I tested out both the standard and enterprise templates.

    One nice thing about a service like Logic Apps is that anyone can get started with just a browser. If you’re building a standard workflow (read: one that doesn’t require extra services or the “enterprise integration” bits), then you don’t have to install a single thing. To start with, I went to the Azure Portal (the new one, not the classic one) and created a new “Logic App.”

    2016-09-09-logic02

    I was then presented with a choice for how to populate the app itself. There’s the default “blank” template, or I can start off with a few pre-canned options. Some of these are a bit contrived (“save my tweets to a SharePoint list” makes me sad), but they give you a good idea of what’s possible with the many built-in connectors.

    2016-09-09-logic01

    I chose the HTTP Request-Response template since my goal was to build a simple synchronous web service. The portal showed me what this template does, and dropped me into the design canvas with the HTTP Request and HTTP Response activities in place.

    2016-09-09-logic03

    I have a birthday coming and am feeling old, so I decided to build a simple service that would tell me if I was old or not. In order to easily use the fields of an inbound JSON message, I had to define a simple JSON schema inside the HTTP Request shape. This schema defines a string for the “name” and an integer for the “age.”

    2016-09-09-logic04
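    For reference, what the Request trigger wants here is plain JSON Schema. A minimal sketch covering the two fields described above would look something like this (the exact keywords you include are flexible):

    ```json
    {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "age": { "type": "integer" }
      },
      "required": [ "name", "age" ]
    }
    ```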

    Before sending a response, I want to actually do something! So, I added an if-then condition to the canvas. There are other control-flow shapes available as well, such as for-each and do-until loops. I put this if-then shape in between the Request and Response elements, and was able to choose the “age” value for my conditional check.

    2016-09-09-logic06

    Here, I checked to see if “age” is greater than 40. Notice that I also had access to the “name” field, as well as the whole request body or HTTP headers. Next, I wanted to send a different HTTP response for over-40 and under-40 requestors. The brand new “compose” activity is the answer. With this, I could create a new message to send back in the HTTP response.

    2016-09-09-logic07
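    If you flip over to the code view, the designer turns that age check into a workflow-definition expression. Here’s a hand-written, trimmed-down approximation (the “Check_age” name is my own, and I’ve omitted properties like runAfter that the designer adds for you):

    ```json
    {
      "Check_age": {
        "type": "If",
        "expression": "@greater(triggerBody()?['age'], 40)",
        "actions": {},
        "else": { "actions": {} }
      }
    }
    ```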

    I simply typed a new JSON message into the Compose activity, using the variable for the “name”, and adding some text to categorize the requestor’s age.

    2016-09-09-logic08
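    The Compose input itself is just a small JSON document. Mine looked roughly like this; the field names and wording are my own stand-ins, and the @{...} token is how Logic Apps splices the “name” value from the trigger body into a string:

    ```json
    {
      "name": "@{triggerBody()?['name']}",
      "verdict": "Sorry, you are officially old."
    }
    ```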

    I then did the same thing for the “no” path of the if-then and had a complete flow!

    2016-09-09-logic09

    Quick and easy! The topmost HTTP Request activity has the URL for this particular Logic App, and since I didn’t apply any security policies, it was super simple to invoke. From within my favorite API testing tool, Postman, I submitted a JSON message to the endpoint. Sure enough, I got back a response that corresponded to the provided age.

    2016-09-09-logic10
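    If you want to try something similar, the round trip looks roughly like this (made-up values; the response shape comes from the Compose activity above). First, the request body posted to the Logic App’s URL:

    ```json
    {
      "name": "Ada",
      "age": 60
    }
    ```

    And the JSON that came back on the over-40 path:

    ```json
    {
      "name": "Ada",
      "verdict": "Sorry, you are officially old."
    }
    ```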

    Great. But what about doing all the Enterprisey stuff? I built another new Logic App, and this time I wanted to send a comma-separated payload to an HTTP endpoint and get back XML. There’s a Logic Apps template for that, and when I selected it, I was told I needed an “integration account.”

    2016-09-09-logic11

    So I got out of Logic Apps, and went off to create an Integration Account in the Portal. Integration Accounts are a preview service from Microsoft. These accounts hold all the integration artifacts used in enterprise integration scenarios: schemas, maps, certificates, partners, and trading agreements.

    2016-09-09-logic12

    How do I get these artifacts, you ask? This is where client-side development comes in. I downloaded the Enterprise Integration Tools (really just Visual Studio extensions that give you the BizTalk schema editor and mapper) and fired up Visual Studio. This added an “integration” project type to Visual Studio, and also let me add XML schemas, flat file schemas, and maps to a project.

    2016-09-09-logic13

    I then set out to build some enterprise-class schemas defining a “person” (one flat file schema, one XML schema) and a map converting one format to another. I built the flat file schema using a sample comma-separated file and the provided Flat File Wizard. Hello, my old friend.

    2016-09-09-logic17

    The map is super simple. It just concatenates the inbound fields into a single outbound field in the XML schema. Note that the destination field has a “max occurs” of “*” to make sure that it adds one “name” element for each set of source elements. And yes, the mapper includes the Functoids for basic calculations, logical conditions, and string manipulation.

    2016-09-09-logic14
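    To make the goal of the map concrete, here’s the sort of before-and-after I was aiming for. The element names and namespace are my own illustration, not the exact schema I built. A two-record flat file input:

    ```
    Ada,Lovelace
    Grace,Hopper
    ```

    And the XML the map should produce, with one concatenated “name” element per source record:

    ```xml
    <ns0:People xmlns:ns0="http://example.org/people">
      <name>Ada Lovelace</name>
      <name>Grace Hopper</name>
    </ns0:People>
    ```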

    The Azure Integration Account doesn’t accept DLLs, so I uploaded the raw XSD and map files. Note that you need to build the project to get the XSLT version of the map; the Azure portal doesn’t take the raw .btm map.

    2016-09-09-logic15

    Back in my Logic App, I found the Properties page for the app and made sure to set the “integration account” property so that it saw my schemas and maps.

    2016-09-09-logic16

    I then went back and spun up the VETER (validate, extract, transform, enrich, route) Logic Apps template. Because there seemed to be a lot of places where things could go wrong, I removed all the other shapes from the design canvas and just started with the flat file decoding. Let’s get that working first! Since I associated my “Integration Account” with this Logic App, it was easy to select my schema from the drop-down list. With that, I tested.

    2016-09-09-logic19

    Shoot. The first call failed. Fortunately, Logic Apps comes with a pretty sweet dashboard and tracing interface. I noticed that the flat file decoding failed, and it looked like it got angry with my schema defining a carriage-return-plus-line-feed delimiter for records, when all I sent it was a line feed (via my API testing tool). So, I went back to my schema, changed the record delimiter, updated my schema (and map) in the Integration Account, and tested again.
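    In other words, the schema originally said each record ends with a carriage return plus line feed (CR LF, 0x0D 0x0A), while my test tool was sending bare line feeds (LF, 0x0A). Using the same made-up records as above, the mismatch looks like this when you show the escape sequences:

    ```
    Ada,Lovelace\r\nGrace,Hopper\r\n    <-- what the original schema expected
    Ada,Lovelace\nGrace,Hopper\n        <-- what I actually sent
    ```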

    2016-09-09-logic20

    Success! Notice that it turned my input flat file into an XML representation.

    Feeling irrationally confident, I went to the Logic Apps design surface, clicked the “templates” button at the top, and re-selected the VETER template to get back all the activities I needed. However, I forgot that the “mapping” activity requires that I have an Azure Functions container set up. Apparently the maps are executed inside Microsoft’s serverless framework, Azure Functions. Microsoft’s docs are pretty cryptic about what to do here, but if you follow the links in this KB (“create container”, “add function”), you get the default mapper template as an Azure Function.

    2016-09-09-logic21

    Ok, now I was set. My final Logic App configuration looked like this.

    2016-09-09-logic23

    The app takes in a flat file, validates it against the flat file (really, XML) schema, uses a built-in check to confirm that it’s a decoded flat file, executes my map within an Azure Function, and finally returns the result. I then called the Logic App from Postman.

    2016-09-09-logic24

    BAM! It worked. That’s … awesome. While some of you may have fainted in horror at the idea of using flat files and XML in a shiny new Logic App, this does show that Microsoft is trying to cater to some of the existing constraints of their customers.

    Overall, I thought the Logic Apps experience was pretty darn good. The tooling has a few rough edges, but was fairly intuitive. The biggest gaps are the documentation and the small number of public samples, but that’s to be expected with such new technology. I’d definitely recommend giving the Enterprise Integration Pack a try and seeing what sort of unholy flows you can come up with!

  • Speaking at INTEGRATE 2016 Next Month

    Do you live in Europe and have affection for application integration? If so, then I hope to see you at INTEGRATE 2016. This annual conference has become a must-attend event for those interested in connecting apps together, and I’m thrilled that Saravana Kumar asked me to present this year. Honestly, I’m even more excited about attending the conference than speaking at it. The organizers have pulled together an enviable collection of speakers, and the attendees are some of the smartest integration people I’ve met.

    While this is a Microsoft-heavy event, I decided to bring an outside perspective to the conference this year. Instead of presenting something about Microsoft’s broad integration portfolio, I thought it’d be fun to share the latest happenings in the open source space. I’ll be talking about relevant integration patterns for modern apps and digging into a handful of popular distributed messaging platforms. This should give attendees an up-to-date perspective of the market, and the opportunity to compare open source platforms to Microsoft’s offerings.

    I hope to see you there!

  • Where to host your integration bus

    RightScale recently announced the results of their annual “State of the Cloud” survey. You can find the report here, and my InfoQ.com story here. A lot of people participated in the survey, and the results showed that a majority of companies are growing their public cloud usage, but continuing to invest heavily in on-premises “cloud” environments. When I was reading this report, I was thinking about the implications for a company’s application integration strategy. As workloads continue to move to cloudy hosts and companies start to get addicted to the benefits of cloud (from the survey: “faster access to infrastructure”, “greater scalability”, “geographic reach”, “higher performance”), does that change what they think about running integration services? What are the options for a company wondering where to host their application/data integration engine, and what benefits and risks are associated with each choice?

    The options below should apply whether you’re doing real-time or batch integration, high throughput messaging or complex orchestration, synchronous or asynchronous communication.

    Option #1 – Use an Integration-as-a-Service engine in the public cloud

    It may make sense to use public cloud integration services to connect your apps. Or, introduce these as edge intake services that still funnel data to another bus further downstream.

    Benefits

    • Easy to scale up or down. One of the biggest perks of a cloud-based service is that you don’t have to do significant capacity planning up front. For messaging services like Amazon SQS or the Azure Service Bus, there’s very little you have to consider. For an integration service like SnapLogic, there are limits, but you can size up and down as needed. The key is that you can respond to bursts in usage by adding capacity, and to troughs by cutting your costs. No more over-provisioning just in case you might need it.
    • Multiple patterns available. You won’t see a glut of traditional ESB-like cloud integration services. Instead, you’ll find many high-throughput messaging services (e.g. Google Pub/Sub) or stream processing services (e.g. Azure Stream Analytics) that take advantage of the elasticity of the cloud. However, if you’re doing bulk data movement, there are multiple viable services available (e.g. Talend Integration Cloud), and if you’re doing stateful integration, there are services for that too (e.g. Azure Logic Apps).
    • No upgrade projects. From my experience, IT never likes funding projects that upgrade foundational infrastructure. That’s why you have servers still running Windows Server 2003, or Oracle databases that are 14 versions behind. You always tell yourself that “NEXT year we’ll get that done!” One of the seductive aspects of cloud-based services is that you don’t deal with that any longer. There are no upgrades; new capabilities just show up. And for all these cloud integration services, that means always getting the latest and greatest as soon as it’s available.
    • Regular access to new innovations. Is there anything in tech more depressing than seeing all these flashy new features in a product that you use, and knowing that you are YEARS away from deploying it? Blech. The industry is changing so fast, that waiting 4 years for a refresh cycle is an eternity. If you’re using a cloud integration service, then you’re able to get new endpoint adapters, query semantics, storage enhancements and the like as soon as possible.
    • Connectivity to cloud hosted systems, partners. One of the key reasons you’d choose a cloud-based integration service is so that you’re closer to your cloudy workloads. Running your web log ingest process, partner supply chain, or master-data management jobs all right next to your cloud-hosted databases and web apps gives you better performance and simpler connectivity. Instead of navigating the 12 layers of firewall hell to expose your on-premises integration service to Internet endpoints, you’re right next door.
    • Distributed intake and consumption. Event and data sources are all over the place. Instead of trying to ship all that information to a centralized bus somewhere, it can make sense to do some intake at the edge. Cloud-based services let you spin up multiple endpoints in various geographies with ease, which may give you much more flexibility when taking in Internet-of-Things beacon data, orders from partners, or returning data from time-sensitive request/reply calls.
    • Lower operational cost. You MAY end up paying less, but of course you could also end up paying more. Depends on your throughput, storage, etc. But ideally, if you’re using a cloud integration service, you’re not paying the same type of software licensing and hardware costs as you would for an on-premises system.

    Risks

    • High latency with on-premises systems. Unless your company was formed within the last 18 months, I’d be surprised if you didn’t have SOME key systems sitting in a local facility. While latency may not matter for some asynchronous workloads, if you’re taking in telemetry data from devices and making real-time adjustments to applications, every millisecond counts. Depending on where your home office is, there could be a bit of distance between your cloud-based integration engine and the key systems it talks to.
    • Limited connectivity to on-premises systems (bi-directional). It’s usually not too challenging to get on-premises systems to reach out to the Internet (and push data to an endpoint), but it’s another matter to allow data to come *into* your on-premises systems from the Internet. Some integration services have solved this by putting agents on the local environment to facilitate secure communication, but realistically, it’ll be on you to extract data from cloud-based engines versus expecting them to push data into your data centers.
    • Data leakage if security isn’t properly factored in. If the data never leaves your private network, it can be easy to be lazy about security. Encrypt in transit? Ok. Encrypt the data as well? Nah. If that casual approach to security isn’t tightened up when you start passing data through cloud integration services, you could find yourself in trouble. While your data may be protected from others accidentally seeing it, you may have made it easy for others within your own organization to extract or tap into data they didn’t have access to before.
    • Services are not as mature as software-based products, and focused mostly on messaging. It’s true that cloud-based solutions haven’t been around as long as the Tibcos, BizTalk Servers, and such. And, many cloud-based solutions focus less on traditional integration techniques (FTP! CSV files!) and more on Internet-scale data distribution.
    • Opaque operational interfaces make troubleshooting more difficult. We’re talking about as-a-Service products here, so by definition, you’re not running this yourself. That means you can’t check out the server logs, add tracing logic, or view the memory consumption of a particular service. Instead, you only have the interfaces exposed by the vendor. If troubleshooting data is limited, you have no other recourse.
    • Limited portability of the configuration between providers. Depending on the service you choose, there’s a level of lock-in that you have to accept. Your integration logic from one service can’t be imported into another. Frankly, the same goes for on-premises integration engines. Either way, your application/data integration platform is probably a key lock-in point regardless of where you host it.
    • Unpredictable availability and uptime. A key value proposition of cloud is high availability, but you have to take the provider’s word for it that they’ve architected as such. If your cloud integration bus is offline, so are you. There’s no one to yell at to get it back up and running. Likewise, any maintenance to the platforms happens at a time that works for the vendor, not for you. Ideally you never see downtime, but you absolutely have less control over it.
    • Unpredictable pricing on cost dimensions you may not have tracked before (throughput, storage). I’d doubt that most IT shops know their true cost of operations, but nonetheless, it’s possible to get sticker shock when you start paying based on consumption. Once you’ve sunk cost into an on-premises service, you may not care about message throughput or how much data you’re storing. You will care about things like that when using a pay-as-you-go cloud service.

     

    Option #2 – Run your integration engine in a public cloud environment

    If adopting an entirely managed public service isn’t for you, then you still may want the elastic foundation of cloud while running your preferred integration engine.

    Benefits

    • Run the engine of your choice. Like using Mule, BizTalk Server, or Apache Kafka and don’t want to give it up? Take that software and run it on public cloud Infrastructure-as-a-Service. No need to give up your preferred engine just because you want a more flexible host.
    • Configuration is portable from your on-premises solution (if you’re migrating versus setting this up brand new). If you’re “upgrading” from fixed virtual machines or bare metal boxes to an elastic cloud, the software stays the same. In many cases, you don’t have to rewrite much (besides some endpoint addresses) in order to slide into an environment where you can resize the infrastructure up and down much more easily.
    • Scale up and down compute and storage. Probably the number one reason to move. Stop worrying about boxes that are too small (or large!) and running out of disk space. By moving from fixed on-premises environments to self-service cloud infrastructure, you can set an initial sizing and continue to right-size on a regular basis. About to beat the hell out of your RabbitMQ environment for a few days? Max out the capacity so that you can handle the load. Elasticity is possibly the most important reason to adopt cloud.
    • Stay close to cloud hosted systems. Your systems are probably becoming more distributed, not more centralized. If you’re seeing a clear trend towards moving to cloud applications, then it may make sense to relocate your integration bus to be closer to them. And if you’re worried about latency, you could choose to run smaller edge instances of your integration bus that feed data to a centralized one. You have much more flexibility to introduce such an architecture when capacity is available anywhere, on-demand.
    • Keep existing tools and skillsets around that engine. One challenge that you may have when adopting an integration-as-a-service product is the switching costs. Not only are you rebuilding your integration scenarios in a new product, but you’re also training up staff on an entirely new toolset. If you keep your preferred engine but move it to the public cloud, there are no new training costs.
    • Low level troubleshooting available. If problems pop up – and of course they will – you have access to all the local logs, services, and configurations that you did before. Integration solutions are notoriously tricky to debug given the myriad locations where something could have gone amiss. The more data, the better.
    • Easier integration scenarios with partners. You may love using BizTalk’s Trading Partner Management capabilities, but don’t like wrangling with network and security engineers to expose the right endpoints from your on-premises environment. If you’re running the same technology in the public cloud, you’ll have a simpler time securely exposing select endpoints and ports to key partners.

    Risks

    • Long distance from integrated systems. As with Option #1, there’s a concern that shifting your integration engine to the public cloud means taking it away from where all the apps are. Does the enhanced elasticity make up for the fact that your business data now has to leave on-premises systems and travel to a bus sitting miles away?
    • Connectivity to on-premises systems. If your cloud virtual machines can’t reach your on-premises systems, you’re going to have some awkward integration scenarios. This is where Infrastructure-as-a-Service can be a little more flexible than cloud integration services because it’s fairly easy to set up a persistent, secure tunnel between cloud IaaS networks and on-premises networks. Not so easy to do with cloud messaging services.
    • There’s a larger attack surface if the engine has public IP connectivity. You may LIKE that your on-premises integration bus is hard to reach! Would-be attackers must breach multiple zones in order to attack this central nervous system of your company. By moving your integration engine to the cloud and opening up ports for inbound access, you’re creating a tempting target for those wishing to tap into this information-rich environment.
    • Not getting any of the operational benefits that as-a-service products possess. One of the major downsides of this option is that you haven’t actually simplified much; you’re just hosting your software elsewhere. Instead of eliminating infrastructure headaches and focusing on connecting your systems, you’re still standing up (virtual) infrastructure, configuring networks, installing software, managing software updates, building highly available setups, and so on. You may be more elastic, but you haven’t reduced your operational burden.
    • Fewer built-in connectors to cloudy endpoints. If you’re using an integration service that comes with pre-built endpoint adapters, you may find that traditional software providers aren’t keeping up with “cloud born” providers. SnapLogic will always have more cloud connectivity than BizTalk Server, for example. You may not care about this if you’re dealing with messaging engines that require you to write producer/consumer code. But for those that like having pre-built connectors to systems (e.g. IFTTT), you may be disappointed with your existing software provider.
    • Availability and uptime, especially if the integration engine isn’t cloud-native. If you move your integration engine to cloud IaaS, it’s completely on you to ensure that you’ve got a highly available setup. Running ZeroMQ on a single cloud virtual machine isn’t going to magically provide a resilient back end. If you’re taking a traditional ESB product and running it in cloud VMs, you still likely can’t scale out as well as cloud-friendly distributed engines like Kafka or NATS.

     

    Option #3 – Run your integration engine on-premises

    Running an integration engine in the cloud may not be for you. Even if your applications are slowly (quickly?) moving to the cloud, you might want to keep your integration bus put.

    Benefits

    • Run the engine of your choice. No one can tell you what to do in your own house! Pick the ESB, messaging engine, or ETL tool that works for you.
    • Control the change and maintenance lifecycle. This applies to option #2 to some extent, but when you control the software to the metal, you can schedule maintenance at optimal times and upgrade the software on your own timetable. If you’ve got a sensitive Big Data pipeline and want to reboot Spark ONLY when things are quiet, then you can do that.
    • Close to all on-premises systems. Plenty of workloads are moving to public cloud, but it’s sure as heck not all of them. Or at least right now. You may be seeing commodity services like CRM or HR quickly going to cloud services, but lots of mission critical apps still sit within your data centers. Depending on what your data sources are, you may have a few years before you’re motivated to give your integration engine a new address.
    • You can still reach out to Internet endpoints, while keeping inbound ports closed. If you’re running something like BizTalk Server, you can send data to cloud endpoints, and even receive data in (through the Service Bus) without exposing the service to the Internet. And if you’re using messaging engines where you write the endpoints, it may not really matter if the engine is on-site.
    • Can get some elasticity through private clouds. Don’t forget about private clouds! While some may think private clouds are dumb (because they don’t achieve the operational benefits or elasticity of a public cloud), the reality is that many companies have doubled down on them. If you take your preferred integration engine and slide it over to your private cloud, you may get some of the elasticity and self-service benefits that public cloud customers get.

    Risks

    • Difficult to keep up to date with latest versions. As the pace of innovation and disruption picks up, you may find it hard to keep your backbone infrastructure up to date. By continuing to own the lifecycle of your integration software, you run the risk of falling behind. That may not matter if you like the version of the software that you are on – or if you have gotten great at building out new instances of your engines and swapping consumers over to them – but it’s still something that can cause problems.
    • Subject to capacity limitations and slow scale up/out. Private clouds rarely have the same amount of hardware capacity that public clouds do. So even if you love dropping RabbitMQ into your private cloud, there may not be the storage or compute available when you need to quickly expand.
    • Few native connectors to cloudy endpoints. Sticking with traditional software may mean that you stay stuck on a legacy foundation instead of adopting a technology that’s more suited to connecting cloud endpoints or high-throughput producers.

     

    There’s no right or wrong answer here. Each company will have different reasons to choose an option above (or one that I didn’t even come up with!). If you’re interested in learning more about the latest advances in the messaging space, join me at the Integrate 2016 event (pre-registration here) in London on May 12-13. I’ll be doing a presentation on what’s new in the open source messaging space, and how increasingly popular integration patterns have changed our expectations of what an integration engine should be able to do.