Category: Microsoft Azure

  • Using Concourse to continuously deliver a Service Bus-powered Java app to Pivotal Cloud Foundry on Azure

    Guess what? Deep down, cloud providers know you’re not moving your whole tech portfolio to their public cloud any time soon. Oh, your transition is probably underway, but you’ve got a whole stash of apps, data stores, and services that may not move for a while. That’s cool. There are more and more patterns and services available to squeeze value out of existing apps by extending them with more modern, scalable, cloudy tech. For instance, how might you take an existing payment transfer system that did B2B transactions and open it up to consumers without requiring your team to do a complete rewrite? One option might be to add a load-leveling queue in front of it, and take in requests via a scalable, cloud-based front-end app. In this post, I’ll show you how to implement that pattern by writing a Spring Boot app that uses Azure Service Bus Queues. Then, I’ll build a Concourse deployment pipeline to ship the app to Pivotal Cloud Foundry running atop Microsoft Azure.

    2016-11-28-azure-boot-01

    Ok, but why use a platform on top of Azure?

    That’s a fair question. Why not just use native Azure (or AWS, or Google Cloud Platform) services instead of putting a platform overlay like Pivotal Cloud Foundry atop it? Two reasons: app-centric workflow for developers, and “day 2” operations at scale.

    Most every cloud platform started off by automating infrastructure. That’s their view of the world, and it still seeps into most of their cloud app services. There’s no fundamental problem with that, except that many developers (“full stack” or otherwise) aren’t infrastructure pros. They want to build and ship great apps for customers. Everything else is a distraction. A platform such as Pivotal Cloud Foundry is entirely application-focused. Instead of the developer finding an app host, packaging the app, deploying the app, setting up a load balancer, configuring DNS, hooking up log collection, and configuring monitoring, the Cloud Foundry dev just cranks out an app and does a single action to get everything correctly configured in the cloud. And it’s an identical experience whether Pivotal Cloud Foundry is deployed to Azure, AWS, OpenStack, or whatever. The smartest companies realized that their developers should be exceptional at writing customer-facing software, not configuring firewall rules and container orchestration.

    Secondly, it’s about “day 2” operations. You know, all the stuff that happens to actually maintain apps in production. I have no doubt that any of you can build an app and quickly get it to cloud platforms like Azure Web Sites or Heroku with zero trouble. But what about when there are a dozen apps, or thousands? How about when it’s not just you, but a hundred of your fellow devs? Most existing app-centric platforms just aren’t set up to be org-wide, and you end up with costly inconsistencies between teams. With something like Pivotal Cloud Foundry, you have a resilient, distributed system that supports every major programming language, and provides a set of consistent patterns for app deployment, logging, scaling, monitoring, and more. Some of the biggest companies in the world deploy thousands of apps to their respective environments today, and we just proved that the platform can handle 250,000 containers with no problem. It’s about operations at scale.

    With that out of the way, let’s see what I built.

    Step 1 – Prerequisites

    Before building my app, I had to set up a few things.

    • Azure account. This is kind of important for a demo of things running on Azure. Microsoft provides a free trial, so take it for a spin if you haven’t already. I’ve had my account for quite a while, so all my things for this demo hang out there.
    • GitHub account. The Concourse continuous integration software knows how to talk to a few things, and git is one of them. So, I stored my app code in GitHub and had Concourse monitoring it for changes.
    • Amazon account. I know, I know, an Azure demo shouldn’t use AWS. But, Amazon S3 is a ubiquitous object store, and Concourse made it easy to drop my binaries there after running my continuous integration process.
    • Pivotal Cloud Foundry (PCF). You can find this in the Azure marketplace, and technically, this demo works with PCF running anywhere. I’ve got a full PCF on Azure environment available, and used that here.
    • Azure Service Broker. One fundamental concept in Cloud Foundry is a “service broker.” Service brokers advertise a catalog of services to app developers, and provide a consistent way to provision and de-provision the service. They also “bind” services to an app, which puts things like service credentials into that app’s environment variables for easy access. Microsoft built a service broker for Azure, and it works for DocumentDB, Azure Storage, Redis Cache, SQL Database, and the Service Bus. I installed this into my PCF-on-Azure environment, but you can technically run it on any PCF installation.

    Step 2 – Build Spring Boot App

    In my fictitious example, I wanted a Java front-end app that mobile clients interact with. That microservice drops messages into an Azure Service Bus Queue so that the existing on-premises app can pull messages at its convenience, and thus avoid getting swamped by all this new internet traffic.

    Why Java? Java continues to be very popular in enterprises, and Spring Boot along with Spring Cloud (both maintained by Pivotal) have completely modernized the Java experience. Microsoft believes that PCF helps companies get a first-class Java experience on Azure.

    I used Spring Tool Suite to build a new Spring Boot MVC app with “web” and “thymeleaf” dependencies. Note that you can find all my code in GitHub if you’d like to reproduce this.

    To start with, I created a model class for the web app. This “web payment” class represents the data I collected from the user and passed on to the Service Bus Queue.

    package seroter.demo;
    
    public class WebPayment {
    	private String fromAccount;
    	private String toAccount;
    	private long transferAmount;
    
    	public String getFromAccount() {
    		return fromAccount;
    	}
    
    	public void setFromAccount(String fromAccount) {
    		this.fromAccount = fromAccount;
    	}
    
    	public String getToAccount() {
    		return toAccount;
    	}
    
    	public void setToAccount(String toAccount) {
    		this.toAccount = toAccount;
    	}
    
    	public long getTransferAmount() {
    		return transferAmount;
    	}
    
    	public void setTransferAmount(long transferAmount) {
    		this.transferAmount = transferAmount;
    	}
    }
    

    Next up, I built a bean that my web controller used to talk to the Azure Service Bus. Microsoft has an official Java SDK in the Maven repository, so I added this to my project.

    2016-11-28-azure-boot-03

    Within this object, I referred to the VCAP_SERVICES environment variable that I would soon get by binding my app to the Azure service. I used that environment variable to yank out the credentials for the Service Bus namespace, and then created the queue if it didn’t exist already.

    @Configuration
    public class SbConfig {
    
     @Bean
     ServiceBusContract serviceBusContract() {
    
       //grab env variable that comes from binding CF app to the Azure service
       String vcap = System.getenv("VCAP_SERVICES");
    
       //parse the JSON in the environment variable
       JsonParser jsonParser = JsonParserFactory.getJsonParser();
       Map<String, Object> jsonMap = jsonParser.parseMap(vcap);
    
       //create map of values for service bus creds
       Map<String,Object> creds = (Map<String,Object>)((List<Map<String, Object>>)jsonMap.get("seroter-azureservicebus")).get(0).get("credentials");
    
       //create service bus config object
       com.microsoft.windowsazure.Configuration config =
    	ServiceBusConfiguration.configureWithSASAuthentication(
    		creds.get("namespace_name").toString(),
    		creds.get("shared_access_key_name").toString(),
    		creds.get("shared_access_key_value").toString(),
    		".servicebus.windows.net");
    
       //create object used for interacting with service bus
       ServiceBusContract svc = ServiceBusService.create(config);
       System.out.println("created service bus contract ...");
    
       //check if queue exists
       try {
    	ListQueuesResult r = svc.listQueues();
    	List<QueueInfo> qi = r.getItems();
    	boolean hasQueue = false;
    
    	for (QueueInfo queueInfo : qi) {
              System.out.println("queue is " + queueInfo.getPath());
    
    	  //queue exist already?
    	  if(queueInfo.getPath().equals("demoqueue"))  {
    		System.out.println("Queue already exists");
    		hasQueue = true;
    		break;
    	   }
    	 }
    
    	if(!hasQueue) {
    	//create queue because we didn't find it
    	  try {
    	    QueueInfo q = new QueueInfo("demoqueue");
                CreateQueueResult result = svc.createQueue(q);
    	    System.out.println("queue created");
    	  }
    	  catch(ServiceException createException) {
    	    System.out.println("Error: " + createException.getMessage());
    	  }
            }
        }
        catch (ServiceException findException) {
           System.out.println("Error: " + findException.getMessage());
         }
        return svc;
       }
    }
    

    Cool. Now I could connect to the Service Bus. All that was left was my actual web controller that returned views and sent messages to the Service Bus. One of my operations returned the data collection view, and the other handled form submissions and sent messages to the queue via the @Autowired ServiceBusContract object.

    @SpringBootApplication
    @Controller
    public class SpringbootAzureConcourseApplication {
    
       public static void main(String[] args) {
         SpringApplication.run(SpringbootAzureConcourseApplication.class, args);
       }
    
       //pull in autowired bean with service bus connection
       @Autowired
       ServiceBusContract serviceBusContract;
    
       @GetMapping("/")
       public String showPaymentForm(Model m) {
    
          //add webpayment object to view
          m.addAttribute("webpayment", new WebPayment());
    
          //return view name
          return "webpayment";
       }
    
       @PostMapping("/")
       public String paymentSubmit(@ModelAttribute WebPayment webpayment) {
    
          try {
             //convert webpayment object to JSON to send to queue
    	 ObjectMapper om = new ObjectMapper();
    	 String jsonPayload = om.writeValueAsString(webpayment);
    
    	 //create brokered message wrapper used by service bus
    	 BrokeredMessage m = new BrokeredMessage(jsonPayload);
    	 //send to queue
    	 serviceBusContract.sendMessage("demoqueue", m);
    	 System.out.println("message sent");
    
          }
          catch (ServiceException e) {
    	 System.out.println("error sending to queue - " + e.getMessage());
          }
          catch (JsonProcessingException e) {
    	 System.out.println("error converting payload - " + e.getMessage());
          }
    
          return "paymentconfirm";
       }
    }
    

    With that, my microservice was done. Spring Boot makes it silly easy to crank out apps, and the Azure SDK was pretty straightforward to use.

    Step 3 – Deploy and Test App

    Developers use the “cf” command line interface to interact with Cloud Foundry environments. Running a “cf marketplace” command shows all the services advertised by registered service brokers. Since I added the Azure Service Broker to my environment, I could provision an instance of the Service Bus service in my Cloud Foundry org. To tell the Azure Service Broker what to actually create, I built a simple JSON document that outlined the Azure resource group, region, and service.

    {
      "resource_group_name": "pivotaldemorg",
      "namespace_name": "seroter-boot",
      "location": "westus",
      "type": "Messaging",
      "messaging_tier": "Standard"
    }
    

    By using the Azure Service Broker, I didn’t have to go into the Azure Portal for any reason. I could automate the entire lifecycle of a native Azure service. The command below created a new Service Bus namespace, and made the credentials available to any app that binds to it.

    cf create-service seroter-azureservicebus default seroterservicebus -c sb.json
    

    After running this, my PCF environment had a service instance (seroterservicebus) ready to be bound to an app. I also confirmed that the Azure Portal showed a new namespace, and no queues (yet).

    2016-11-28-azure-boot-06

    Awesome. Next, I added a “manifest” that described my Cloud Foundry app. This manifest specified the app name, how many instances (containers) to spin up, where to get the binary (jar) to deploy, and which service instance (seroterservicebus) to bind to.

    ---
    applications:
    - name: seroter-boot-azure
      memory: 256M
      instances: 2
      path: target/springboot-azure-concourse-0.0.1-SNAPSHOT.jar
      buildpack: https://github.com/cloudfoundry/java-buildpack.git
      services:
        - seroterservicebus
    

    When I did a “cf push” to my PCF-on-Azure environment, the platform took care of all the app packaging, container creation, firewall updates, DNS changes, log setup, and more. After a few seconds, I had a highly available front-end app bound to the Service Bus. Below, you can see the app started with two instances, and the service bound to my new app.

    2016-11-28-azure-boot-07

    All that was left was to test it. I fired up the app’s default view, and filled in a few values to initiate a money transfer.

    2016-11-28-azure-boot-08

    After submitting, I saw that there was a new message in my queue. I built another Spring Boot app (to simulate an extension of my legacy “payments” system) that pulled from the queue. This app ran on my desktop and logged the message from the Azure Service Bus.

    2016-11-28-azure-boot-09
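
    The receiver app itself isn’t worth a full walkthrough; a minimal sketch of its polling loop looks something like this (same com.microsoft.windowsazure Service Bus SDK as the sender above, imports omitted like the other snippets, and the namespace/key values are placeholders):

    public class QueueReceiver {

       public static void main(String[] args) throws Exception {

          //same SAS-based setup as the web app, just with placeholder credentials
          com.microsoft.windowsazure.Configuration config =
             ServiceBusConfiguration.configureWithSASAuthentication(
                "<namespace_name>",
                "<shared_access_key_name>",
                "<shared_access_key_value>",
                ".servicebus.windows.net");
          ServiceBusContract svc = ServiceBusService.create(config);

          //remove each message from the queue as soon as it's read
          ReceiveMessageOptions opts = ReceiveMessageOptions.DEFAULT;
          opts.setReceiveMode(ReceiveMode.RECEIVE_AND_DELETE);

          while (true) {
             BrokeredMessage message = svc.receiveQueueMessage("demoqueue", opts).getValue();

             if (message == null || message.getMessageId() == null) {
                //nothing waiting right now; pause, then poll again
                Thread.sleep(5000);
                continue;
             }

             //the body is the JSON payload created by the web app
             String json = new java.util.Scanner(message.getBody()).useDelimiter("\\A").next();
             System.out.println("received payment request: " + json);
          }
       }
    }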

    That’s great. I added a mature, highly available queue in between my cloud-native Java web app and my existing line-of-business system. With this pattern, I could accept all kinds of new traffic without overloading the backend system.

    Step 4 – Build Concourse Pipeline

    We’re not done yet! I promised continuous delivery, and I deliver on my promises, dammit.

    To build my deployment process, I used Concourse, a pipeline-oriented continuous integration and delivery tool that’s easy to use and amazingly portable. Instead of wizard-based tools that use fixed environments, Concourse uses pipelines defined in configuration files and executed in ephemeral containers. No conflicts with previous builds, no snowflake servers that are hard to recreate. And, it has a great UI that makes it obvious when there are build issues.

    I downloaded a Vagrant virtual machine image with Concourse pre-configured. Then I downloaded the lightweight command line interface (called Fly) for interacting with pipelines.

    My “build and deploy” process consisted of four files: bootpipeline.yml, which contained the core pipeline; build.yml, which set up the Java build process; build.sh, which actually performed the build; and secure.yml, which held my credentials (and wasn’t checked into GitHub).

    The build.sh file clones my GitHub repo (defined as a resource in the main pipeline) and does a Maven install.

    #!/usr/bin/env bash
    
    set -e -x
    
    git clone resource-seroter-repo resource-app
    
    cd resource-app
    
    mvn clean
    
    mvn install
    

    The build.yml file shows that I’m using the Maven Docker image to build my code, and points to the build.sh file that actually builds the app.

    ---
    platform: linux
    
    image_resource:
      type: docker-image
      source:
        repository: maven
        tag: latest
    
    inputs:
      - name: resource-seroter-repo
    
    outputs:
      - name: resource-app
    
    run:
      path: resource-seroter-repo/ci/build.sh
    

    Finally, let’s look at my build pipeline. Here, I defined a handful of “resources” that my pipeline interacts with. I’ve got my GitHub repo, an Amazon S3 bucket to store the JAR file, and my PCF-on-Azure environment. Then, I have two jobs: one that builds my code and puts the result into S3, and another that takes the JAR from S3 (and manifest from GitHub) and pushes to PCF on Azure.

    ---
    resources:
    # resource for my GitHub repo
    - name: resource-seroter-repo
      type: git
      source:
        uri: https://github.com/rseroter/springboot-azure-concourse.git
        branch: master
    #resource for my S3 bucket to store the binary
    - name: resource-s3
      type: s3
      source:
        bucket: spring-demo
        region_name: us-west-2
        regexp: springboot-azure-concourse-(.*).jar
        access_key_id: {{s3-key-id}}
        secret_access_key: {{s3-access-key}}
    # resource for my Cloud Foundry target
    - name: resource-azure
      type: cf
      source:
        api: {{cf-api}}
        username: {{cf-username}}
        password: {{cf-password}}
        organization: {{cf-org}}
        space: {{cf-space}}
    
    jobs:
    - name: build-binary
      plan:
        - get: resource-seroter-repo
          trigger: true
        - task: build-task
          privileged: true
          file: resource-seroter-repo/ci/build.yml
        - put: resource-s3
          params:
            file: resource-app/target/springboot-azure-concourse-0.0.1-SNAPSHOT.jar
    
    - name: deploy-to-prod
      plan:
        - get: resource-s3
          trigger: true
          passed: [build-binary]
        - get: resource-seroter-repo
        - put: resource-azure
          params:
            manifest: resource-seroter-repo/manifest-ci.yml
    

    I was now ready to deploy my pipeline and see the magic.

    After spinning up the Concourse Vagrant box, I hit the default URL and saw that I didn’t have any pipelines. NOT SURPRISING.

    2016-11-28-azure-boot-10

    From my Terminal, I used Fly CLI commands to deploy a pipeline. Note that I referred again to the “secure.yml” file containing credentials that get injected into the pipeline definition at deploy time.

    fly -t lite set-pipeline --pipeline azure-pipeline --config bootpipeline.yml --load-vars-from secure.yml
    

    In a second or two, a new (paused) pipeline popped up in Concourse. As you can see below, this tool is VERY visual. It’s easy to see how Concourse interpreted my pipeline definition and connected resources to jobs.

    2016-11-28-azure-boot-11

    I then un-paused the pipeline with this command:

    fly -t lite unpause-pipeline --pipeline azure-pipeline
    

    Immediately, the pipeline started up, retrieved my code from GitHub, built the app within a Docker container, dropped the result into S3, and deployed to PCF on Azure.

    2016-11-28-azure-boot-12

    After Concourse finished running the pipeline, I checked the PCF Application Manager UI and saw my new app up and running. Think about what just happened: I didn’t have to muck with any infrastructure or open any tickets to get an app from dev to production. Wonderful.

    2016-11-28-azure-boot-14

    The way I built this pipeline, I didn’t version the JAR when I built my app. In reality, you’d want to use the semantic versioning resource to bump the version on each build. Because of the way I designed this, the second job (“deploy-to-prod”) won’t fire automatically after the first build, since there technically isn’t a new artifact in the S3 bucket. A cool side effect of this is that I could constantly do continuous integration, and then choose to manually deploy (clicking the “+” button below) when the company was ready for the new version to go to production. Continuous delivery, not deployment.

    2016-11-28-azure-boot-13

    Wrap Up

    Whew. That was a big demo. But in the scheme of things, it was pretty straightforward. I used some best-of-breed services from Azure within my Java app, and then pushed that app to Pivotal Cloud Foundry entirely through automation. Now, every time I check in a code change to GitHub, Concourse will automatically build the app. When I choose to, I take the latest build and tell Concourse to send it to production.

    magic

    A platform like PCF helps companies solve their #1 problem with becoming software-driven: improving their deployment pipeline. Try to keep your focus on apps, not infrastructure, and make sure that whatever platform you use, you focus on sustainable operations at scale!

     

  • Trying out the “standard” and “enterprise” templates in Azure Logic Apps

    Is the Microsoft integration team “back”? It might be premature to say that Microsoft has finally figured out its app integration story, but the signs are very positive. There’s been a fresh influx of talent like Jon Fancey, Tord Glad Nordahl, and Jim Harrer, some welcome forethought into the overall Microsoft integration story, better community engagement, and a noticeable uptick in the amount of software released by these teams.

    One area that’s been getting tons of focus is Azure Logic Apps. Logic Apps are a potential successor to classic on-premises application integration tools, but with a cloud-first bent. Users can visually model flows made up of built-in, or custom, activities. The initial integrations supported by Logic Apps were focused on cloud endpoints, but with the recent beta release of the Enterprise Integration Pack, Microsoft is making its move to more traditional use cases. I haven’t messed around with Logic Apps for a few months, and lots of things have changed, so I tested out both the standard and enterprise templates.

    One nice thing about services like Logic Apps is that anyone can get started with just a browser. If you’re building a standard workflow (read: doesn’t require extra services or the “enterprise integration” bits), then you don’t have to install a single thing. To start with, I went to the Azure Portal (the new one, not the classic one), and created a new “Logic App.”

    2016-09-09-logic02

    I was then presented with a choice for how to populate the app itself. There’s the default “blank” template, or I could start off with a few pre-canned options. Some of these are a bit contrived (“save my tweets to a SharePoint list” makes me sad), but they give you a good idea of what’s possible with the many built-in connectors.

    2016-09-09-logic01

    I chose the HTTP Request-Response template since my goal was to build a simple synchronous web service. The portal showed me what this template does, and dropped me into the design canvas with the HTTP Request and HTTP Response activities in place.

    2016-09-09-logic03

    I have a birthday coming and am feeling old, so I decided to build a simple service that would tell me if I was old or not. In order to easily use the fields of an inbound JSON message, I had to define a simple JSON schema inside the HTTP Request shape. This schema defines a string for the “name” and an integer for the “age.”

    2016-09-09-logic04

    Before sending a response, I wanted to actually do something! So, I added an if-then condition to the canvas. There are other control-flow shapes available, such as for-each and do-until loops. I put this if-then shape in between the Request and Response elements, and was able to choose the “age” value for my conditional check.

    2016-09-09-logic06

    Here, I checked to see if “age” is greater than 40. Notice that I also had access to the “name” field, as well as the whole request body or HTTP headers. Next, I wanted to send a different HTTP response for over-40 and under-40. The brand new “compose” activity is the answer. With this, I could create a new message to send back in the HTTP response.

    2016-09-09-logic07

    I simply typed a new JSON message into the Compose activity, using the variable for the “name”, and adding some text to categorize the requestor’s age.

    2016-09-09-logic08

    I then did the same thing for the “no” path of the if-then and had a complete flow!

    2016.09.09.logic09.png

    Quick and easy! The topmost HTTP Request activity has the URL for this particular Logic App, and since I didn’t apply any security policies, it was super simple to invoke. From within my favorite API testing tool, Postman, I submitted a JSON message to the endpoint. Sure enough, I got back a response that corresponded to the provided age.

    2016-09-09-logic10
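
    If you’d rather hit it from code than from Postman, a quick Spring RestTemplate call does the same thing. This is only a sketch: the URL below is a placeholder for the callback URL shown on the HTTP Request card, and the name/age values are just test data matching the request schema.

    public class LogicAppTester {

       public static void main(String[] args) {

          //placeholder for the Logic App's callback URL from the HTTP Request trigger
          String logicAppUrl = "https://prod-00.westus.logic.azure.com/workflows/...";

          //JSON payload matching the request schema: a "name" string and an "age" integer
          String payload = "{\"name\": \"Richard\", \"age\": 41}";

          HttpHeaders headers = new HttpHeaders();
          headers.setContentType(MediaType.APPLICATION_JSON);

          //RestTemplate (from spring-web) posts the payload and returns the raw response body
          RestTemplate rest = new RestTemplate();
          String reply = rest.postForObject(logicAppUrl, new HttpEntity<>(payload, headers), String.class);
          System.out.println(reply);
       }
    }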

    Great. But what about doing all the Enterprisey stuff? I built another new Logic App, and this time, wanted to send a comma-separated payload to an HTTP endpoint and get back XML. There’s a Logic Apps template for that, and when I selected it, I was told I needed an “integration account.”

    2016-09-09-logic11

    So I got out of Logic Apps, and went off to create an Integration Account in the Portal. Integration Accounts are a preview service from Microsoft. These accounts hold all the integration artifacts used in enterprise integration scenarios: schemas, maps, certificates, partners, and trading agreements.

    2016-09-09-logic12

    How do I get these artifacts, you ask? This is where client-side development comes in. I downloaded the Enterprise Integration Tools (really just Visual Studio extensions that give you the BizTalk schema editor and mapper) and fired up Visual Studio. This added an “integration” project type to Visual Studio, and let me add XML schemas, flat file schemas, and maps to a project.

    2016-09-09-logic13

    I then set out to build some enterprise-class schemas defining a “person” (one flat file schema, one XML schema) and a map converting one format to another. I built the flat file schema using a sample comma-separated file and the provided Flat File Wizard. Hello, my old friend.

    2016-09-09-logic17

    The map is super simple. It just concatenates the inbound fields into a single outbound field in the XML schema. Note that the destination field has a “max occurs” of “*” to make sure that it adds one “name” element for each set of source elements. And yes, the mapper includes the Functoids for basic calculations, logical conditions, and string manipulation.

    2016-09-09-logic14

    The Azure Integration Account doesn’t take in DLLs, so I loaded in the raw XSD and map files. Note that you need to build the project to get the XSLT version of the map. The Azure portal doesn’t take the raw .btm map.

    2016-09-09-logic15

    Back in my Logic App, I found the Properties page for the app and made sure to set the “integration account” property so that it saw my schemas and maps.

    2016-09-09-logic16

    I then went back and spun up the VETER (validate, extract, transform, enrich, route) Logic Apps template. Because there seemed to be a lot of places where things could go wrong, I removed all the other shapes from the design canvas and just started with the flat file decoding. Let’s get that working first! Since I associated my “Integration Account” with this Logic App, it was easy to select my schema from the drop-down list. With that, I tested.

    2016-09-09-logic19

    Shoot. The first call failed. Fortunately, Logic Apps comes with a pretty sweet dashboard and tracing interface. I noticed that the flat file decoding failed, and it looked like it got angry with my schema defining a carriage-return-plus-line-feed delimiter for records, when all I sent it was a line feed (via my API testing tool). So, I went back to my schema, changed the record delimiter, updated my schema (and map) in the Integration Account, and tested again.

    2016-09-09-logic20

    Success! Notice that it turned my input flat file into an XML representation.

    Feeling irrationally confident, I went to the Logic Apps design surface, clicked the “templates” button at the top and re-selected the VETER template to get all the activities back that I needed. However, I forgot that the “mapping” activity requires that I have an Azure Functions container set up. Apparently the maps are executed inside Microsoft’s serverless framework, Azure Functions. Microsoft’s docs are pretty cryptic about what to do here, but if you follow the links in this KB (“create container”, “add function”), you get the default mapper template as an Azure Function.

    2016-09-09-logic21

    Ok, now I was set. My final Logic App configuration looked like this.

    2016-09-09-logic23

    The app takes in a flat file, validates the flat file using the flat file (really, XML) schema, uses a built-in check to see that it’s a decoded flat file, executes my map within an Azure Function, and finally returns the result. I then called the Logic App from Postman.

    2016-09-09-logic24

    BAM! It worked. That’s … awesome. While some of you may have fainted in horror at the idea of using flat files and XML in a shiny new Logic App, this does show that Microsoft is trying to cater to some of the existing constraints of their customers.

    Overall, I thought the Logic Apps experience was pretty darn good. The tooling has a few rough edges, but was fairly intuitive. The biggest gap is the documentation and number of public samples, but that’s to be expected with such new technology. I’d definitely recommend giving the Enterprise Integration Pack a try and see what sort of unholy flows you can come up with!

  • Enterprises fighting back, Spring Boot is the best, and other SpringOne Platform takeaways

    Last week I was in Las Vegas for SpringOne Platform. This conference had one of the greatest session lists I’ve ever seen, and brought together nearly 2,000 people interested in microservices, Java Spring, DevOps, agile, Cloud Foundry, and cloud-native development. With sponsors like Google, Microsoft, HortonWorks, Accenture, and AWS, and over 400 different companies represented by attendees, the conference had a unique blend of characters. I spent some time reflecting on the content and vibe of SpringOne Platform, and noticed that I kept coming back to the following themes.

    #1 – Enterprises are fighting back.

    Finally! Large, established companies are tired of operating slow-moving, decrepit I.T. departments where nothing interesting happens. At SpringOne Platform, I saw company after company talking about how they are creating change, and then showing the results. Watch the insightful keynote from Citi where they outline pain points, and how they’ve changed their team structure, culture, and technology.

    You don’t have to work at Uber, Etsy, Netflix or AWS to work on cutting-edge technology. Enterprises have woken up to the fact that outsourcing their strategic technology skills was a dumb decision. What are they doing to recover?

    1. Newfound focus on hiring and expanding technology talent. In just about every enterprise-led session I attended, the presentation closed with a “we’re hiring!” notice. Netflix has been ending their blog posts with this call-to-action for YEARS. Enterprises are starting to sponsor conferences and go where developers hang out. Additionally, because you can’t just hire hundreds of devs that know cloud-native patterns, I’m seeing enterprises make a greater investment in their existing people. That’s one reason Pluralsight continues to explode in popularity as enterprises purchase subscriptions for all their tech teams.
    2. Upgrading and investing in technology. Give the devs what they want! Enterprises have started to realize that classic enterprise technology doesn’t attract talented people to work on it. Gartner predicts that by the year 2020, 75% of the apps supporting digital business will be built, not bought. That means that your dev teams need the tools and tech that let them crank out customer-centric, resilient apps. And they need support for using modern approaches to delivering software. If you invest in technology, you’ll attract the talent to work with it.

     

    #2 – Spring Boot is the best application bootstrapping experience, period.

    For 17+ years I’ve coded in either .NET or Node.js (with a little experimentation in Go, Ruby, and Java). After joining Pivotal, I decided that I should learn Spring, since that’s our jam.

    I’ve never seen anything better than Spring Boot for getting developers rolling. Instead of spending hours (days?) setting up boilerplate code, and finding the right mix of dependencies for your project, Spring Boot takes care of all that. Give me 4 minutes, and I can build and deploy a git-backed Configuration Server. In a few moments I can flip on OAuth2 security or distributed tracing. And this isn’t hello-world quality stuff; this is the productization of Netflix OSS and other battle-tested technology that you can use with simple code annotations. That’s amazing, and you can use the Spring Initializr to get started today.

    2016.08.10.s1p01
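
    To show how little code that git-backed Configuration Server takes, here’s a minimal sketch; it assumes the spring-cloud-config-server dependency is on the classpath and a placeholder Git repo URI in application.properties (spring.cloud.config.server.git.uri=https://github.com/your-org/config-repo).

    @SpringBootApplication
    @EnableConfigServer  //from spring-cloud-config-server; this one annotation does the heavy lifting
    public class ConfigServerApplication {

       public static void main(String[] args) {
          SpringApplication.run(ConfigServerApplication.class, args);
       }
    }

    Run that, and client apps can pull versioned, git-backed configuration over HTTP.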

    Smart companies realize that devs shouldn’t be building infrastructure, app scaffolding or wrangling dependencies; they should be creating user experiences and business logic. Whereas Node.js has a billion packages and I spend plenty of time selecting ones that don’t have Guy Fieri images embedded, Spring Boot gives devs a curated, integrated set of packages. And it’s saving companies like Comcast millions of dollars.

    Presenter after presenter at SpringOne Platform was able to quickly demonstrate complex distributed systems concepts by using Spring Boot apps. Java innovation happens in Spring.

    #3 A wave of realism has swept over the industry.

    I’m probably being optimistic, but it seems like some of the hype is settling down, and we’re actually getting to work on transformation. The SpringOne Platform talks (both in sessions, and hallway/lunch conversations) weren’t about visions of the future, but actual in-progress efforts. Transformation is hard and there aren’t shortcuts. Simply containerizing won’t make a difference, for example.

    Talk after talk, conducted by analysts or customers, highlighted the value of assessing your existing app portfolio, and identifying where refactoring or replatforming can add value. Just lifting and shifting to a container orchestration platform doesn’t actually improve things. At best, you’ve optimized the infrastructure, while ignoring the real challenge: improving the delivery pipeline. Same goes for configuration management, and other technologies that don’t establish meaningful change. It takes a mix of cultural overhaul, management buy-in, and yes, technology. I didn’t see anyone at the conference promising silver bullets. But at the same time, there were some concrete next steps for teams looking to accelerate their efforts.

    #4 The cloud wars have officially moved above IaaS.

    IaaS is definitely not a commodity (although pricing has stabilized), but you’re seeing the three major clouds working hard to own the services layer above the raw infrastructure. Gartner’s just-released IaaS Magic Quadrant shows clear leadership by AWS, Microsoft, and Google, and not accidentally, all three sponsored SpringOne Platform. Google brought over 20 people to the conference, and still couldn’t handle the swarms of people at their booth trying out Spring Boot! An integrated platform on top of leading clouds gives the best of all worlds.

    Great infrastructure matters, but native services in the cloud are becoming the key differentiator for one over another. Want services to bridge on-premises and cloud apps? Azure is a strong choice. Need high-performing data storage services? AWS is fantastic. Looking at next generation machine learning and data processing? Google is bleeding edge. At SpringOne Platform, I heard established companies (including Home Depot, the Gap, and Merrill Corp) explain why they loved Pivotal Cloud Foundry, especially when it integrated with native services in their cloud of choice. The power of platforms, baby.

    #5 Data microservices are the next frontier.

    I love, love that we’re talking about the role of data in a microservices world. It’s one thing to design and deliver stateless web apps, and scale the heck out of them. We’ve got lots of patterns for that. But what about the data? Are there ways to deploy and manage data platforms with extreme automation? How about scaling real-time and batch data processing? There were tons of sessions about data at SpringOne Platform, and Pivotal’s Data team wrote up some awesome summaries throughout the week.

    It’s almost always about data, and I think it’s great that we had PACKED sessions full of people working through these emerging ideas.

    #6 Pivotal is making a difference.

    I’m very proud of what our customers are doing with the help of Pivotal people and technologies. While we tried to make sure we didn’t beat people over the head with “Pivotal is GREAT” stuff, it became clear that the “Pivotal Way” is working and transforming how the largest companies in the world build software.

    The Gap talked about going from weeks to mere minutes to deploy code changes. That has a material impact on how they interact with their customers. And for many, this isn’t about net new applications. Almost everyone who presented talked about how to approach existing investments and find new value. It’s fun to be on this journey to simplify the future.

    Want to help make a difference at Pivotal and drive the future of software? We’re always hiring.

  • Who is really supposed to use the (multi)cloud GUI?

    How do YOU prefer to interact with infrastructure clouds? A growing number of people seem to prefer APIs, SDKs, and CLIs over any graphical UI. It’s easy to understand why: few GUIs offer the ability to create the repeatable, automated processes needed to use compute at scale. I just wrote up an InfoQ story about a big update to the AWS EC2 Run Command feature (spoiler: you can now execute commands against servers located ANYWHERE), and it got me thinking about how we interact with resources. In this post, I’ll try and figure out who cares about GUIs, and show off an example of EC2 Run Command in action.

    If you’re still stuck dealing with servers and haven’t yet upgraded to an IaaS-agnostic cloud-native platform, then you’re looking for ways to create a consistent experience. Surveys keep showing that teams are flocking to GUI-light, automation-centric software for configuration management (e.g. Chef, Ansible), resource provisioning (e.g. Terraform, AWS CloudFormation, Azure Resource Manager), and software deployment. As companies do “hybrid computing” and mix and match servers from different providers, they really need to figure out a way to establish some consistent practices for building and managing  many servers. Is the answer to use the cloud provider’s native GUI or a GUI-centric “multi-cloud manager” tool? I don’t think so.

    Multi-cloud vendors are trying to put a useful layer of abstraction on top of non-commodity IaaS, but you end up with what AWS CEO Andy Jassy calls the “lowest common denominator.” Multi-cloud vendors struggle to keep up with the blistering release pace of public cloud vendors they support, and often neutralize the value of a given cloud by trying to create a common experience. No, the answer seems to be to use these GUIs for simple scenarios only, and rely primarily on APIs and automation that you can control.

    But SOMEONE is using these (multi)cloud GUIs! They must offer some value. So who is the real audience for the cloud provider portals, or multi-cloud products now offered by Cisco (Cliqr), IBM (Gravitant), and CenturyLink (ElasticBox)?

    • Business users. One clear area of value in cloud GUIs is for managers who want to dip in and see what’s been deployed, and finance personnel who are doing cost modeling and billing. The native portals offered by cloud providers are getting better at this, but it’s also been an area where multi-cloud brokers have invested heavily. I don’t want to ask the dev manager to write an app that pulls the AWS billing history. That seems … abusive. Use the GUI.
    • Infrequent tech users with simple tasks. Look, I only log into the AWS portal every month or so. It wouldn’t make a ton of sense for me to build out a whole provisioning and management pipeline to build a server every so often. Even dropping down to the CLI isn’t more productive in those cases (for me). Other people at your company may be frequent, power users and it makes sense for them to automate the heck out of their cloud. In my case, the GUI is (mostly) fine. Many of the cloud provider portals reflect this reality. Look at the Azure Portal. It is geared towards executing individual actions with a lot of visual flair. It is not a productivity interface, or something supportive of bulk activities. Same with most multi-cloud tools I’ve seen. Go build a server, perform an action or two. In those cases, rock on. Use the GUI.
    • Companies with only a few slow-changing servers. If you have 10-50 servers in the cloud, and you don’t turn them over very often, then it can make sense to use the native cloud GUI for a majority of your management. A multi-cloud broker would be overkill. Don’t prematurely optimize.

    I think AWS nailed its target use case with EC2 Run Command. When it first launched in October of 2015, it was for AWS Windows servers. Amazon now supports Windows and Linux, and servers inside or outside of AWS data centers. Run ad-hoc PowerShell or Linux scripts, install software, update the OS, you name it. Kick it off with the AWS Console, API, SDK, CLI or via PowerShell extensions. And because it’s agent-based and pull-driven, AWS doesn’t have to know a thing about the cloud the server is hosted in. It’s a straightforward, configurable, automation-centric, and free way to do basic cross-cloud management.

    How’s it work? First, I created an EC2 “activation” which is used to generate a code to register the “managed instances.” When creating it, I also set up a security role in Identity and Access Management (IAM) which allowed me to assign rights to people to issue commands.

    2016.07.12.ec203

    Out of the activation, I received a code and ID that’s used to register a new server. With the activation in place, I built a pair of Windows servers in Microsoft Azure and CenturyLink Cloud. I logged into each server, and installed the AWS Tools for Windows PowerShell. Then, I pasted a simple series of commands into the Windows PowerShell for AWS window:

    $dir = $env:TEMP + "\ssm"
    
    New-Item -ItemType directory -Path $dir
    
    cd $dir
    
    (New-Object System.Net.WebClient).DownloadFile("https://amazon-ssm-us-east-1.s3.amazonaws.com/latest/windows_amd64/AmazonSSMAgentSetup.exe", $dir + "\AmazonSSMAgentSetup.exe")
    
    Start-Process .\AmazonSSMAgentSetup.exe -ArgumentList @("/q", "/log", "install.log", "CODE=<my code>", "ID=<my id>", "REGION=us-east-1") -Wait
    
    Get-Content ($env:ProgramData + "\Amazon\SSM\InstanceData\registration")
    
    Get-Service -Name "AmazonSSMAgent"
    

    The commands simply download the agent software, install it as a Windows Service, and register the box with AWS. Immediately after installing the agent on servers in other clouds, I saw them listed in the Amazon Console. Sweet.

    2016.07.12.ec205

    Now the fun stuff. I can execute commands from existing Run Command documents (e.g. “install missing Windows updates”), run ad-hoc commands, find public documents written by others, or create my own documents.

    For instance, I could do a silly-simple “ipconfig” ad-hoc request against my two servers …

    2016.07.12.ec207

    … and I almost immediately received the resulting output. If I expected a ton of output from the command, I could log it all to S3 object storage.

    2016.07.12.ec208

    As I pick documents to execute, the parameters change. In this case, choosing the “install application” document means that I provide a binary source and some parameters:

    2016.07.12.ec209

    I’ve shown off the UI here (ironically, I guess), but the real value is that I could easily create documents or execute commands from the AWS CLI or something like the Node SDK. What a great way to do hybrid, ad-hoc management! It’s not a complete solution and doesn’t replace config management or multi-cloud provisioning tools, but it’s a pretty handy way to manage a fleet of distributed servers.
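
    For instance, here’s a sketch of that same ad-hoc “ipconfig” run issued from code with the AWS SDK for Java (rather than the Node SDK); the managed instance ID is a placeholder, and it assumes the aws-java-sdk-ssm dependency plus locally configured AWS credentials.

    import com.amazonaws.services.simplesystemsmanagement.AWSSimpleSystemsManagement;
    import com.amazonaws.services.simplesystemsmanagement.AWSSimpleSystemsManagementClientBuilder;
    import com.amazonaws.services.simplesystemsmanagement.model.SendCommandRequest;
    import com.amazonaws.services.simplesystemsmanagement.model.SendCommandResult;

    import java.util.Arrays;
    import java.util.Collections;

    public class RunIpconfig {

       public static void main(String[] args) {

          AWSSimpleSystemsManagement ssm = AWSSimpleSystemsManagementClientBuilder.defaultClient();

          SendCommandRequest request = new SendCommandRequest()
             .withInstanceIds("mi-0123456789abcdef0")       //placeholder managed instance ID
             .withDocumentName("AWS-RunPowerShellScript")    //same document the console uses
             .withParameters(Collections.singletonMap("commands", Arrays.asList("ipconfig")));

          SendCommandResult result = ssm.sendCommand(request);
          System.out.println("command id: " + result.getCommand().getCommandId());
       }
    }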

    There’s definitely a place for GUIs when working with infrastructure clouds, but they really aren’t meant for power users. If you’re forcing your day-to-day operations/service team to work through a GUI-centric tool, you’re telling them that you don’t value their time. Rather, make sure that any vendor-provided software your operations team gets their hands on has an API. If not, don’t use it.

    What do you think? Are there other scenarios where using the GUI makes the most sense?

  • Speaking at INTEGRATE 2016 Next Month

    Do you live in Europe and have affection for application integration? If so, then I hope to see you at INTEGRATE 2016. This annual conference has become a must-attend event for those interested in connecting apps together, and I’m thrilled that Saravana Kumar asked me to present this year. I’m actually more excited about attending the conference. The organizers have pulled together an enviable collection of speakers, and the attendees are some of the smartest integration people I’ve met.

    While this is a Microsoft-heavy event, I decided to bring an outside perspective to the conference this year. Instead of presenting something about Microsoft’s broad integration portfolio, I thought it’d be fun to share the latest happenings in the open source space. I’ll be talking about relevant integration patterns for modern apps and digging into a handful of popular distributed messaging platforms. This should give attendees an up-to-date perspective of the market, and the opportunity to compare open source platforms to Microsoft’s offerings.

    I hope to see you there!

  • Where to host your integration bus

    2016.03.08integrate01

    RightScale recently announced the results of their annual “State of the Cloud” survey. You can find the report here, and my InfoQ.com story here. A lot of people participated in the survey, and the results showed that a majority of companies are growing their public cloud usage, but continuing to invest heavily in on-premises “cloud” environments. When I was reading this report, I was thinking about the implications for a company’s application integration strategy. As workloads continue to move to cloudy hosts and companies start to get addicted to the benefits of cloud (from the survey: “faster access to infrastructure”, “greater scalability”, “geographic reach”, “higher performance”), does that change what they think about running integration services? What are the options for a company wondering where to host their application/data integration engine, and what benefits and risks are associated with each choice?

    The options below should apply whether you’re doing real-time or batch integration, high throughput messaging or complex orchestration, synchronous or asynchronous communication.

    Option #1 – Use an Integration-as-a-Service engine in the public cloud

    It may make sense to use public cloud integration services to connect your apps. Or, introduce these as edge intake services that still funnel data to another bus further downstream.

    Benefits

    • Easy to scale up or down. One of the biggest perks of a cloud-based service is that you don’t have to do significant capacity planning up front. For messaging services like Amazon SQS or the Azure Service Bus, there’s very little you have to consider. For an integration service like SnapLogic, there are limits, but you can size up and down as needed. The key is that you can respond to bursts (or troughs) in usage by cutting your costs. No more over-provisioning just in case you might need it.
    • Multiple patterns available. You won’t see a glut of traditional ESB-like cloud integration services. Instead, you’ll find many high-throughput messaging (e.g. Google Pub/Sub) or stream processing services (e.g. Azure Stream Analytics) that take advantage of the elasticity of the cloud. However, if you’re doing bulk data movement, there are multiple viable services available (e.g. Talend Integration Cloud), and if you’re doing stateful integration, there are services for that as well (e.g. Azure Logic Apps).
    • No upgrade projects. From my experience, IT never likes funding projects that upgrade foundational infrastructure. That’s why you have servers still running Windows Server 2003, or Oracle databases that are 14 versions behind. You always tell yourself that “NEXT year we’ll get that done!” One of the seductive aspects of cloud-based services is that you don’t deal with that any longer. There are no upgrades; new capabilities just show up. And for all these cloud integration services, that means always getting the latest and greatest as soon as it’s available.
    • Regular access to new innovations. Is there anything in tech more depressing than seeing all these flashy new features in a product that you use, and knowing that you are YEARS away from deploying it? Blech. The industry is changing so fast, that waiting 4 years for a refresh cycle is an eternity. If you’re using a cloud integration service, then you’re able to get new endpoint adapters, query semantics, storage enhancements and the like as soon as possible.
    • Connectivity to cloud hosted systems, partners. One of the key reasons you’d choose a cloud-based integration service is so that you’re closer to your cloudy workloads. Running your web log ingest process, partner supply chain, or master-data management jobs all right next to your cloud-hosted databases and web apps gives you better performance and simpler connectivity. Instead of navigating the 12 layers of firewall hell to expose your on-premises integration service to Internet endpoints, you’re right next door.
    • Distributed intake and consumption. Event and data sources are all over the place. Instead of trying to ship all that information to a centralized bus somewhere, it can make sense to do some intake at the edge. Cloud-based services let you spin up multiple endpoints in various geographies with ease, which may give you much more flexibility when taking in Internet-of-Things beacon data, orders from partners, or returning data from time-sensitive request/reply calls.
    • Lower operational cost. You MAY end up paying less, but of course you could also end up paying more. Depends on your throughput, storage, etc. But ideally, if you’re using a cloud integration service, you’re not paying the same type of software licensing and hardware costs as you would for an on-premises system.

    Risks

    • High latency with on-premises systems. Unless your company was formed within the last 18 months, I’d be surprised if you didn’t have SOME key systems sitting in a local facility. While latency may not matter for some asynchronous workloads, if you’re taking in telemetry data from devices and making real-time adjustments to applications, every millisecond counts. Depending on where your home office is, there could be a bit of distance between your cloud-based integration engine and the key systems it talks to.
    • Limited connectivity to on-premises systems (bi-directional). It’s usually not too challenging to get on-premises systems to reach out to the Internet (and push data to an endpoint), but it’s another matter to allow data to come *into* your on-premises systems from the Internet. Some integration services have solved this by putting agents on the local environment to facilitate secure communication, but realistically, it’ll be on you to extract data from cloud-based engines versus expecting them to push data into your data centers.
    • Experience data leakage if data security isn’t properly factored in. If the data never leaves your private network, it can be easy to be lazy about security. Encrypt in transit? Ok. Encrypt the data as well? Nah. If that casual approach to security isn’t tightened up when you start passing data through cloud integration services, you could find yourself in trouble. While your data may be protected from others accidentally seeing it, you may have made it easy for others within your own organization to extract or tap into data they didn’t have access to before.
    • Services are not as mature as software-based products, and focused mostly on messaging. It’s true that cloud-based solutions haven’t been around as long as the Tibcos, BizTalk Servers, and such. And, many cloud-based solutions focus less on traditional integration techniques (FTP! CSV files!) and more on Internet-scale data distribution.
    • Opaque operational interfaces make troubleshooting more difficult. We’re talking about as-a-Service products here, so by definition, you’re not running this yourself. That means you can’t check out the server logs, add tracing logic, or view the memory consumption of a particular service. Instead, you only have the interfaces exposed by the vendor. If troubleshooting data is limited, you have no other recourse.
    • Limited portability of the configuration between providers. Depending on the service you choose, there’s a level of lock-in that you have to accept. Your integration logic from one service can’t be imported into another. Frankly, the same goes for on-premises integration engines. Either way, your application/data integration platform is probably a key lock-in point regardless of where you host it.
    • Unpredictable availability and uptime. A key value proposition of cloud is high availability, but you have to take the provider’s word for it that they’ve architected as such. If your cloud integration bus is offline, so are you. There’s no one to yell at to get it back up and running. Likewise, any maintenance to the platforms happens at a time that works for the vendor, not for you. Ideally you never see downtime, but you absolutely have less control over it.
    • Unpredictable pricing on cost dimensions you may not have tracked before (throughput, storage). I’d doubt that most IT shops know their true cost of operations, but nonetheless, it’s possible to get sticker shock when you start paying based on consumption. Once you’ve sunk cost into an on-premises service, you may not care about message throughput or how much data you’re storing. You will care about things like that when using a pay-as-you-go cloud service.

     

    Option #2 – Run your integration engine in a public cloud environment

    If adopting an entirely managed public service isn’t for you, then you still may want the elastic foundation of cloud while running your preferred integration engine.

    Benefits

    • Run the engine of your choice. Like using Mule, BizTalk Server, or Apache Kafka and don’t want to give it up? Take that software and run it on public cloud Infrastructure-as-a-Service. No need to give up your preferred engine just because you want a more flexible host.
    • Configuration is portable from an on-premises solution (if migrating versus setting this up brand new). If you’re “upgrading” from fixed virtual machines or bare metal boxes to an elastic cloud, the software stays the same. In many cases, you don’t have to rewrite much (besides some endpoint addresses) in order to slide into an environment where you can resize the infrastructure up and down much more easily.
    • Scale up and down compute and storage. Probably the number one reason to move. Stop worrying about boxes that are too small (or large!) and running out of disk space. By moving from fixed on-premises environments to self-service cloud infrastructure, you can set an initial sizing and continue to right-size on a regular basis. About to beat the hell out of your RabbitMQ environment for a few days? Max out the capacity so that you can handle the load. Elasticity is possibly the most important reason to adopt cloud.
    • Stay close to cloud hosted systems. Your systems are probably becoming more distributed, not more centralized. If you’re seeing a clear trend towards moving to cloud applications, then it may make sense to relocate your integration bus to be closer to them. And if you’re worried about latency, you could choose to run smaller edge instances of your integration bus that feed data to a centralized one. You have much more flexibility to introduce such an architecture when capacity is available anywhere, on-demand.
    • Keep existing tools and skillsets around that engine. One challenge that you may have when adopting an integration-as-a-service product is the switching costs. Not only are you rebuilding your integration scenarios in a new product, but you’re also training up staff on an entirely new toolset. If you keep your preferred engine but move it to the public cloud, there are no new training costs.
    • Low level troubleshooting available. If problems pop up – and of course they will – you have access to all the local logs, services, and configurations that you did before. Integration solutions are notoriously tricky to debug given the myriad locations where something could have gone amiss. The more data, the better.
    • Experience easier integration scenarios with partners. You may love using BizTalk’s Trading Partner Management capabilities, but don’t like wrangling with network and security engineers to expose the right endpoints from your on-premises environment. If you’re running the same technology in the public cloud, you’ll have a simpler time securely exposing select endpoints and ports to key partners.

    Risks

    • Long distance from integrated systems. Like the risk in the section above, there’s concern that shifting your integration engine to the public cloud will mean taking it away from where all the apps are. Does the enhanced elasticity make up for the fact that your business data now has to leave on-premises systems and travel to a bus sitting miles away?
    • Connectivity to on-premises systems. If your cloud virtual machines can’t reach your on-premises systems, you’re going to have some awkward integration scenarios. This is where Infrastructure-as-a-Service can be a little more flexible than cloud integration services because it’s fairly easy to set up a persistent, secure tunnel between cloud IaaS networks and on-premises networks. Not so easy to do with cloud messaging services.
    • There’s a larger attack surface if engine has public IP connectivity. You may LIKE that your on-premises integration bus is hard to reach! Would-be attackers must breach multiple zones in order to attack this central nervous system of your company. By moving your integration engine to the cloud and opening up ports for inbound access, you’re creating a tempting target for those wishing to tap into this information-rich environment.
    • Not getting any of the operation benefits that as-a-service products possess. One of the major downsides of this option is that you haven’t actually simplified much; you’re just hosting your software elsewhere. Instead of eliminating infrastructure headaches and focusing on connecting your systems, you’re still standing up (virtual) infrastructure, configuring networks, installing software, managing software updates, building highly available setups, and so on. You may be more elastic, but you haven’t reduced your operational burden.
    • Little built-in connectivity to cloud endpoints. If you're using an integration service that comes with pre-built endpoint adapters, you may find that traditional software providers aren't keeping up with "cloud born" providers. SnapLogic will always have more cloud connectivity than BizTalk Server, for example. You may not care about this if you're dealing with messaging engines that require you to write producer/consumer code. But if you like having pre-built connectors to systems (e.g. IFTTT), you may be disappointed with your existing software provider.
    • Availability and uptime, especially if the integration engine isn’t cloud-native. If you move your integration engine to cloud IaaS, it’s completely on you to ensure that you’ve got a highly available setup. Running ZeroMQ on a single cloud virtual machine isn’t going to magically provide a resilient back end. If you’re taking a traditional ESB product and running it in cloud VMs, you still likely can’t scale out as well as cloud-friendly distributed engines like Kafka or NATS.

     

    Option #3 – Run your integration engine on-premises

    Running an integration engine in the cloud may not be for you. Even if your applications are slowly (quickly?) moving to the cloud, you might want to leave your integration bus where it is.

    Benefits

    • Run the engine of your choice. No one can tell you what to do in your own house! Pick the ESB, messaging engine, or ETL tool that works for you.
    • Control the change and maintenance lifecycle. This applies to option #2 to some extent, but when you control the software to the metal, you can schedule maintenance at optimal times and upgrade the software on your own timetable. If you’ve got a sensitive Big Data pipeline and want to reboot Spark ONLY when things are quiet, then you can do that.
    • Close to all on-premises systems. Plenty of workloads are moving to public cloud, but it's sure as heck not all of them, at least not right now. You may be seeing commodity services like CRM or HR quickly going to cloud services, but lots of mission critical apps still sit within your data centers. Depending on what your data sources are, you may have a few years before you're motivated to give your integration engine a new address.
    • You can still reach out to Internet endpoints, while keeping inbound ports closed. If you’re running something like BizTalk Server, you can send data to cloud endpoints, and even receive data in (through the Service Bus) without exposing the service to the Internet. And if you’re using messaging engines where you write the endpoints, it may not really matter if the engine is on-site.
    • Can get some elasticity through private clouds. Don’t forget about private clouds! While some may think private clouds are dumb (because they don’t achieve the operational benefits or elasticity of a public cloud), the reality is that many companies have doubled down on them. If you take your preferred integration engine and slide it over to your private cloud, you may get some of the elasticity and self-service benefits that public cloud customers get.

    Risks

    • Difficult to keep up to date with latest versions. As the pace of innovation and disruption picks up, you may find it hard to keep your backbone infrastructure up to date. By continuing to own the lifecycle of your integration software, you run the risk of falling behind. That may not matter if you like the version of the software that you are on – or if you have gotten great at building out new instances of your engines and swapping consumers over to them – but it’s still something that can cause problems.
    • Subject to capacity limitations and slow scale up/out. Private clouds rarely have the same amount of hardware capacity that public clouds do. So even if you love dropping RabbitMQ into your private cloud, there may not be the storage or compute available when you need to quickly expand.
    • Few native connectors to cloudy endpoints. Sticking with traditional software may mean that you stay stuck on a legacy foundation instead of adopting a technology that’s more suited to connecting cloud endpoints or high-throughput producers.

     

    There’s no right or wrong answer here. Each company will have different reasons to choose an option above (or one that I didn’t even come up with!). If you’re interested in learning more about the latest advances in the messaging space, join me at the Integrate 2016 event (pre-registration here) in London on May 12-13. I’ll be doing a presentation on what’s new in the open source messaging space, and how increasingly popular integration patterns have changed our expectations of what an integration engine should be able to do.

  • What Are All of Microsoft Azure’s Application Integration Services?

    As a Microsoft MVP for Integration – or however I’m categorized now – I keep a keen interest in where Microsoft is going with (app) integration technologies. Admittedly, I’ve had trouble keeping up with all the various changes, and thought it’d be useful to take a tour through the status of the Microsoft integration services. For each one, I’ll review its use case, recent updates, and how to consume it.

    What qualifies as an integration technology nowadays? For me, it's anything that lets me connect services in a distributed system. That "system" may be composed of components running entirely on-premises, between business partners, or across cloud environments. Microsoft doesn't totally agree with that definition, if their website information architecture is any guide. They spread the services across categories like "Hybrid Integration", "Web and Mobile", "Internet of Things", and even "Analytics."

    But, whatever. I’m considering the following Microsoft technologies as part of their cloud-enabled integration stack:

    • Service Bus
    • Event Hubs
    • Data Factory
    • Stream Analytics
    • BizTalk Services
    • Logic Apps
    • BizTalk Server on Cloud Virtual Machines

    I considered, but skipped, Notification Hubs, API Apps, and API Management. They all empower application integration scenarios in some fashion, but it’s more ancillary. If you disagree, tell me in the comments!

    Service Bus

    What is it?

    The Service Bus is a general purpose messaging engine released by Microsoft back in 2008. It’s made up of two key sets of services: the Service Bus Relay, and Service Bus Brokered Messaging.

    https://twitter.com/clemensv/status/648902203927855105

    The Service Bus Relay is a unique service that makes it possible to securely expose on-premises services to the Internet through a cloud-based relay. The service supports a variety of messaging patterns including request/reply, one-way asynchronous, and peer-to-peer.

    But what if the service client and server aren't online at the same time? Service Bus Brokered Messaging offers a pair of asynchronous store-and-forward services. Queues provide first-in-first-out delivery to a single consumer. Data is stored in the queue until retrieved by the consumer. Topics are slightly different. They make it possible for multiple recipients to get a message from a producer, offering a publish/subscribe engine with per-recipient filters.
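
    To make the store-and-forward model concrete, here's a minimal sketch of sending and receiving through a Queue with the .NET client library (the connection string and queue name are placeholders you'd pull from the Azure Portal):

    using System;
    using Microsoft.ServiceBus.Messaging; // ships in the WindowsAzure.ServiceBus NuGet package
    
    class QueueSample
    {
      static void Main()
      {
        // Placeholder connection string -- grab the real value from the portal.
        var connectionString = "Endpoint=sb://[namespace].servicebus.windows.net/;SharedAccessKeyName=[name];SharedAccessKey=[key]";
        var client = QueueClient.CreateFromConnectionString(connectionString, "payments");
    
        // Producer side: the message sits in the queue until a consumer pulls it.
        client.Send(new BrokeredMessage("payment request #42"));
    
        // Consumer side: first-in-first-out delivery to a single reader.
        BrokeredMessage message = client.Receive();
        Console.WriteLine(message.GetBody<string>());
        message.Complete(); // removes it from the queue once processed
      }
    }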

    How does the Service Bus enable application integration? The Relay lets companies expose legacy apps through public-facing services, and makes cross-organization integration much simpler than setting up a web of VPN connections and FTP data exchanges. Brokered Messaging makes it possible to connect distributed apps in a loosely coupled fashion, regardless of where those apps reside.

    What’s new?

    This is a fairly mature service with a slow rate of change. The only thing added to the Service Bus in 2015 is Premium Messaging, which gives customers the choice to run the Brokered Messaging components in a single-tenant environment for more predictable performance and pricing.

    From the sounds of it, Microsoft is also looking at finally freeing Service Bus Relay from the shackles of WCF. Here’s hoping.

    https://twitter.com/clemensv/status/639714878215835648

    How to use it?

    Developers work with the Service Bus primarily by writing code. To host a Relay service, you must write a WCF service that uses one of the pre-defined Service Bus bindings. To make it easy, developers can add the Service Bus package to their projects via NuGet.

    2015.10.21integration01
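
    Hosting a relay endpoint looks roughly like any other self-hosted WCF service, plus a relay binding and a token. Here's a sketch with a placeholder namespace and key (IEchoContract and EchoService are hypothetical stand-ins for your own contract and implementation):

    using System;
    using System.ServiceModel;
    using Microsoft.ServiceBus; // from the WindowsAzure.ServiceBus NuGet package
    
    [ServiceContract]
    interface IEchoContract
    {
      [OperationContract]
      string Echo(string text);
    }
    
    class EchoService : IEchoContract
    {
      public string Echo(string text) { return text; }
    }
    
    class RelayHost
    {
      static void Main()
      {
        var host = new ServiceHost(typeof(EchoService));
    
        var endpoint = host.AddServiceEndpoint(
          typeof(IEchoContract),
          new NetTcpRelayBinding(), // one of the pre-defined relay bindings
          ServiceBusEnvironment.CreateServiceUri("sb", "[namespace]", "echo"));
    
        // Authenticate the listener to the relay with a Shared Access Signature.
        endpoint.Behaviors.Add(new TransportClientEndpointBehavior
        {
          TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider("[keyName]", "[key]")
        });
    
        host.Open(); // the on-premises service is now reachable through the cloud relay
        Console.WriteLine("Listening on the relay. Press ENTER to exit.");
        Console.ReadLine();
        host.Close();
      }
    }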

    The only aspect that requires .NET is hosting Relay services. Developers can consume Relay-bound services, Queues, and Topic subscriptions from a host of other platforms, or even through the raw REST APIs. The Microsoft SDKs for Java, PHP, Ruby, Python and Node.js all include the necessary libraries for talking to the Service Bus. AMQP support appears to be in a subset of the SDKs.

    It’s also possible to set up Service Bus Queues and Topics via the Azure Portal. From here, I can create new Queues, add Topics, and configure basic Subscriptions. I can also see any active Relay service endpoints.

    2015.10.21integration02

    Finally, you can interact with the Service Bus through the super powerful (and open source) Service Bus Explorer created by Microsoftie Paolo Salvatori. From here, you can configure and test virtually every aspect of the Service Bus.

     

    Event Hubs

    What is it?

    Azure Event Hubs is a scalable service for high-volume event intake. Stream in millions of events per second from applications or devices. It’s not an end-to-end messaging engine, but rather, focuses heavily on being a low latency “front door” that can reliably handle consistent or bursty event streams.

    Event Hubs works by putting an ordered sequence of events into something called a partition. Like Apache Kafka, an Event Hub partition acts like an append-only commit log. Senders – who communicate with Event Hubs via AMQP and HTTP – can specify a partition key when submitting events, or leave it out so that a round-robin approach decides which partition the event goes to. Partitions are accessed by readers through Consumer Groups. A consumer group is like a view of the event stream. There should only be a single partition reader at one time, and Event Hub users definitely have some responsibility for managing connections, tracking checkpoints, and the like.
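
    As a rough sketch of the sender side with the .NET SDK (the connection string, hub name, and payload shape are placeholders):

    using System.Text;
    using Microsoft.ServiceBus.Messaging; // EventHubClient ships in the same Service Bus NuGet package
    
    class TelemetrySender
    {
      static void Main()
      {
        var connectionString = "Endpoint=sb://[namespace].servicebus.windows.net/;SharedAccessKeyName=[name];SharedAccessKey=[key]";
        var client = EventHubClient.CreateFromConnectionString(connectionString, "telemetry");
    
        var body = Encoding.UTF8.GetBytes("{\"deviceId\":\"sensor-01\",\"temp\":72.4}");
    
        // A partition key keeps this device's events ordered within one partition;
        // omit it and the service round-robins events across partitions.
        client.Send(new EventData(body) { PartitionKey = "sensor-01" });
      }
    }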

    How do Event Hubs enable application integration? A core use case of Event Hubs is capturing high volume “data exhaust” thrown off by apps and devices. You may also use this to aggregate data from multiple sources, and have a consumer process pull data for further processing and sending to downstream systems.

    What’s new?

    In July of 2015, Microsoft added support for AMQP over web sockets. They also added the service to an additional region in the United States.

    How to use it?

    It looks like only the .NET SDK has native libraries for Event Hubs, but developers can still use either the REST API or AMQP libraries in their language of choice (e.g. Java).

    The Azure Portal lets you create Event Hubs, and do some basic configuration.

    2015.10.21integration03

    Paolo also added support for Event Hubs in the Service Bus Explorer.

     

    Data Factory

    What is it?

    The Azure Data Factory is a cloud-based data integration service that does traditional extract-transform-load but with some modern twists. Data Factory can pull data from either on-premises or cloud endpoints. There's an agent-based "data management gateway" for extracting data from on-premises file systems, SQL Servers, Oracle databases, Teradata databases, and more. Data transformation happens in a Hadoop cluster or batch processing environment. All the various processing activities are collected into a pipeline that gets executed. Activities can have policies attached. A policy controls concurrency, retries, delay duration, and more.

    How does the Data Factory enable application integration? This could play a useful part in synchronizing data used by distributed systems. It’s designed for large data sets and is much more efficient than using messaging-based services to ship chunks of data between repositories.

    What’s new?

    This service just hit “general availability” in August, so the whole service is kinda new.

    How to use it?

    You have a few choices for interacting with the Data Factory service. As mentioned earlier, there are a whole bunch of supported database and file endpoints, but what about creating and managing the factories themselves? Your choices are Visual Studio, PowerShell, REST API, or the graphical designer in the Azure Preview Portal. Developers can download a package to add the appropriate project types to Visual Studio, and download the latest Azure PowerShell executable to get Data Factory extensions.

    To do any visual design, you need to jump into the (still) Preview Portal. Here you can create, manage, and monitor individual factories.

    2015.10.21integration04

     

    Stream Analytics

    What is it?

    Stream Analytics is a cloud-hosted event processing engine. Point it at an event source (real-time or historical) and run data over queries written in a SQL-like language. An event source could be a stream (like Event Hubs), or reference data (like Azure Blob storage). Queries can join streams, convert data types, match patterns, count unique values, look for changed values, find specific events in windows, detect absence of events, and more.

    Once the data has been processed, it goes to one of many possible destinations. These include Azure SQL Databases, Blob storage, Event Hubs, Service Bus Queues, Service Bus Topics, Power BI, or Azure Table Storage. The choice of consumer obviously depends on what you want to do with the data. If the stream results should go back through a streaming process, then Event Hubs is a good destination. If you want to stash the resulting data in a warehouse for later BI, go that way.

    How does Stream Analytics enable application integration? The output of a stream can go into the Service Bus for routing to other systems or triggering actions in applications hosted anywhere. One system could pump events through Stream Analytics to detect relevant business conditions and then send the output events to those systems via database or messaging.

    What’s new?

    This service became generally available in April 2015. In July, Microsoft added support for Service Bus Queues and Topics as output types. A few weeks ago, there was another update that added IoT-friendly capabilities like DocumentDB as an output, support for the IoT Hub service, and more.

    How to use it?

    Lots of different app services can connect to Stream Analytics (including Event Hubs, Power BI, Azure SQL Databases), but it looks like you’ve got limited choices today in setting up the stream processing jobs themselves. There’s the REST API, .NET SDK, or classic Portal UI.

    The Portal UI lets you create jobs, configure inputs, write and test queries, configure outputs, scale streaming units up and down, and change job settings.

    2015.10.21integration05

    2015.10.21integration06

     

     

    BizTalk Services

    What is it?

    BizTalk Services targets EAI (enterprise application integration) and EDI (electronic data interchange) scenarios by offering tools and connectors that are designed to bridge protocol and data mismatches between systems. Developers send in XML or flat file data, and it can be validated, transformed, and routed via a "bridge" component. Message validation is done against an XSD schema, and XSLT transformations are created visually in a sophisticated mapping tool. Data comes into BizTalk Services via HTTP, S/FTP, Service Bus Queues, or Service Bus Topic subscriptions. Valid destinations for the output of a bridge include FTP, Azure Blob storage, one-way Service Bus Relay endpoints, Service Bus Queues, and more. Microsoft also added a BizTalk Adapter Service that lets you expose on-premises endpoints – options include SQL Server, Oracle databases, Oracle E-Business Suite, SAP, and Siebel – to cloud-hosted bridges.

    BizTalk Services also has some business-to-business capabilities. This includes support for a wide range of EDI and EDIFACT schemas, and a basic trading partner management portal.

    How does BizTalk Services enable application integration? BizTalk Services makes it possible for applications with different endpoints and data structures to play nice. Obviously this has potential to be a useful part of an application integration portfolio. Companies need to connect their assets, which are more distributed now than ever.

    What’s new?

    BizTalk Services itself seems a bit stagnant (judging by the sparse release notes), but some of its services are now exposed in Logic and API apps (see below). Not sure where this particular service is heading, but its individual pieces will remain useful to teams that want access to transformation and connectivity services in their apps.

    How to use it?

    Working with BizTalk Services means using the REST API, PowerShell commandlets, or Visual Studio. There’s also a standalone management portal that hangs off the main Azure Portal.

    In Visual Studio, developers need to add the BizTalk Services SDK in order to get the necessary components. After installing, it’s easy enough to model out a bridge with all the necessary inputs, outputs, schemas, and maps.

    In the standalone portal, you can search for and delete bridges, upload schemas and maps, add certificates, and track messages. Back in the standard Azure Portal, you configure things like backup policies and scaling settings.

    2015.10.21integration07

     

    Logic Apps

    What is it?

    Logic Apps let developers build and host workflows in the cloud. These visually-designed processes run a series of steps (called “actions”) and use “connectors” to access remote data and business logic. There are tons of connectors available so far, and it’s possible to create your own. Core connectors include Azure Service Bus, Salesforce Chatter, Box, HTTP, SharePoint, Slack, Twilio, and more. Enterprise connectors include an AS2 connector, BizTalk Transform Service, BizTalk Rules Service, DB2 connector, IBM WebSphere MQ Server, POP3, SAP, and much more.

    Triggers make a Logic App run, and developers can trigger manually, or off the action of a connector. You could start a Logic app with an HTTP call, a recurring schedule, or upon detection of a relevant Tweet in Twitter. Within the Logic App, you can specify repeating operations and some basic conditional logic. Developers can see and edit the underlying JSON that describes a Logic App. Features like shared parameters are ONLY available to those writing code (versus visually designing the workflow). The various BizTalk-branded API actions offer the ability to validate, transform, and encode data, or execute independently-maintained business rules.

    How do Logic Apps enable application integration? This service helps developers put together cloud-oriented application integration workflows that don’t need to run in an on-premises message bus. The various social and SaaS connectors help teams connect to more modern endpoints, while the on-premises connectors and classic BizTalk functionality addresses more enterprise-like use cases.

    What’s new?

    This is clearly an area of attention for Microsoft. Lots of updates since the service launched in March. Microsoft has added Visual Studio support for designing Logic Apps, future execution scheduling, connector search in the Preview Portal, do … until looping, improvements to triggers, and more.

    How to use it?

    This is still a preview service, so it’s not surprising that you only have a few ways to interact with it. There’s a REST API for management, and the user experience in the Azure Preview Portal.

    Within the Preview Portal, developers can create and manage their Logic Apps. You can either start from scratch, or use one of the pre-built templates that reflect common patterns like content-based routing, scatter-gather, HTTP request/response, and more.

    2015.10.21integration08

    If you want to build your own, you choose from any existing or custom-built API apps.

    2015.10.21integration09

    You then save your Logic App and can have it run either manually or based on a trigger.

     

    BizTalk Server (on Cloud Virtual Machines)

    What is it?

    Azure users can provision and manage their own BizTalk Server integration server in Azure Virtual Machines using prebuilt images. BizTalk Server is the mature, feature-rich integration bus used to connect enterprise apps. With it, customers get a stateful workflow engine, reliable pub/sub messaging engine, adapter framework, rules engine, trading partner management platform, and full design experience in Visual Studio. While not particularly cloud integrated (with the exception of a couple Service Bus adapters), it can reasonably be used to integrate apps across environments.

    What’s new?

    BizTalk Server 2013 R2 was released in the middle of last year and included some incremental improvements like native JSON support, and updates to some built-in adapters. The next major release of the platform is expected in 2016, but without a publicly announced feature set.

    How to use it?

    Deploy the image from the template library if you want to run it in Azure, or do the same in any other infrastructure cloud.

    2015.10.21integration10

     

    Summary

    Whew. Those are a lot of options. Definitely some overlap, but Microsoft also seems to be focused on building these in a microservices fashion. Specifically, single purpose services that do one thing really well and don’t encroach into unnatural territory. For example, Stream Analytics does one thing, and relies on other services to handle other parts of the processing pipeline. I like this trend, as it gets away from a heavyweight monolithic integration service that has a bunch of things I don’t need, but have to deploy. It’s much cleaner (although potentially MORE complex) to assemble services as needed!

  • You don’t need a private cloud, you need isolation options (and maybe more control!)

    Private cloud is definitely still a “thing.” Survey after survey shows that companies are running apps in (on-premises) private clouds and cautiously embracing public cloud. But, it often seems that companies see this as a binary choice: wild-west public cloud, or fully dedicated private cloud. I just wrote up a report on Heroku Private Spaces, and this reinforces my belief that the future of IT is about offering increasingly sophisticated public cloud isolation options, NOT running infrastructure on-premises.

    Why do companies choose to run things in a multi-tenant public cloud like Azure, AWS, Heroku, or CenturyLink? Because they want to offload responsibility for things that aren't their core competencies, they want the elasticity to consume app infrastructure on their own timelines and in any geography, they like the constant access to new features and functionality, and it gives their development teams more tools to get revenue-generating products to market quickly.

    How come everything doesn’t run in public clouds? Legit concerns exist about supportability for existing topologies and lack of controls that are “required” by audits. I put required in quotes because in many cases, the spirit of the control can be accomplished, even if the company-defined policies and procedures aren’t a perfect match. For many companies, the solution to these real or perceived concerns is often a private cloud.

    However, “private cloud” is often a misnomer. It’s at best a hyper-converged stack that provides an on-demand infrastructure service, but more often it’s a virtualization environment with some elementary self-service capabilities, no charge-back options, no PaaS-like runtimes, and single-location deployments. When companies say they want private clouds, what they OFTEN need is a range of isolation options. By isolation, I mean fewer and fewer dependencies on shared infrastructure. Why isolation? There’s a need to survive an audit that includes detailed network traffic reports, user access logs, and proof of limited access by service provider staff. Or, you have an application topology that doesn’t fit in the “vanilla” public cloud setup. Think complex networking routes or IP spaces, or even application performance requirements.

    To be sure, any public cloud today is already delivering isolation. Either your app (in the case of PaaS), or virtual infrastructure (in the case of IaaS) is walled off from other customers, even if they share a control plane. What is the isolation spectrum, and what’s in-between vanilla public cloud and on-premises hardware? I’ve made up a term (“Cloud Isolation Index”) and describe it below.

    2015.09.17cii05

     

    Customer Isolation

    What is it?

    This is the default isolation that comes with public clouds today.  Each customer has their own carved-out place in a multi-tenant environment. Customers typically share a control plane, underlying physical infrastructure, and in some cases, even the virtual infrastructure. Virtual infrastructure may be shared when you’re considering application services like database-as-a-service, messaging services, identity services, and more.

    How is it accomplished?

    This is often accomplished through a mix of hardware and software. The base hardware being used by a cloud provider may offer some inherent multi-tenancy, but most likely, the provider is relying on a software tier that isolates tenants. It’s often the software layer that orchestrates an isolated sandbox across physical compute, networking, storage, and customer metadata.

    What are the benefits and downsides?

    There are lots of reasons that this default isolation level is attractive. Getting started in these environments takes seconds. You have base assurances that you’re not co-mingling your business critical information in a risky way. It’s easier to manage your account or get support because there’s nothing funky going on.

    Downsides? You may not be able to satisfy all your audit and complexity concerns because your vanilla isolation doesn’t support customizations that could break other tenants. Public cloud also limits you to the locations that it’s running, so if you need a geography that’s not available from that provider, you’re out of luck.

    Service Isolation

    What is it?

    Take a service and wall it off from other users within a customer account. You may share a control plane, account management, and underlying physical infrastructure.

    2015.09.17cii07

    You're seeing a new crop of solutions here, and I like this trend. Heroku Private Spaces gives you apps and data in a network-isolated area of your account, while Microsoft Azure Service Bus Premium Messaging delivers resource isolation for your messaging workloads. "Reserved instances" in cloud infrastructure environments serve a similar role. It's about taking a service or set of services and isolating them for security or performance reasons.

    How is it accomplished?

    It looks like Heroku Private Spaces works by using AWS VPC (see “environment isolation” below) and creating a private network for one or many apps targeted at a Space. Azure likely uses dedicated compute instances to run a messaging unit just for you. Dedicated or reserved services depend on network and (occasionally) compute isolation.

    What are the benefits and downsides?

    The benefits are clear. Instead of doing a coarse exercise (e.g. setting up dedicated private “cloud” infrastructure somewhere) because one component requires elevated isolation, carve up that app or set of services into a private area. By sharing a control plane with the “public” cloud components, you don’t increase your operational burden.

    Environment Isolation (Native)

    What is it?

    Use vendor-provided cloud features to carve up isolation domains within your customer account. Instead of "Customer Isolation" where everything gets dumped into the vanilla account and everyone has access, here you thoughtfully design an environment and place apps in the right place. Most public clouds offer features to isolate workloads within a given account.

    How is it accomplished?

    Lots of ways to address this. In the CenturyLink Cloud, we offer things like account hierarchies, where customers set up different accounts with unique permissions and network boundaries.

    2015.09.17cii06

    Also, our customers use Bare Metal servers for dedicated workloads, role-based access controls to limit permissions, distinct network spaces with carefully crafted firewall policies, and more.

    Amazon offers services like Virtual Private Cloud (VPC) that create a private part of AWS with controlled Internet access. Customers use security groups to control network traffic in and out of a VPC. Many clouds offer granular security permissions so that you can isolate permissions and, in some cases, access to specific workloads. You'll also find cloud options for data encryption and other native data security features.

    Select private cloud environments also fit into this category. CenturyLink sells a Private Cloud which is fully federated with the public cloud, but on a completely dedicated hardware stack in any of 50+ locations around the world. Here, you have native isolation in a self-service environment, but it still requires a capital outlay.

    This is all typically accomplished using features that many clouds provide you out-of-the-box.

    What are the benefits and downsides?

    One huge benefit is that you can get many aspects of “private cloud” without actually making extensive commitments to dedicated infrastructure. Customers are seeking control and ways to wall-off sensitive workloads. By using inherent features of a global public cloud, you get greater assurances of protection without dramatically increasing your complexity/cost.

    Environment Isolation (Manufactured)

    What is it?

    Sometimes the native capabilities of a public cloud are insufficient for the isolation level that you need. But, one of the great aspects of cloud is the extensibility and in some cases, customization. You’re likely still sharing a control plane and some underlying physical infrastructure.

    How is it accomplished?

    You can often create an isolated environment through additional software, “hybrid” infrastructure, and even hack-y work-arounds.

    2015.09.17cii08

    Most clouds offer a vast ecosystem of 3rd party open source and commercial appliances. Create isolated networks with an overlay solution, encrypt workloads at the host level, stand up self-managed database solutions, and much more. Look at something like Pivotal Cloud Foundry. Don't want the built-in isolation provided by a public PaaS provider? Run a dedicated PaaS in your account and create the level of isolation that your apps demand.

    You also have choices to weave environments together into a hybrid cloud. If you can’t place something directly in the cloud data center, then you can use things like Azure ExpressRoute or AWS Direct Connect to privately link to assets in remote data centers. Since CenturyLink is the 2nd largest colocation provider in the world, we often see customers put parts of their security stack or entirely different environments into our data center and do a direct connect to their cloud environment. In this way, you manufacture the isolation you need by connecting different components that reside in different isolation domains.

    Another area that comes up with regards to isolation is vendor access. It’s one thing to secure workloads to prevent others within your company from accessing them. It’s another to also prevent the service provider themselves from accessing them! You make this happen by using encryption (that you own the keys for), additional network overlays, or even changing the passwords on servers to something that the cloud management platform doesn’t know.

    What are the benefits and downsides?

    If public cloud vendors *didn’t* offer the option to manufacture your desired isolation level, you’d see a limit to what ended up going there. The benefit of this level is that you can target more sensitive or complex workloads at the public cloud and still have a level of assurance that you’ve got an advanced isolation level.

    The downside? You could end up with a very complicated configuration. If your cloud account no longer resembles its original state, you’ll find that your operational costs go up, and it might be more difficult to take advantage of new features being natively added to the cloud.

    Total Isolation

    What is it?

    This is the extreme end of the spectrum. Stand up an on-premises or hosted private cloud that doesn’t share a control plane or any infrastructure with another tenant.

    How is it accomplished?

    You accomplish this level of isolation by buying stuff. You typically make a significant commitment to infrastructure for the privilege of running it yourself, or paying someone else to run it on your behalf. You spend time working with consultants to size and install an environment.

    What are the benefits and downsides?

    The benefits? You have complete control of an infrastructure environment, can use the hardware vendors you want, and can likely create any sort of configuration you need to support your existing topologies. The downside? You're probably not getting anywhere near the benefit of your competitors, who are using the public cloud to scale faster, and in more places, than you'll ever be able to with owned infrastructure.

    I’m not sure I feel the same way as Cloud Opinion, but the point is well taken.

    Summary

    Isolation should be a feature, not a capital project.

    This isolation concept is still a work in progress for me, and probably needs refinement. Am I missing parts of the spectrum? Have I undersold fully dedicated private infrastructure? It seems that if we talked more about isolation levels, and less about public vs. private, we’d be having smarter conversations. Agree?

  • Comparing Clouds: API Capabilities

    API access is quickly becoming the most important aspect of any cloud platform. How easily can you automate activities using programmatic interfaces? What hooks do you have to connect on-premises apps to cloud environments? So far in this long-running blog series, I’ve taken a look at how to provision, scale, and manage the cloud environments of five leading cloud providers. In this post, I’ll explore the virtual-machine-based API offerings of the same providers. Specifically, I’m assessing:

    • Login mechanism. How do you access the API? Is it easy for developers to quickly authenticate and start calling operations?
    • Request and response shape. Does the API use SOAP or REST? Are payloads XML, JSON, or both? Does a result set provide links to follow to additional resources?
    • Breadth of services. How comprehensive is the API? Does it include most of the capabilities of the overall cloud platform?
    • SDKs, tools, and documentation. What developer SDKs are available, and is there ample documentation for developers to leverage?
    • Unique attributes. What stands out about the API? Does it have any special capabilities or characteristics that make it stand apart?

    As an aside, there’s no “standard cloud API.” Each vendor has unique things they offer, and there’s no base interface that everyone conforms to. While that makes it more challenge to port configurations from one provider to the next, it highlights the value of using configuration management tools (and to a lesser extent, SDKs) to provide abstraction over a cloud endpoint.

    Let’s get moving, in alphabetical order.

    DISCLAIMER: I’m the VP of Product for CenturyLink’s cloud platform. Obviously my perspective is colored by that. However, I’ve taught four well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    Amazon Web Services

    Amazon EC2 is among the original cloud infrastructure services, and has a mature API.

    Login mechanism

    For AWS, you don’t really “log in.” Every API request includes an HTTP header made up of the hashed request parameters signed with your private key. This signature is verified by AWS before executing the requested operation.

    A valid request to the API endpoint might look like this (notice the Authorization header):

    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    X-Amz-Date: 20150501T130210Z
    Host: ec2.amazonaws.com
    Authorization: AWS4-HMAC-SHA256 Credential=KEY/20150501/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=ced6826de92d2bdeed8f846f0bf508e8559e98e4b0194b84example54174deb456c
    
    [request payload]
    

    Request and response shape

    Amazon still supports a deprecated SOAP endpoint, but steers everyone to its HTTP services. To be clear, it's not REST; while the API does use GET and POST, it typically throws a command and all the parameters into the URL. For instance, to retrieve a list of instances in your account, you'd issue a request to:

    https://ec2.amazonaws.com/?Action=DescribeInstances&AUTHPARAMS

    For cases where lots of parameters are required – for instance, to create a new EC2 instance – all the parameters are signed in the Authorization header and added to the URL.

    https://ec2.amazonaws.com/?Action=RunInstances
    &ImageId=ami-60a54009
    &MaxCount=3
    &MinCount=1
    &KeyName=my-key-pair
    &Placement.AvailabilityZone=us-east-1d
    &AUTHPARAMS
    

    Amazon APIs return XML. Developers get back a basic XML payload such as:

    <DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2014-10-01/">
      <requestId>fdcdcab1-ae5c-489e-9c33-4637c5dda355</requestId>
        <reservationSet>
          <item>
            <reservationId>r-1a2b3c4d</reservationId>
            <ownerId>123456789012</ownerId>
            <groupSet>
              <item>
                <groupId>sg-1a2b3c4d</groupId>
                <groupName>my-security-group</groupName>
              </item>
            </groupSet>
            <instancesSet>
              <item>
                <instanceId>i-1a2b3c4d</instanceId>
                <imageId>ami-1a2b3c4d</imageId>
    

    Breadth of services

    Each AWS service exposes an impressive array of operations. EC2 is no exception with well over 100. The API spans server provisioning and configuration, as well as network and storage setup.

    2015.07.30api01

    I’m hard pressed to find anything in the EC2 management UI that isn’t available in the API set.

    SDKs, tools, and documentation

    AWS is known for its comprehensive documentation that stays up-to-date. The EC2 API documentation includes a list of operations, a basic walkthrough of creating API requests, parameter descriptions, and information about permissions.

    SDKs give developers a quicker way to get going with an API, and AWS provides SDKs for Java, .NET, Node.js, PHP, Python and Ruby. Developers can find these SDKs in package management systems like npm (Node.js) and NuGet (.NET).
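
    For instance, with the .NET SDK (a quick sketch, assuming the AWSSDK package is installed and credentials are resolved from your profile or app config), listing instances takes a couple of lines instead of a hand-signed request:

    using System;
    using Amazon;
    using Amazon.EC2;
    using Amazon.EC2.Model;
    
    class ListInstances
    {
      static void Main()
      {
        // Credentials come from the SDK's normal resolution chain (profile, environment, app config).
        var client = new AmazonEC2Client(RegionEndpoint.USEast1);
    
        // Equivalent to the DescribeInstances action shown earlier; the SDK signs the request for you.
        DescribeInstancesResponse response = client.DescribeInstances(new DescribeInstancesRequest());
    
        foreach (var reservation in response.Reservations)
          foreach (var instance in reservation.Instances)
            Console.WriteLine("{0} ({1})", instance.InstanceId, instance.State.Name);
      }
    }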

    As you may expect, there are gobs of 3rd party tools that integrate with AWS. Whether it’s configuration management plugins for Chef or Ansible, or build automation tools like Terraform, you can expect to find AWS plugins.

    Unique attributes

    The AWS API is comprehensive with fine-grained operations. It also has a relatively unique security process (signature hashing) that may steer you towards the SDKs that shield you from the trickiness of correctly signing your request. Also, because EC2 is one of the first AWS services ever released, it’s using an older XML scheme. Newer services like DynamoDB or Kinesis offer a JSON syntax.

    Amazon offers push-based notification through CloudWatch + SNS, so developers can get an HTTP push message when things like Autoscale events fire, or a performance alarm gets triggered.

    CenturyLink Cloud

    Global telecommunications and technology company CenturyLink offers a public cloud in regions around the world. The API has evolved from a SOAP/HTTP model (v1) to a fully RESTful one (v2).

    Login mechanism

    To use the CenturyLink Cloud API, developers send their platform credentials to a “login” endpoint and get back a reusable bearer token if the credentials are valid. That token is required for any subsequent API calls.

    A request for a token may look like:

    POST https://api.ctl.io/v2/authentication/login HTTP/1.1
    Host: api.ctl.io
    Content-Type: application/json
    Content-Length: 54
    
    {
      "username": "[username]",
      "password": "[password]"
    }

    A token (and role list) comes back with the API response, and developers use that token in the “Authorization” HTTP header for each subsequent API call.

    GET https://api.ctl.io/v2/datacenters/RLS1/WA1 HTTP/1.1
    Host: api.ctl.io
    Content-Type: application/json
    Content-Length: 0
    Authorization: Bearer [LONG TOKEN VALUE]
    
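    In code, that's just a standard bearer token on each request. A minimal sketch with .NET's HttpClient (the account alias, credentials, and the name of the token property in the login response are assumptions to verify against the docs):

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using Newtonsoft.Json.Linq; // Json.NET, used here to pull the token out of the login response
    
    class CtlApiSample
    {
      static void Main()
      {
        var http = new HttpClient { BaseAddress = new Uri("https://api.ctl.io") };
    
        // Exchange platform credentials for a bearer token (credentials are placeholders).
        var loginBody = new StringContent(
          "{\"username\":\"[username]\",\"password\":\"[password]\"}", Encoding.UTF8, "application/json");
        var loginJson = http.PostAsync("/v2/authentication/login", loginBody)
          .Result.Content.ReadAsStringAsync().Result;
    
        // "bearerToken" is my assumption for the property name -- check the actual login response.
        string token = (string)JObject.Parse(loginJson)["bearerToken"];
    
        // Reuse the token in the Authorization header on every subsequent call.
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);
        var dataCenter = http.GetAsync("/v2/datacenters/[accountAlias]/WA1")
          .Result.Content.ReadAsStringAsync().Result;
        Console.WriteLine(dataCenter);
      }
    }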

    Request and response shape

    The v2 API uses JSON for the request and response format. The legacy API uses XML or JSON with either SOAP or HTTP (don’t call it REST) endpoints.

    To retrieve a single server in the v2 API, the developer sends a request to:

    GET https://api.ctl.io/v2/servers/{accountAlias}/{serverId}

    The response JSON for most any service is verbose, and includes a number of links to related resources. For instance, in the example response payload below, notice that the caller can follow links to the specific alert policies attached to a server, billing estimates, and more.

    {
      "id": "WA1ALIASWB01",
      "name": "WA1ALIASWB01",
      "description": "My web server",
      "groupId": "2a5c0b9662cf4fc8bf6180f139facdc0",
      "isTemplate": false,
      "locationId": "WA1",
      "osType": "Windows 2008 64-bit",
      "status": "active",
      "details": {
        "ipAddresses": [
          {
            "internal": "10.82.131.44"
          }
        ],
        "alertPolicies": [
          {
            "id": "15836e6219e84ac736d01d4e571bb950",
            "name": "Production Web Servers - RAM",
            "links": [
              {
                "rel": "self",
                "href": "/v2/alertPolicies/alias/15836e6219e84ac736d01d4e571bb950"
              },
              {
                "rel": "alertPolicyMap",
                "href": "/v2/servers/alias/WA1ALIASWB01/alertPolicies/15836e6219e84ac736d01d4e571bb950",
                "verbs": [
                  "DELETE"
                ]
              }
            ]
          }
        ],
        "cpu": 2,
        "diskCount": 1,
        "hostName": "WA1ALIASWB01.customdomain.com",
        "inMaintenanceMode": false,
        "memoryMB": 4096,
        "powerState": "started",
        "storageGB": 60,
        "disks":[
          {
            "id":"0:0",
            "sizeGB":60,
            "partitionPaths":[]
          }
        ],
        "partitions":[
          {
            "sizeGB":59.654,
            "path":"C:\\"
          }
        ],
        "snapshots": [
          {
            "name": "2014-05-16.23:45:52",
            "links": [
              {
                "rel": "self",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40"
              },
              {
                "rel": "delete",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40"
              },
              {
                "rel": "restore",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40/restore"
              }
            ]
          }
        ],
    },
      "type": "standard",
      "storageType": "standard",
      "changeInfo": {
        "createdDate": "2012-12-17T01:17:17Z",
        "createdBy": "user@domain.com",
        "modifiedDate": "2014-05-16T23:49:25Z",
        "modifiedBy": "user@domain.com"
      },
      "links": [
        {
          "rel": "self",
          "href": "/v2/servers/alias/WA1ALIASWB01",
          "id": "WA1ALIASWB01",
          "verbs": [
            "GET",
            "PATCH",
            "DELETE"
          ]
        },
        …{
          "rel": "group",
          "href": "/v2/groups/alias/2a5c0b9662cf4fc8bf6180f139facdc0",
          "id": "2a5c0b9662cf4fc8bf6180f139facdc0"
        },
        {
          "rel": "account",
          "href": "/v2/accounts/alias",
          "id": "alias"
        },
        {
          "rel": "billing",
          "href": "/v2/billing/alias/estimate-server/WA1ALIASWB01"
        },
        {
          "rel": "statistics",
          "href": "/v2/servers/alias/WA1ALIASWB01/statistics"
        },
        {
          "rel": "scheduledActivities",
          "href": "/v2/servers/alias/WA1ALIASWB01/scheduledActivities"
        },
        {
          "rel": "alertPolicyMappings",
          "href": "/v2/servers/alias/WA1ALIASWB01/alertPolicies",
          "verbs": [
            "POST"
          ]
        },  {
          "rel": "credentials",
          "href": "/v2/servers/alias/WA1ALIASWB01/credentials"
        },
    
      ]
    }

    Breadth of services

    CenturyLink provides APIs for a majority of the capabilities exposed in the management UI. Developers can create and manage servers, networks, firewall policies, load balancer pools, server policies, and more.

    2015.07.30api02

    SDKs, tools, and documentation

    CenturyLink recently launched a Developer Center to collect all the developer content in one place. It points to the Knowledge Base of articles, API documentation, and developer-centric blog. The API documentation is fairly detailed with descriptions of operations, payloads, and sample calls. Users can also watch brief video walkthroughs of major platform capabilities.

    There are open source SDKs for Java, .NET, Python, and PHP. CenturyLink also offers an Ansible module, and integrates with VMware's vRealize multi-cloud management tool.

    Unique attributes

    The CenturyLink API provides a few unique things. The platform has the concept of "grouping" servers together. Via the API, you can retrieve the servers in a group, or get the projected cost of a group, among other things. Also, collections of servers can be passed into operations, so a developer can reboot a set of boxes, or run a script against many boxes at once.

    Somewhat similar to AWS, CenturyLink offers push-based notifications via webhooks. Developers get a near real-time HTTP notification when servers, users, or accounts are created/changed/deleted, and also when monitoring alarms fire.

    DigitalOcean

    DigitalOcean heavily targets developers, so you’d expect a strong focus on their API. They have a v1 API (that’s deprecated and will shut down in November 2015), and a v2 API.

    Login mechanism

    DigitalOcean authenticates users via OAuth. In the management UI, developers create OAuth tokens that can be scoped to read, or read/write. These token values are only shown a single time (for security reasons), so developers must make sure to save them in a secure place.

    2015.07.30api03

    Once you have this token, you can either send it as a bearer token in the HTTP header, or (though it's not recommended) use it in an HTTP basic authentication scenario. A typical curl request looks like:

    curl -X $HTTP_METHOD -H "Authorization: Bearer $TOKEN" "https://api.digitalocean.com/v2/$OBJECT"

    Request and response shape

    The DigitalOcean API is RESTful with JSON payloads. Developers throw typical HTTP verbs (GET/DELETE/PUT/POST/HEAD) against the endpoints. Let’s say that I wanted to retrieve a specific droplet –  a “droplet” in DigitalOcean is equivalent to a virtual machine – via the API. I’d send a request to:

    https://api.digitalocean.com/v2/droplets/[dropletid]

    The response from such a request comes back as verbose JSON.

    {
      "droplet": {
        "id": 3164494,
        "name": "example.com",
        "memory": 512,
        "vcpus": 1,
        "disk": 20,
        "locked": false,
        "status": "active",
        "kernel": {
          "id": 2233,
          "name": "Ubuntu 14.04 x64 vmlinuz-3.13.0-37-generic",
          "version": "3.13.0-37-generic"
        },
        "created_at": "2014-11-14T16:36:31Z",
        "features": [
          "ipv6",
          "virtio"
        ],
        "backup_ids": [
    
        ],
        "snapshot_ids": [
          7938206
        ],
        "image": {
          "id": 6918990,
          "name": "14.04 x64",
          "distribution": "Ubuntu",
          "slug": "ubuntu-14-04-x64",
          "public": true,
          "regions": [
            "nyc1",
            "ams1",
            "sfo1",
            "nyc2",
            "ams2",
            "sgp1",
            "lon1",
            "nyc3",
            "ams3",
            "nyc3"
          ],
          "created_at": "2014-10-17T20:24:33Z",
          "type": "snapshot",
          "min_disk_size": 20
        },
        "size": {
        },
        "size_slug": "512mb",
        "networks": {
          "v4": [
            {
              "ip_address": "104.131.186.241",
              "netmask": "255.255.240.0",
              "gateway": "104.131.176.1",
              "type": "public"
            }
          ],
          "v6": [
            {
              "ip_address": "2604:A880:0800:0010:0000:0000:031D:2001",
              "netmask": 64,
              "gateway": "2604:A880:0800:0010:0000:0000:0000:0001",
              "type": "public"
            }
          ]
        },
        "region": {
          "name": "New York 3",
          "slug": "nyc3",
          "sizes": [
            "32gb",
            "16gb",
            "2gb",
            "1gb",
            "4gb",
            "8gb",
            "512mb",
            "64gb",
            "48gb"
          ],
          "features": [
            "virtio",
            "private_networking",
            "backups",
            "ipv6",
            "metadata"
          ],
          "available": true
        }
      }
    }
    

    Breadth of services

    DigitalOcean says that “all of the functionality that you are familiar with in the DigitalOcean control panel is also available through the API,” and that looks to be pretty accurate. DigitalOcean is known for their no-frills user experience, and with the exception of account management features, the API gives you control over most everything. Create droplets, create snapshots, move snapshots between regions, manage SSH keys, manage DNS records, and more.

    2015.07.30api04

    SDKs, tools, and documentation

    Developers can find lots of open source projects from DigitalOcean that favor Go and Ruby. There are a couple of official SDK libraries, and a whole host of other community supported ones. You’ll find ones for Ruby, Go, Python, .NET, Java, Node, and more.

    DigitalOcean does a great job at documentation (with samples included), and also has a vibrant set of community contributions that apply to virtually any (cloud) environment. The contributed list of tutorials is fantastic.

    Being so developer-centric, DigitalOcean can be found as a supported module in many 3rd party toolkits. You’ll find friendly extensions for Vagrant, Juju, SaltStack and much more.

    Unique attributes

    What stands out for me regarding DigitalOcean is the quality of their documentation, and complete developer focus. The API itself is fairly standard, but it's presented in a way that's easy to grok, and the ecosystem around the service is excellent.

    Google Compute Engine

    Google has lots of API-enabled services, and GCE is no exception.

    Login mechanism

    Google uses OAuth 2.0 and access tokens. Developers register their apps, define a scope, and request a short-lived access token. There are different flows depending on whether you're working with web applications (with interactive user login) versus service accounts (consent not required).

    If you go the service account way, then you've got to generate a JSON Web Token (JWT) through a series of encoding and signing steps. The request for a valid access token looks like:

    POST /oauth2/v3/token HTTP/1.1
    Host: www.googleapis.com
    Content-Type: application/x-www-form-urlencoded
    
    grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiI3NjEzMjY3O…
    

    Request and response shape

    The Google API is RESTful and passes JSON messages back and forth. Operations map to HTTP verbs, and URIs reflect logical resource paths (as much as the term "methods" made me shudder). If you want a list of images in a project, for instance, you'd send a request to:

    https://www.googleapis.com/compute/v1/projects/[project]/global/images

    The response comes back as JSON:

    {
      "kind": "compute#imageList",
      "selfLink": string,
      "id": string,
      "items": [
        {
          "kind": "compute#image",
          "selfLink": string,
          "id": unsigned long,
          "creationTimestamp": string,
          "name": string,
          "description": string,
          "sourceType": string,
          "rawDisk": {
            "source": string,
            "sha1Checksum": string,
            "containerType": string
          },
          "deprecated": {
            "state": string,
            "replacement": string,
            "deprecated": string,
            "obsolete": string,
            "deleted": string
          },
          "status": string,
          "archiveSizeBytes": long,
          "diskSizeGb": long,
          "sourceDisk": string,
          "sourceDiskId": string,
          "licenses": [
            string
          ]
        }
      ],
      "nextPageToken": string
    }
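
    To give a feel for consuming this API, here’s a small Python sketch that calls the images list endpoint with the access token from earlier and follows nextPageToken to page through results. The project ID is a placeholder.

    import requests

    project = "my-project"  # placeholder project ID
    url = "https://www.googleapis.com/compute/v1/projects/{}/global/images".format(project)
    headers = {"Authorization": "Bearer " + access_token}

    images, params = [], {}
    while True:
        page = requests.get(url, headers=headers, params=params).json()
        images.extend(page.get("items", []))
        # The API pages its results; keep requesting until nextPageToken disappears.
        if "nextPageToken" not in page:
            break
        params["pageToken"] = page["nextPageToken"]

    print([img["name"] for img in images])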
    

    Breadth of services

    The GCE API spans a lot of different capabilities that closely match what they offer in their management UI. There’s the base Compute API – this includes operations against servers, images, snapshots, disks, network, VPNs, and more – as well as beta APIs for Autoscalers and instance groups. There’s also an alpha API for user and account management.

    2015.07.30api05

    SDKs, tools, and documentation

    Google offers a serious set of client libraries. You’ll find libraries and dedicated documentation for Java, .NET, Go, Ruby, Objective C, Python and more.

    The documentation for GCE is solid. Not only will you find detailed API specifications, but also a set of useful tutorials for setting up platforms (e.g. LAMP stack) or workflows (e.g. Jenkins + Packer + Kubernetes) on GCE.

    Google lists out a lot of tools that natively integrate with the cloud service. The primary focus here is configuration management tools, with specific callouts for Chef, Puppet, Ansible, and SaltStack.

    Unique attributes

    GCE has a good user management API. They also have a useful batching capability where you can bundle together multiple related or unrelated calls into a single HTTP request. I’m also impressed by Google’s tools for trying out API calls ahead of time. There’s the Google-wide OAuth 2.0 playground where you can authorize and try out calls. Even better, for any API operation in the documentation, there’s a “try it” section at the bottom where you can call the endpoint and see it in action.
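
    As a rough sketch of what that batching looks like from the official Python client (googleapiclient), assuming you’ve already created credentials elsewhere (for example, from a service account key) and that the instance names are placeholders:

    from googleapiclient.discovery import build

    # Assumes `credentials` was created elsewhere (e.g., from a service account key).
    compute = build("compute", "v1", credentials=credentials)

    def on_result(request_id, response, exception):
        # Called once per bundled request as results come back.
        if exception is None:
            print(request_id, response["name"], response["status"])

    # Bundle several related or unrelated calls into one HTTP request.
    batch = compute.new_batch_http_request(callback=on_result)
    for name in ["web-1", "web-2", "db-1"]:  # placeholder instance names
        batch.add(compute.instances().get(project="my-project", zone="us-central1-a", instance=name))
    batch.execute()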

    2015.07.30api06

    Microsoft Azure

    Microsoft added virtual machines to its cloud portfolio a couple years ago, and has API-enabled most of their cloud services.

    Login mechanism

    One option for managing Azure components programmatically is via the Azure Resource Manager. Any action you perform on a resource requires the call to be authenticated with Azure Active Directory. To do this, you have to add your app to an Azure Active Directory tenant, set permissions for the app, and get a token used for authenticating requests.

    The documentation says that you can set this up with the Azure CLI or PowerShell commands (or the management UI). The same docs show a C# example of getting the JWT token back from the management endpoint.

    // Requires the Azure AD Authentication Library (ADAL):
    // using Microsoft.IdentityModel.Clients.ActiveDirectory;
    public static string GetAToken()
    {
      // Authenticate against the Azure AD tenant using the registered app's ID and secret.
      var authenticationContext = new AuthenticationContext("https://login.windows.net/{tenantId or tenant name}");
      var credential = new ClientCredential(clientId: "{application id}", clientSecret: "{application password}");
      var result = authenticationContext.AcquireToken(resource: "https://management.core.windows.net/", clientCredential: credential);
    
      if (result == null) {
        throw new InvalidOperationException("Failed to obtain the JWT token");
      }
    
      string token = result.AccessToken;
    
      return token;
    }
    

    Microsoft also offers a direct Service Management API for interacting with most Azure items. Here you can authenticate using Azure Active Directory or X.509 certificates.

    Request and response shape

    The Resource Manager API appears RESTful and works with JSON messages. In order to retrieve the details about a specific virtual machine, you send a request to:

    https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Compute/virtualMachines/{vm-name}?api-version={api-version}
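
    Here’s a hedged Python sketch of making that call with the requests library, assuming you’ve already acquired an Azure AD bearer token (as the C# sample above does). The subscription, resource group, VM name, and API version are placeholders.

    import requests

    token = "eyJ..."  # Azure AD bearer token acquired earlier (placeholder)
    url = (
        "https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
        "/providers/Microsoft.Compute/virtualMachines/{vm}"
    ).format(sub="my-subscription-id", rg="my-resource-group", vm="my-vm")

    resp = requests.get(
        url,
        headers={"Authorization": "Bearer " + token},
        params={"api-version": "2015-06-15"},  # illustrative API version
    )
    vm = resp.json()
    print(vm["name"], vm["properties"]["provisioningState"])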

    The response JSON is fairly basic, and doesn’t tell you much about related services (e.g. networks or load balancers).

    {
       "id":"/subscriptions/########-####-####-####-############/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{virtualMachineName}",
       "name":"virtualMachineName",
       "type":"Microsoft.Compute/virtualMachines",
       "location":"westus",
       "tags":{
          "department":"finance"
       },
       "properties":{
          "availabilitySet":{
             "id":"/subscriptions/########-####-####-####-############/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/availabilitySets/{availabilitySetName}"
          },
          "hardwareProfile":{
             "vmSize":"Standard_A0"
          },
          "storageProfile":{
             "imageReference":{
                "publisher":"MicrosoftWindowsServerEssentials",
                "offer":"WindowsServerEssentials",
                "sku":"WindowsServerEssentials",
                "version":"1.0.131018"
             },
             "osDisk":{
                "osType":"Windows",
                "name":"osName-osDisk",
                "vhd":{
                   "uri":"http://storageAccount.blob.core.windows.net/vhds/osDisk.vhd"
                },
                "caching":"ReadWrite",
                "createOption":"FromImage"
             },
             "dataDisks":[

             ]
          },
          "osProfile":{
             "computerName":"virtualMachineName",
             "adminUsername":"username",
             "adminPassword":"password",
             "customData":"",
             "windowsConfiguration":{
                "provisionVMAgent":true,
                "winRM":{
                   "listeners":[{
                      "protocol":"https",
                      "certificateUrl":"[parameters('certificateUrl')]"
                   }]
                },
                "additionalUnattendContent":[
                   {
                      "pass":"oobesystem",
                      "component":"Microsoft-Windows-Shell-Setup",
                      "settingName":"FirstLogonCommands|AutoLogon",
                      "content":"<XML unattend content>"
                   }
                ],
                "enableAutomaticUpdates":true
             },
             "secrets":[

             ]
          },
          "networkProfile":{
             "networkInterfaces":[
                {
                   "id":"/subscriptions/########-####-####-####-############/resourceGroups/CloudDep/providers/Microsoft.Network/networkInterfaces/myNic"
                }
             ]
          },
          "provisioningState":"succeeded"
       }
    }
    

    The Service Management API is a bit different. It’s also RESTful, but works with XML messages (although some of the other services like Autoscale seem to work with JSON). If you wanted to create a VM deployment, you’d send an HTTP POST request to:

    https://management.core.windows.net/<subscription-id>/services/hostedservices/<cloudservice-name>/deployments

    The result is an extremely verbose XML payload.
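
    For comparison, a Service Management call is a bit more old-school. Here’s a hedged Python sketch of what such a deployment request might look like, authenticating with a management certificate and sending the required x-ms-version header; the subscription ID, cloud service name, certificate paths, and the (very long) XML body are placeholders.

    import requests

    subscription_id = "my-subscription-id"   # placeholder
    cloud_service = "my-cloud-service"       # placeholder
    url = ("https://management.core.windows.net/{}/services/hostedservices/{}/deployments"
           .format(subscription_id, cloud_service))

    # The deployment definition itself is a verbose XML document, elided here.
    deployment_xml = "<Deployment>...</Deployment>"

    resp = requests.post(
        url,
        data=deployment_xml,
        headers={"x-ms-version": "2014-06-01", "Content-Type": "application/xml"},
        # Management certificate (PEM cert + key) uploaded to the subscription; paths are placeholders.
        cert=("management-cert.pem", "management-key.pem"),
    )
    print(resp.status_code)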

    Breadth of services

    In addition to an API for virtual machine management, Microsoft has REST APIs for virtual networks, load balancers, Traffic Manager, DNS, and more. The Service Management API appears to have a lot more functionality than the Resource Manager API.

    Microsoft is stuck with a two-portal user experience, where the officially supported portal (at https://manage.windowsazure.com) has different features and functions than the beta one (https://portal.azure.com). It’s been like this for quite a while, and hopefully they cut over to the new one soon.

    2015.07.30api07

    2015.07.30api08

    SDKs, tools, and documentation

    Microsoft provides lots of options on their SDK page. Developers can interact with the Azure API using .NET, Java, Node.js, PHP, Python, Ruby, and mobile platforms (iOS, Android, Windows Phone), and it appears that each one uses the Service Management APIs to interact with virtual machines. Frankly, the documentation around this is a bit confusing. The docs for the virtual machines service are OK, and provide a handful of walkthroughs to get you started.

    The core API documentation exists for both the Service Management API and the Azure Resource Manager API. For each set of documentation, you can view details of each API call. I’m not a fan of the navigation in Microsoft API docs. It’s not easy to see the breadth of API operations, as the focus is on a single service at a time.

    Microsoft has a lot of support for virtual machines in the ecosystem, and touts integration with Chef, Ansible, and Docker.

    Unique attributes

    Besides being a little confusing (it’s not obvious which APIs to use), the Azure API is pretty comprehensive (on the Service Management side). Somewhat uniquely, the Resource Manager API has a (beta) billing API with data about consumption and pricing. While I’ve complained a bit here about Resource Manager and conflicting APIs, it’s actually a pretty useful thing. Developers can use the resource manager concept (and APIs) to group related resources and deliver access control and templating.

    Also, Microsoft bakes in support for Azure virtual machines in products like Azure Site Recovery.

    Summary

    The common thing you see across most cloud APIs is that they provide solid coverage of the features the user can do in the vendor’s graphical UI. We also saw that more and more attention is being paid to SDKs and documentation to help developers get up and running. AWS has been in the market the longest, so you see maturity and breadth in their API, but also a heavier interface (authentication, XML payloads). CenturyLink and Google have good account management APIs, and Azure’s billing API is a welcome addition to their portfolio. Amazon, CenturyLink, and Google have fairly verbose API responses, and CenturyLink is the only one with a hypermedia approach of linking to related resources. Microsoft has a messier API story than I would have expected, and developers will be better off using SDKs!

    What do you think? Do you use the native APIs of cloud providers, or prefer to go through SDKs or brokers?

  • Comparing Clouds: “Day 2” Management Operations

    So far in this blog series, I’ve taken a look at how to provision and scale servers using five leading cloud providers. Now, I want to dig into support for “Day 2 operations” like troubleshooting, reactive or proactive maintenance, billing, backup/restore, auditing, and more. In this blog post, we’ll look at how to manage (long-lived) running instances at each provider and see what capabilities exist to help teams manage at scale. For each provider, I’ll assess instance management, fleet management, and account management.

    There might be a few reasons you don’t care a lot about the native operational support capabilities in your cloud of choice. For instance:

    • You rely on configuration management solutions for steady-state. Fair enough. If your organization relies on great tools like Ansible, Chef or CFEngine, then you already have a consistent way to manage a fleet of servers and avoid configuration drift.
    • You use “immutable servers.” In this model, you never worry about patching or updating running machines. Whenever something has to change, you deploy a new instance of a gold image. This simplifies many aspects of cloud management.
    • You leverage “managed” servers in the cloud. If you work with a provider that manages your cloud servers for you, then on the surface, there is less need for access to robust management services.
    • You’re running a small fleet of servers. If you only have a dozen or so cloud servers, then management may not be the most important thing on your mind.
    • You leverage a multi-cloud management tool. As companies chase the “multi-cloud” dream, they leverage tools like RightScale, vRealize, and others to provide a single experience across a cloud portfolio.

    However, I contend that the built-in operational capabilities of a particular cloud are still relevant for a variety of reasons, including:

    • Deployments and upgrades. It’s wonderful if you use a continuous deployment tool to publish application changes, but cloud capabilities still come into play. How do you open up access to cloud servers and push code to them? Can you disable operational alarms while servers are in an upgrading state? Is it easy to snapshot a machine, perform an update, and roll back if necessary? There’s no one way to do application deployments, so your cloud environment’s feature set may still play an important role.
    • Urgent operational issues. Experiencing a distributed denial of service attack? Need to push an urgent patch to one hundred servers? Trying to resolve a performance issue with a single machine? Automation and visibility provided by the cloud vendor can help.
    • Handle steady and rapid scale. There’s a good chance that your cloud footprint is growing. More environments, more instances, more scenarios. How does your cloud make it straightforward to isolate cloud instances by function or geography? A proper configuration management tool goes a long way to making this possible, but cloud-native functionality will be important as well.
    • Audit trails. Users may interact with the cloud platform via a native UI, third party UI, or API. Unless you have a robust log aggregation solution that pulls data from each system that fronts the cloud, it’s useful to have the system of record (usually the cloud itself) capture information centrally.
    • UI as a window to the API. Many cloud consumers don’t ever see the user interface provided by the cloud vendor. Rather, they only use the available API to provision and manage cloud resources. We’ll look at each cloud provider’s API in a future post, but the user interface often reveals the feature set exposed by the API. Even if you are an API-only user, seeing how the Operations experience is put together in a user interface can help you see how the vendor approaches operational stories.

    Let’s get going in alphabetical order.

    DISCLAIMER: I’m the product owner for the CenturyLink Cloud. Obviously my perspective is colored by that. However, I’ve taught three well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    Amazon Web Services

    Instance Management

    Users can do a lot of things with each particular AWS instance. I can create copies (“Launch more like this”), convert to a template, issue power operations, set and apply tags, and much more.

    2014.12.19cloud01

    AWS has a super-rich monitoring system called CloudWatch that captures all sorts of metrics and is capable of sending alarms.

    2014.12.19cloud02

     

    Fleet Management

    AWS shows all your servers in a flat, paginated list.

    2014.12.19cloud04

    You can filter the list based on tag/attribute/keyword associated with the server(s). Amazon also JUST announced Resource Grouping to make it easier to organize assets.

    2014.12.19cloud06

    When you’ve selected a set of servers in the list, you can do things like issue power operations in bulk.

    2014.12.19cloud03

    Monitoring also works this way. However, Autoscale does not work against collections of servers.

    2014.12.19cloud05

    It’d be negligent of me to talk about management at scale in AWS without talking about Elastic Beanstalk and OpsWorks. Beanstalk puts an AWS-specific wrapper around an “application” that may be composed of multiple individual servers. A Beanstalk application may have a load balancer, and be part of an Autoscaling group. It’s also a construct for doing rolling deployments. Once a Beanstalk app is up and running, the user can manage the fleet as a unit.

    Once you have a Beanstalk application, you can terminate and restart the entire environment.

    2014.12.19cloud07

    There are still individual servers shown in the EC2 console, but Beanstalk makes it simpler to manage related assets.

    OpsWorks is a relatively new offering used to define and deploy “stacks” comprised of application layers. Developers can associate Chef recipes to multiple stages of the lifecycle. You can also run recipes manually at any time.

    2014.12.19cloud08

    Account Management

    AWS doesn’t offer any “aggregate” views that roll up your consumption across all regions. The dashboards are service specific, and are shown on a region-by-region basis. AWS accounts are autonomous, and you don’t share anything between them. Within an account, users can do a lot of things. For instance, the Identity and Access Management service lets you define customized groups of users with very specific permission sets.

    2014.12.19cloud09

    AWS has also gotten better at showing detailed usage reports.

    2014.12.19cloud10

    The invoice details are still a bit generic and don’t easily tie back to a given server.

    2014.12.19cloud11

    There are a host of other AWS services that make account management easier. These include CloudTrail for API audit logs and SNS for push notifications.

    CenturyLink Cloud

    Instance Management

    For an individual virtual server in CenturyLink Cloud, the user has a lot of management options. It’s pretty easy to resize, clone, archive, and issue power commands.

    2014.12.19cloud12

    Doing a deployment but want to be able to revert any changes? The platform supports virtual machine snapshots for creating restore points.

    2014.12.19cloud14

    Each server details page shows a few monitoring metrics.

    2014.12.19cloud13

    Users can also bind usage alert and vertical autoscale policies to a server.

     

    Fleet Management

    CenturyLink Cloud has you organize servers into collections called “Groups.” These Groups – which behave similarly to a nested file structure – are management units.

    2014.12.19cloud15

    Users can issue bulk power operations against all or some of the servers in a Group. Additionally, you can set “scheduled tasks” on a Group. For instance, power off all the servers in a Group every Friday night, and turn them back on Monday morning.

    2014.12.19cloud16

    You can also choose pre-loaded or dynamic actions to perform against the servers in a Group. These packages could be software (e.g. new antivirus client) or scripts (e.g. shut off a firewall port) that run against any or all of the servers at once.

    2014.12.19cloud17

     

    The CenturyLink Cloud also provides an aggregated view across data centers. In this view, it’s fairly straightforward to see active alarms (notice the red on the offending server, group, and data center), and navigate the fleet of resources.

    2014.12.19cloud18

    Finally, the platform offers a “Global Search” where users can search for servers located in any data center.

    2014.12.19cloud48

     

    Account Management

    Within CenturyLink Cloud, there’s a concept of an account hierarchy. Accounts can be nested within one another. Networks and other settings can be inherited (or separated), and user permissions cascade down.

    2014.12.19cloud19

    Throughout the system, users can see the month-to-date and projected cost of their cloud consumption. The invoice data itself shows costs on a per server, and per Group basis. This is handy for chargeback situations where teams pay for specific servers or entire environments.

    2014.12.19cloud20

    CenturyLink Cloud offers role-based access controls for a variety of personas. These apply to a given account, and any sub-accounts beneath it.

    2014.12.19cloud21

    The CenturyLink Cloud has other account administration features like push-based notifications (“webhooks”) and a comprehensive audit trail.

    Digital Ocean

    Instance Management

    Digital Ocean specializes in simplicity targeted at developers, but their experience still serves up a nice feature set. From the server view, you can issue power operations, resize the machine, create snapshots, change the server name, and more.

    2014.12.19cloud22

    There are a host of editable settings that touch on networking, the Linux kernel, and recovery processes.

    2014.12.19cloud23

    Digital Ocean gives developers a handful of metrics that clearly show bandwidth consumption and resource utilization.

    2014.12.19cloud24

    There’s a handy audit trail below each server that clearly identifies what operations were performed and how long they took.

    2014.12.19cloud26

    Fleet Management

    Digital Ocean focuses on the developer audience and API users. Their UI console doesn’t really have a concept of managing a fleet of servers. There’s no option to select multiple servers, sort columns, or perform bulk activities.

    2014.12.19cloud25

    Account Management

    The account management experience is fairly lightweight at Digital Ocean. You can view account resources like snapshots and backups.

    2014.12.19cloud27

    It’s easy to create new SSH keys for accessing servers.

    2014.12.19cloud28

     

    The invoice experience is simple but clear. You can see current charges, and how much each individual server cost.

    2014.12.19cloud29

    The account history shows a simple audit trail.

    2014.12.19cloud30

     

    Google Compute Engine

    Instance Management

    The Google Compute Engine offers a nice set of per-server management options. You can connect to a server via SSH, reboot it, clone it, and delete it. There are also a set of monitoring statistics clearly shown at the top of each server’s details.

    2014.12.19cloud31

    Additionally, you can change settings for storage, network, and tags.

    2014.12.19cloud32

     

    Fleet Management

    The only thing you really do with a set of Google Compute Engine servers is delete them.

    2014.12.19cloud34

     

    Google Compute Engine offers Instance groups for organizing virtual resources. They can all be based on the same template and work together in an autoscale fashion, or you can put different types of servers into an instance group.

    2014.12.19cloud33

    An instance group is really just a simple construct. You don’t manage the items as a group, and if you delete the group, the servers remain. It’s simply a way to organize assets.

    2014.12.19cloud35

    Account Management

    Google Compute Engine offers a few different types of management roles including owner, editor, and viewer.

    2014.12.19cloud36

    What’s nice is that you can also have separate billing managers. Other billing capabilities include downloading usage history, and reviewing fairly detailed invoices.

    2014.12.19cloud37

    I don’t yet see an audit trail capability, so I assume that you have to track activities some other way.

    Microsoft Azure

    Instance Management

    Microsoft is in transition between its legacy, production portal and its new blade-oriented portal. For the classic portal, Microsoft crams a lot of useful details into each server’s “details” page.

    2014.12.19cloud38

    The preview portal provides even more information, in a more … unique … format.

    2014.12.19cloud39

    In either environment, Azure makes it easy to add disks, change virtual machine size, and issue power ops.

    Microsoft gives users a useful set of monitoring metrics on each server.

    2014.12.19cloud40

    Unlike the classic portal, the new one has better cost transparency.

    2014.12.19cloud41

    Fleet Management

    There are no bulk actions in the existing portal, besides filtering which Azure subscription to show, and sorting columns. Like AWS, Azure shows a flat list of servers in your account.

    2014.12.19cloud42

    The preview portal has the same experience, but without any column sorting.

    2014.12.19cloud43

    Account Management

    Microsoft Azure users have a wide array of account settings to work with. It’s easy to see current consumption and how close to the limits you are.

    2014.12.19cloud44

    The management service gives you an audit log.

    2014.12.19cloud45

    The new portal gives users the ability to set a handful of account roles for each server. I don’t see a way to apply these roles globally, but it’s a start!

    2014.12.19cloud46

    The pricing information is better in the preview portal, although the costs are still fairly coarse and not on a per-machine basis.

    2014.12.19cloud47

     

    Summary

    Each of these providers has a very unique take on server management. Whether your virtual servers typically live for three hours or three years, the provider’s management capabilities will come into play. Think about what your development and operations staff need to be successful, and take an active role in planning how Day 2 operations in your cloud will work. Consider things like bulk management, audit trails, and security controls when crafting your strategy!