Category: DevOps

  • Using Steeltoe for ASP.NET 4.x apps that need a microservices-friendly config store

    Nowadays, all the cool kids are doing microservices. Whether or not you care, there ARE some really nice distributed systems patterns that have emerged from this movement. Netflix and others have shared novel solutions for preventing cascading failures, discovering services at runtime, performing client-side load balancing, and storing configurations off-box. For Java developers, many of these patterns have been baked into turnkey components as part of Spring Cloud. But what about .NET devs who want access to all this goodness? Enter Steeltoe.

    Steeltoe is an open-source .NET project that gives .NET Framework and .NET Core developers easy access to Spring Cloud services like Spring Cloud Config (Git-backed config server) and Spring Cloud Eureka (service discovery from Netflix). In this blog post, I’ll show you how easy it is to create a config server, and then connect to it from an ASP.NET app using Steeltoe.

    Why should .NET devs care about a config server? We’ve historically thrown our (sometimes encrypted) config values into web.config files or a database. Kevin Hoffman says that’s now an anti-pattern because you end up with mutable build artifacts and don’t have an easy way to rotate encryption keys. With fast-changing (micro)services, and more host environments than ever, a strong config strategy is a must. Spring Cloud Config gives you a web-scale config server that supports Git-backed configurations,  symmetric or asymmetric encryption, access security, and no-restart client refreshes.

    Many Steeltoe demos I’ve seen use .NET Core as the runtime, but my non-scientific estimate is that 99.991% of all .NET apps out there are .NET 4.x and earlier, so let’s build a demo with a Windows stack.

    Before starting to build the app, I needed actual config files! Spring Cloud Config works with local files, or preferably, a Git repo. I created a handful of files in a GitHub repository that represent values for an “inventory service” app. I have one file for dev, QA, and production environments. These can be YAML files or property files.
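
    Spring Cloud Config matches files to apps by naming convention ({application}-{profile}.yml, or {application}.yml for the default profile), so a dev file for this app looks something like the sketch below. The key is the one my demo app reads later; the value is made up for illustration.

    # inventoryservice-dev.yml (illustrative value only)
    dbserver: dev-db-01.example.com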

    2016-10-18-steeltoe07

    Let’s code stuff. I went and built a simple Spring Cloud Config server using Spring Tool Suite. To say “built” is to overstate how silly easy it is to do. Whether using Spring Tool Suite or the fantastic Spring Initializr site, if it takes you more than six minutes to build a config server, you must be extremely drunk.

    2016-10-18-steeltoe01

    Next, I chose which dependencies to add to the project. I selected the Config Server, which is part of Spring Cloud.

    2016-10-18-steeltoe02

    With my app scaffolding done, I added a ton of code to serve up config server endpoints, define encryption/decryption logic, and enable auto-refresh of clients. Just kidding. It takes a single annotation on my main Java class:

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.config.server.EnableConfigServer;
    
    @SpringBootApplication
    @EnableConfigServer
    public class BlogConfigserverApplication {
    
    	public static void main(String[] args) {
    		SpringApplication.run(BlogConfigserverApplication.class, args);
    	}
    }
    

    Ok, there’s got to be more than that, right? Fine, I’m not being entirely honest. I also had to throw this line into my application.properties file so that the config server knew where to pull my GitHub-based configuration files from.

    spring.cloud.config.server.git.uri=https://github.com/rseroter/blog-configserver
    

    That’s it for a basic config server. Now, there are tons of other things you CAN configure around access security, multiple source repos, search paths, and more. But this is a good starting point. I quickly tested my config server using Postman and saw that by just changing the profile (dev/qa/default) in the URL, I’d pull up a different config file from GitHub. Spring Cloud Config makes it easy to use one or more repos to serve up configurations for different apps and environments. Sweet.
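
    For reference, the server exposes configuration over a simple REST convention of /{application}/{profile}, so the requests looked roughly like this (response trimmed, and assuming the server runs locally on port 8080):

    GET http://localhost:8080/inventoryservice/dev

    {
      "name": "inventoryservice",
      "profiles": ["dev"],
      "propertySources": [
        { "name": "...inventoryservice-dev.yml", "source": { "dbserver": "..." } }
      ]
    }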

    2016-10-18-steeltoe03

    Ok, so I had a config server. Next up? Using Steeltoe so that my ASP.NET 4.6 app could easily retrieve config values from this server.

    I built a new ASP.NET MVC app in Visual Studio 2015.

    2016-10-18-steeltoe04

    Next, I searched NuGet for Steeltoe, and found the configuration server library.

    2016-10-18-steeltoe05

    Fortunately .NET has some extension points for plugging in an outside configuration source. First, I created a new appsettings.json file at the root of the project. This file describes a few settings that help map to the right config values on the server. Specifically, the name of the app and URL of the config server. FYI, the app name corresponds to the config file name in GitHub. What about whether we’re using dev, test, or prod? Hold on, I’m getting there dammit.

    {
        "spring": {
            "application": {
                "name": "inventoryservice"
            },
            "cloud": {
                "config": {
                    "uri": "[my ip address]:8080"
                }
            }
        }
    }
    

    Next up, I created a class in the “App_Start” project folder that holds the details of our configuration and looks to the appsettings.json file for some pointers. I stole this class from the nice Steeltoe demos, so don’t give me credit for being smart.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    
    //added by me
    using Microsoft.AspNetCore.Hosting;
    using System.IO;
    using Microsoft.Extensions.FileProviders;
    using Microsoft.Extensions.Configuration;
    using Steeltoe.Extensions.Configuration;
    
    namespace InventoryService
    {
        public class ConfigServerConfig
        {
            public static IConfigurationRoot Configuration { get; set; }
    
            public static void RegisterConfig(string environment)
            {
                var env = new HostingEnvironment(environment);
    
                // Set up configuration sources.
                var builder = new ConfigurationBuilder()
                    .SetBasePath(AppDomain.CurrentDomain.BaseDirectory)
                    .AddJsonFile("appsettings.json")
                    .AddConfigServer(env);
    
                Configuration = builder.Build();
            }
        }
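
        // Minimal IHostingEnvironment wrapper: the Steeltoe config provider only appears to need
        // the environment name (i.e. the Spring profile), so the remaining members are left unimplemented.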
        public class HostingEnvironment : IHostingEnvironment
        {
            public HostingEnvironment(string env)
            {
                EnvironmentName = env;
            }
    
            public string ApplicationName
            {
                get
                {
                    throw new NotImplementedException();
                }
    
                set
                {
                    throw new NotImplementedException();
                }
            }
    
            public IFileProvider ContentRootFileProvider
            {
                get
                {
                    throw new NotImplementedException();
                }
    
                set
                {
                    throw new NotImplementedException();
                }
            }
    
            public string ContentRootPath
            {
                get
                {
                    throw new NotImplementedException();
                }
    
                set
                {
                    throw new NotImplementedException();
                }
            }
    
            public string EnvironmentName { get; set; }
    
            public IFileProvider WebRootFileProvider { get; set; }
    
            public string WebRootPath { get; set; }
    
            IFileProvider IHostingEnvironment.WebRootFileProvider
            {
                get
                {
                    throw new NotImplementedException();
                }
    
                set
                {
                    throw new NotImplementedException();
                }
            }
        }
    }
    

    Nearly done! In the Global.asax.cs file, I needed to select which “environment” to use for my configurations. Here, I chose the “default” environment for my app. This means that the Config Server will return the default profile (configuration file) for my application.

    protected void Application_Start()
    {
      AreaRegistration.RegisterAllAreas();
      RouteConfig.RegisterRoutes(RouteTable.Routes);
    
      //add for config server, contains "profile" used
      ConfigServerConfig.RegisterConfig("default");
    }
    

    Ok, now to the regular ASP.NET MVC stuff. I added a new HomeController for the app, and looked into the configuration for my config value. If it was there, I added it to the ViewBag.

    public ActionResult Index()
    {
       var config = ConfigServerConfig.Configuration;
       if (null != config)
       {
           ViewBag.dbserver = config["dbserver"] ?? "server missing :(";
       }
    
       return View();
    }
    

    All that was left was to build a View to show the glorious result. I added a new Index.cshtml file and just printed out the value from the ViewBag. After starting up the app, I saw that the value printed out matched the value in the corresponding GitHub file:

    2016-10-18-steeltoe06

    If you’re a .NET dev like me, you’ll love Steeltoe. It’s easy to use and provides a much more robust, secure solution for app configurations. And while I think it’s best to run .NET apps in Pivotal Cloud Foundry, you can run these Steeltoe-powered .NET services anywhere you want.

    Steeltoe is still in a pre-release mode, so try it out, submit GitHub issues, and give the team feedback on what else you’d like to see in the library.

  • Enterprises fighting back, Spring Boot is the best, and other SpringOne Platform takeaways

    Last week I was in Las Vegas for SpringOne Platform. This conference had one of the greatest session lists I’ve ever seen, and brought together nearly 2,000 people interested in microservices, Java Spring, DevOps, agile, Cloud Foundry, and cloud-native development. With sponsors like Google, Microsoft, HortonWorks, Accenture, and AWS, and over 400 different companies represented by attendees, the conference had a unique blend of characters. I spent some time reflecting on the content and vibe of SpringOne Platform, and noticed that I kept coming back to the following themes.

    #1 – Enterprises are fighting back.

    Finally! Large, established companies are tired of operating slow-moving, decrepit I.T. departments where nothing interesting happens. At SpringOne Platform, I saw company after company talking about how they are creating change, and then showing the results. Watch this insightful keynote from Citi, where they outline pain points and how they’ve changed their team structure, culture, and technology.

    You don’t have to work at Uber, Etsy, Netflix or AWS to work on cutting-edge technology. Enterprises have woken up to the fact that outsourcing their strategic technology skills was a dumb decision. What are they doing to recover?

    1. Newfound focus on hiring and expanding technology talent. In just about every enterprise-led session I attended, the presentation closed with a “we’re hiring!” notice. Netflix has been ending their blog posts with this call-to-action for YEARS. Enterprises are starting to sponsor conferences and go where developers hang out. Additionally, because you can’t just hire hundreds of devs that know cloud-native patterns, I’m seeing enterprises make a greater investment in their existing people. That’s one reason Pluralsight continues to explode in popularity as enterprises purchase subscriptions for all their tech teams.
    2. Upgrading and investing in technology. Give the devs what they want! Enterprises have started to realize that classic enterprise technology doesn’t attract talented people to work on it. Gartner predicts that by the year 2020, 75% of the apps supporting digital business will be built, not bought. That means that your dev teams need the tools and tech that let them crank out customer-centric, resilient apps. And they need support for using modern approaches to delivering software. If you invest in technology, you’ll attract the talent to work with it.

     

    #2 – Spring Boot is the best application bootstrapping experience, period.

    For 17+ years I’ve either coded in .NET or Node.js (with a little experimentation in Go, Ruby, and Java). After joining Pivotal, I decided that I should learn Spring, since that’s our jam.

    I’ve never seen anything better than Spring Boot for getting developers rolling. Instead of spending hours (days?) setting up boilerplate code, and finding the right mix of dependencies for your project, Spring Boot takes care of all that. Give me 4 minutes, and I can build and deploy a git-backed Configuration Server. In a few moments I can flip on OAuth2 security or distributed tracing. And this isn’t hello-world quality stuff; this is the productization of Netflix OSS and other battle-tested technology that you can use with simple code annotations. That’s amazing, and you can use the Spring Initializr to get started today.

    2016.08.10.s1p01

    Smart companies realize that devs shouldn’t be building infrastructure and app scaffolding, or wrangling dependencies; they should be creating user experiences and business logic. Whereas Node.js has a billion packages and I spend plenty of time selecting ones that don’t have Guy Fieri images embedded, Spring Boot gives devs a curated, integrated set of packages. And it’s saving companies like Comcast millions of dollars.

    Presenter after presenter at SpringOne Platform was able to quickly demonstrate complex distributed systems concepts by using Spring Boot apps. Java innovation happens in Spring.

    #3 – A wave of realism has swept over the industry.

    I’m probably being optimistic, but it seems like some of the hype is settling down, and we’re actually getting to work on transformation. The SpringOne Platform talks (both in sessions and in hallway/lunch conversations) weren’t about visions of the future, but about actual in-progress efforts. Transformation is hard and there aren’t shortcuts. Simply containerizing won’t make a difference, for example.

    Talk after talk, conducted by analysts or customers, highlighted the value of assessing your existing app portfolio and identifying where refactoring or replatforming can add value. Just lifting and shifting to a container orchestration platform doesn’t actually improve things. At best, you’ve optimized the infrastructure, while ignoring the real challenge: improving the delivery pipeline. The same goes for configuration management and other technologies that, on their own, don’t establish meaningful change. It takes a mix of cultural overhaul, management buy-in, and yes, technology. I didn’t see anyone at the conference promising silver bullets. But at the same time, there were some concrete next steps for teams looking to accelerate their efforts.

    #4 – The cloud wars have officially moved above IaaS.

    IaaS is definitely not a commodity (although pricing has stabilized), but you’re seeing the major three clouds working hard to own the services layer above the raw infrastructure. Gartner’s just-released IaaS Magic Quadrant shows clear leadership by AWS, Microsoft, and Google, and not accidentally, all three sponsored SpringOne Platform. Google brought over 20 people to the conference, and still couldn’t handle the swarms of people at their booth trying out Spring Boot! An integrated platform on top of leading clouds gives the best of all worlds.

    Great infrastructure matters, but native services in the cloud are becoming the key differentiator for one over another. Want services to bridge on-premises and cloud apps? Azure is a strong choice. Need high-performing data storage services? AWS is fantastic. Looking at next generation machine learning and data processing? Google is bleeding edge. At SpringOne Platform, I heard established companies—including Home Depot, the GAP, Merrill Corp—explain why they loved Pivotal Cloud Foundry, especially when it integrated with native services in their cloud of choice. The power of platforms, baby.

    #5 – Data microservices is the next frontier.

    I love, love that we’re talking about the role of data in a microservices world. It’s one thing to design and deliver stateless web apps, and scale the heck out of them. We’ve got lots of patterns for that. But what about the data? Are there ways to deploy and manage data platforms with extreme automation? How about scaling real-time and batch data processing? There were tons of sessions about data at SpringOne Platform, and Pivotal’s Data team wrote up some awesome summaries throughout the week.

    It’s almost always about data, and I think it’s great that we had PACKED sessions full of people working through these emerging ideas.

    #6 – Pivotal is making a difference.

    I’m very proud of what our customers are doing with the help of Pivotal people and technologies. While we tried to make sure we didn’t beat people over the head with “Pivotal is GREAT” stuff, it became clear that the “Pivotal Way” is working and transforming how the largest companies in the world build software.

    The Gap talked about going from weeks to deploy code changes, to mere minutes. That has a material impact on how they interact with their customers. And for many, this isn’t about net new applications. Almost everyone who presented talked about how to approach existing investments and find new value. It’s fun to be on this journey to simplify the future.

    Want to help make a difference at Pivotal and drive the future of software? We’re always hiring.

  • Who is really supposed to use the (multi)cloud GUI?

    How do YOU prefer to interact with infrastructure clouds? A growing number of people seem to prefer APIs, SDKs, and CLIs over any graphical UI. It’s easy to understand why: few GUIs offer the ability to create the repeatable, automated processes needed to use compute at scale. I just wrote up an InfoQ story about a big update to the AWS EC2 Run Command feature—spoiler: you can now execute commands against servers located ANYWHERE—and it got me thinking about how we interact with resources. In this post, I’ll try and figure out who cares about GUIs, and show off an example of the EC2 Run Command in action.

    If you’re still stuck dealing with servers and haven’t yet upgraded to an IaaS-agnostic cloud-native platform, then you’re looking for ways to create a consistent experience. Surveys keep showing that teams are flocking to GUI-light, automation-centric software for configuration management (e.g. Chef, Ansible), resource provisioning (e.g. Terraform, AWS CloudFormation, Azure Resource Manager), and software deployment. As companies do “hybrid computing” and mix and match servers from different providers, they really need to establish consistent practices for building and managing many servers. Is the answer to use the cloud provider’s native GUI or a GUI-centric “multi-cloud manager” tool? I don’t think so.

    Multi-cloud vendors are trying to put a useful layer of abstraction on top of non-commodity IaaS, but you end up with what AWS CEO Andy Jassy calls the “lowest common denominator.” Multi-cloud vendors struggle to keep up with the blistering release pace of public cloud vendors they support, and often neutralize the value of a given cloud by trying to create a common experience. No, the answer seems to be to use these GUIs for simple scenarios only, and rely primarily on APIs and automation that you can control.

    But SOMEONE is using these (multi)cloud GUIs! They must offer some value. So who is the real audience for the cloud provider portals, or multi-cloud products now offered by Cisco (Cliqr), IBM (Gravitant), and CenturyLink (ElasticBox)?

    • Business users. One clear area of value in cloud GUIs is for managers who want to dip in and see what’s been deployed, and finance personnel who are doing cost modeling and billing. The native portals offered by cloud providers are getting better at this, but it’s also been an area where multi-cloud brokers have invested heavily. I don’t want to ask the dev manager to write an app that pulls the AWS billing history. That seems … abusive. Use the GUI.
    • Infrequent tech users with simple tasks. Look, I only log into the AWS portal every month or so. It wouldn’t make a ton of sense for me to build out a whole provisioning and management pipeline to build a server every so often. Even dropping down to the CLI isn’t more productive in those cases (for me). Other people at your company may be frequent, power users and it makes sense for them to automate the heck out of their cloud. In my case, the GUI is (mostly) fine. Many of the cloud provider portals reflect this reality. Look at the Azure Portal. It is geared towards executing individual actions with a lot of visual flair. It is not a productivity interface, or something supportive of bulk activities. Same with most multi-cloud tools I’ve seen. Go build a server, perform an action or two. In those cases, rock on. Use the GUI.
    • Companies with only a few slow-changing servers. If you have 10-50 servers in the cloud, and you don’t turn them over very often, then it can make sense to use the native cloud GUI for a majority of your management. A multi-cloud broker would be overkill. Don’t prematurely optimize.

    I think AWS nailed its target use case with EC2 Run Command. When it first launched in October of 2015, it was for AWS Windows servers. Amazon now supports Windows and Linux, and servers inside or outside of AWS data centers. Run ad-hoc PowerShell or Linux scripts, install software, update the OS, you name it. Kick it off with the AWS Console, API, SDK, CLI or via PowerShell extensions. And because it’s agent based and pull-driven, AWS doesn’t have to know a thing about the cloud the server is hosted in. It’s a straightforward, configurable, automation-centric, and free way to do basic cross-cloud management.

    How’s it work? First, I created an EC2 “activation” which is used to generate a code to register the “managed instances.” When creating it, I also set up a security role in Identity and Access Management (IAM) which allows me to assign rights to people to issue commands.

    2016.07.12.ec203

    Out of the activation, I received a code and ID that’s used to register a new server. With the activation in place, I built a pair of Windows servers in Microsoft Azure and CenturyLink Cloud. I logged into each server, and installed the AWS Tools for Windows PowerShell. Then, I pasted a simple series of commands into the Windows PowerShell for AWS window:

    $dir = $env:TEMP + "\ssm"
    
    New-Item -ItemType directory -Path $dir
    
    cd $dir
    
    (New-Object System.Net.WebClient).DownloadFile("https://amazon-ssm-us-east-1.s3.amazonaws.com/latest/windows_amd64/AmazonSSMAgentSetup.exe", $dir + "\AmazonSSMAgentSetup.exe")
    
    Start-Process .\AmazonSSMAgentSetup.exe -ArgumentList @("/q", "/log", "install.log", "CODE=<my code>", "ID=<my id>", "REGION=us-east-1") -Wait
    
    Get-Content ($env:ProgramData + "\Amazon\SSM\InstanceData\registration")
    
    Get-Service -Name "AmazonSSMAgent"
    

    The commands simply download the agent software, install it as a Windows service, and register the box with AWS. Immediately after installing the agent on servers in other clouds, I saw them listed in the Amazon Console. Sweet.

    2016.07.12.ec205

    Now the fun stuff. I can execute commands from existing Run Command documents (e.g. “install missing Windows updates”), run ad-hoc commands, find public documents written by others, or create my own documents.

    For instance, I could do a silly-simple “ipconfig” ad-hoc request against my two servers …

    2016.07.12.ec207

    … and I almost immediately received the resulting output. If I expected a ton of output from the command, I could log it all to S3 object storage.

    2016.07.12.ec208

    As I pick documents to execute, the parameters change. In this case, choosing the “install application” document means that I provide a binary source and some parameters:

    2016.07.12.ec209

    I’ve shown off the UI here (ironically, I guess), but the real value is that I could easily create documents or execute commands from the AWS CLI or something like the Node SDK. What a great way to do hybrid, ad-hoc management! It’s not a complete solution and doesn’t replace config management or multi-cloud provisioning tools, but it’s a pretty handy way to manage a fleet of distributed servers.
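
    For example, here’s a rough sketch of kicking off that same “ipconfig” run from Node.js with the aws-sdk module (the managed instance ID below is a placeholder, and error handling is minimal):

    var AWS = require('aws-sdk');
    var ssm = new AWS.SSM({ region: 'us-east-1' });

    // run an ad-hoc PowerShell command against a registered managed instance
    var params = {
        DocumentName: 'AWS-RunPowerShellScript',
        InstanceIds: ['mi-00000000000000000'],   // placeholder ID from the registration step
        Parameters: { commands: ['ipconfig'] }
    };

    ssm.sendCommand(params, function (err, data) {
        if (err) { return console.log(err); }
        console.log('Command submitted, ID: ' + data.Command.CommandId);
    });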

    There’s definitely a place for GUIs when working with infrastructure clouds, but they really aren’t meant for power users. If you’re forcing your day-to-day operations/service team to work through a GUI-centric tool, you’re telling them that you don’t value their time. Rather, make sure that any vendor-provided software your operations team gets their hands on has an API. If not, don’t use it.

    What do you think? Are there other scenarios where using the GUI makes the most sense?

  • Integration trends you should care about

    Everyone’s doing integration nowadays. It’s not just the grizzled vet who reminisces about EDI or the seasoned DBA who can design snowflake schemas for a data warehouse in their sleep. No, now we have data scientists mashing up data sources, developers processing streams and connecting things via APIs, and “citizen integrators” (non-technical users) building event-driven actions on their own. It’s wild.

    Here, I’ll take a look at a few things to keep an eye on, and the implications for you.

    iPaaS

    Research company Gartner coined the term Integration Platform-as-a-Service (iPaaS) to represent services that offer application integration capabilities in the cloud. Gartner delivers an annual assessment of these vendors in the form of a Magic Quadrant, and released the 2016 version back in March. While revenue in this space is still relatively small, the market is growing by 50%, and Gartner predicts that by 2019, iPaaS will be the preferred option for new projects. Kent Weare recently conducted an excellent InfoQ virtual panel about iPaaS with representatives from SnapLogic, Microsoft, and Mulesoft. I found a number of useful tidbits in there, and drew a few conclusions:

    • The vendors are pushing their own endpoint connectors, but all seem to (somewhat grudgingly) recognize the value of consuming “raw” APIs without a forced abstraction.
    • An iPaaS model won’t take off unless it’s seen as viable for existing, on-premises systems. Latency and security matter, and it still seems like there’s work to be done here to ensure that iPaaS products can handle all the speed and connectivity requirements.
    • Elasticity is an increasingly important value proposition of iPaaS. Instead of trying to build out a complete integration stack themselves that can handle peak traffic, companies want something that dynamically scales. This is especially true given that the Internet-of-Things is seen as a huge driver of iPaaS in the years ahead.
    • User experience is more important than ever, and these vendors are paying special attention to the graphical UI. At the same time, they’ll need to keep working on the technical interface for things like automated testing. They seemed well positioned, however, to work with new types of transient microservices and short-lived containers.

    There’s some cool stuff in the iPaaS space. It’s definitely worth your time to read Kent’s panel and explore some of these technologies more closely.

    Microservices-driven integration

    Have you heard of “microservices”? Of course you have, unless you’ve been asleep for the past eighteen months. This model of single-purpose, independently deployable services has taken off as groups rebel against the monolithic apps (and teams!) they’re saddled with today.

    You’ll often hear microservices proponents question the usefulness of an Enterprise Service Bus. Why? They point out that ESBs are typically managed in organization silos, centralize too much of the processing, and offer much more functionality than most teams need. If you look at your application components arranged as a graph instead of a stack, then you realize a different integration toolset is needed.

    Back in May, I was at Integrate 2016 where I delivered a talk on the Open Source Messaging Landscape (video now online). Lightweight messaging is BACK, baby! I’m seeing more and more teams looking to (open source) distributed messaging solutions when connecting their disparate services. This means software like Kafka, RabbitMQ, ZeroMQ, and NATS are going to continue to increase in relevance in the months and years ahead.

    How are you supposed to orchestrate all these microservices integrations? There’s no easy answer. One technology that’s trying to address this is Spring Cloud Data Flow. I saw a demo a couple of weeks back, and walked away very impressed.

    Spring Cloud Data Flow is software that helps you create and run pipelines of data microservices. These microservices are loosely coupled but linked through a shared messaging layer and sit atop a variety of runtimes including Kubernetes, Apache Mesos, and Cloud Foundry. This gives you a very cool way to design and run modern system integration.
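
    To give you a flavor, a pipeline is declared with a short pipe-and-filter DSL in the Data Flow shell; here’s a sketch using the standard http and log starter apps (the stream name is made up), where each app in the pipe is a standalone Spring Boot app wired together over the messaging layer:

    stream create --name ingest-demo --definition "http --server.port=9000 | log" --deploy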

    Serverless and “citizen integrators”

    Microservices is a hot trend, but “serverless” is probably even hotter! A more accurate name is “function as a service” and engineer Mike Roberts wrote a great piece that covers criteria and use cases. Basically, it’s about running short-lived, often asynchronous, single operations without any knowledge about the underlying infrastructure.

    This matters for integration people not just because you’ll see more and more messaging-oriented scenarios interacting with serverless engines, but because it’s opened up the door for many citizen developers to build event-driven integrations. These integration services meet the definition of “serverless” with their pay-per-use, short-lived actions that abstract the infrastructure.

    Look at the crazy popularity of IFTTT. “Regular” people can design pretty darn powerful integrations that start with an external trigger and end with an action. This stuff isn’t just for making automatic updates to Pinterest. Have you investigated Zapier? Their directory of connectors contains an impressive array of leading CRM, Finance, and Support systems that anyone can use. Microsoft’s in the game now with Flow for simple cloud-based workflows. Developers can take advantage of serverless products like AWS Lambda and Webtask (from Auth0) when custom code is needed.
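
    To make the idea concrete, a “function” in this world is little more than a handler that receives an event, does one thing, and exits. Here’s a minimal sketch of an AWS Lambda handler in Node.js (the event shape depends entirely on whatever triggers it):

    // minimal Lambda-style handler: no servers to manage, just a function triggered by an event
    exports.handler = function (event, context, callback) {
        console.log('received event: ' + JSON.stringify(event));

        // do the one short-lived task this function exists for, then signal completion
        callback(null, { status: 'processed' });
    };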

    Implications

    What does all this mean to you? First and foremost, integration is hot again! I’m willing to bet that you’d benefit from investing some time in learning new and emerging tech. If you haven’t learned anything new in the integration space over the past two years, you’ve missed a lot. Take a course, pick up a book, or just hack around.

    Recognize the growth of microservices and think about how it impacts your team. What tools will developers use to connect their services? What needs to be upgraded? What does this type of distributed integration do to your tracing and troubleshooting procedures? How can you break down the organizational silos and keep the “integration team” from being a bottleneck? Take a look at messaging software that can complement existing ESB software.

    Finally, don’t miss out on this “citizen integrator” trend. How can you help your less technical colleagues connect their systems in novel ways? The world will always need integration specialists, but it’s important to support the growing needs of those who shouldn’t have to queue up for help from the experts.

    What do you think? Any integration trends that stand out to you?

  • Modern Open Source Messaging: Apache Kafka, RabbitMQ and NATS in Action

    Last week I was in London to present at INTEGRATE 2016. While the conference is oriented towards Microsoft technology, I mixed it up by covering a set of messaging technologies in the open source space; you can view my presentation here. There’s so much happening right now in the messaging arena, and developers have never had so many tools available to process data and link systems together. In this post, I’ll recap the demos I built to test out Apache Kafka, RabbitMQ, and NATS.

    One thing I tried to make clear in my presentation was how these open source tools differ from classic ESB software. A few things you’ll notice as you check out the technologies below:

    • Modern brokers are good at one thing. Instead of cramming every sort of workflow, business intelligence, and adapter framework into the software, these OSS services are simply great at ingesting and routing lots of data. They are typically lightweight, but deployable in highly available configurations as well.
    • Endpoints now have significant responsibility. Traditional brokers took on tasks such as message transformation, reliable transportation, long-running orchestration between endpoints, and more. The endpoints could afford to be passive, mostly-untouched participants in the integration. With these modern engines, endpoints don’t cede control to a centralized bus; the bus only transports opaque data. Endpoints need to be smarter and I think that’s a good thing.
    • Integration is approachable for ALL devs. Maybe a controversial opinion, but I don’t think integration work belongs in an “integration team.” If agility matters to you, then you can’t silo off a key function and force (micro)service teams to line up to get their work done. Integration work needs to be democratized, and many of these OSS tools make integration very approachable to any developer.

    Apache Kafka

    With Kafka you can do both real-time and batch processing. Ingest tons of data, route via publish-subscribe (or queuing). The broker barely knows anything about the consumer. All that’s really stored is an “offset” value that specifies where in the log the consumer left off. Unlike many integration brokers that assume consumers are mostly online, Kafka can successfully persist a lot of data, and supports “replay” scenarios. The architecture is fairly unique; topics are arranged in partitions (for parallelism), and partitions are replicated across nodes (for high availability).

    For this set of demos, I used Vagrant to stand up an Ubuntu box, and then installed both Zookeeper and Kafka. I also installed Kafka Manager, built by the engineering team at Yahoo!. I showed the conference audience this UI, and then added a new topic to hold server telemetry data.

    2016-05-17-messaging01

    All my demo apps were built with Node.js. For the Kafka demos, I used the kafka-node module. The producer is simple: write 50 messages to our “server-stats” Kafka topic.

    var serverTelemetry = {server:'zkee-022', cpu: 11.2, mem: 0.70, storage: 0.30, timestamp:'2016-05-11:01:19:22'};
    
    var kafka = require('../../node_modules/kafka-node'),
        Producer = kafka.Producer,
        client = new kafka.Client('127.0.0.1:2181/'),
        producer = new Producer(client),
        payloads = [
            { topic: 'server-stats', messages: [JSON.stringify(serverTelemetry)] }
        ];
    
    producer.on('error', function (err) {
        console.log(err);
    });
    producer.on('ready', function () {
    
        console.log('producer ready ...')
        for(i=0; i<50; i++) {
            producer.send(payloads, function (err, data) {
                console.log(data);
            });
        }
    });
    

    Before consuming the data (and it doesn’t matter that we haven’t even defined a consumer yet; Kafka stores the data regardless), I showed that the topic had 50 messages in it, and no consumers.

    2016-05-17-messaging02

    Then, I kicked up a consumer with a group ID of “rta” (for real time analytics) and read from the topic.

    var kafka = require('../../node_modules/kafka-node'),
        Consumer = kafka.Consumer,
        client = new kafka.Client('127.0.0.1:2181/'),
        consumer = new Consumer(
            client,
            [
                { topic: 'server-stats' }
            ],
            {
                groupId: 'rta'
            }
        );
    
        consumer.on('message', function (message) {
            console.log(message);
        });
    

    After running the code, I could see that my consumer was all caught up, with no “lag” in Kafka.

    2016-05-17-messaging03

    For my second Kafka demo, I showed off the replay capability. Data is removed from Kafka based on an expiration policy, but assuming the data is still there, consumers can go back in time and replay from any available point. In the code below, I go back to offset position 40 and read everything from that point on. Super useful if apps fail to process data and you need to try again, or if you have batch processing needs.

    var kafka = require('../../node_modules/kafka-node'),
        Consumer = kafka.Consumer,
        client = new kafka.Client('127.0.0.1:2181/'),
        consumer = new Consumer(
            client,
            [
                { topic: 'server-stats', offset: 40 }
            ],
            {
                groupId: 'rta',
                fromOffset: true
            }
        );
    
        consumer.on('message', function (message) {
            console.log(message);
        });
    

    RabbitMQ

    RabbitMQ is a messaging engine that follows the AMQP 0.9.1 definition of a broker. It follows a standard store-and-forward pattern where you have the option to store the data in RAM, on disk, or both. It supports a variety of message routing paradigms. RabbitMQ can be deployed in a clustered fashion for performance, and mirrored fashion for high availability. Consumers listen directly on queues, but publishers only know about “exchanges.” These exchanges are linked to queues via bindings, which specify the routing paradigm (among other things).

    For these demos (I only had time to do two of them at the conference), I used Vagrant to build another Ubuntu box and installed RabbitMQ and the management console. The management console is straightforward and easy to use.

    2016-05-17-messaging04

    For the first demo, I did a publish-subscribe example. First, I added a pair of queues: notes1 and notes2. I then showed how to create an exchange. In order to send the inbound message to ALL subscribers, I used a fanout routing type. Other options include direct (specific routing key), topic (depends on matching a routing key pattern), or headers (route on message headers).

    2016-05-17-messaging05

    I have an option to bind this exchange to another exchange or a queue. Here, see that I bound it to the new queues I created.

    2016-05-17-messaging06

    Any message into this exchange goes to both bound queues. My Node.js application used the amqplib module to publish a message.

    var amqp = require('../../node_modules/amqplib/callback_api');
    
    amqp.connect('amqp://rseroter:rseroter@localhost:5672/', function(err, conn) {
      conn.createChannel(function(err, ch) {
        var exchange = 'integratenotes';
    
        //exchange, no specific queue, message
        ch.publish(exchange, '', new Buffer('Session: Open-source messaging. Notes: Richard is now showing off RabbitMQ.'));
        console.log(" [x] Sent message");
      });
      setTimeout(function() { conn.close(); process.exit(0) }, 500);
    });
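
    For completeness, here’s roughly what one of the listening apps looks like with the same amqplib module. This is a sketch that assumes the notes1 queue created earlier, with the binding already set up in the management console:

    var amqp = require('../../node_modules/amqplib/callback_api');

    amqp.connect('amqp://rseroter:rseroter@localhost:5672/', function(err, conn) {
      conn.createChannel(function(err, ch) {
        var queue = 'notes1';

        //consume whatever the exchange routes to this queue
        ch.consume(queue, function(msg) {
          console.log(" [x] Received: " + msg.content.toString());
        }, { noAck: true });
      });
    });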
    

    As you’d expect, both apps (each listening on its own queue) got the message. The second demo used direct routing with specific routing keys. This means that each queue receives a message only if its binding matches the provided routing key.

    2016-05-17-messaging07

    That’s handy, but you can also use a topic binding and then apply wildcards. This helps you route based on a pattern matching scenario. In this case, the first queue will receive a message if the routing key has 3 sections separated by a period, and the third value equals “slides.” So “day2.seroter.slides” would work, but “day2.seroter.recap” wouldn’t. The second queue gets a message only if the routing key starts with “day2”, has any middle value, and then “recap.”

    2016-05-17-messaging08

    NATS

    If you’re looking for a high velocity communication bus that supports a host of patterns that aren’t realistic with traditional integration buses, then NATS is worth a look! NATS was originally built with Ruby and achieved a respectable 150k messages per second. The team rewrote it in Go, and now you can do an absurd 8-11 million messages per second. It’s tiny, just a 3MB Docker image! NATS doesn’t do persistent messaging; if you’re offline, you don’t get the message. It works as a publish-subscribe engine, but you can also get synthetic queuing. It also aggressively protects itself, and will auto-prune consumers that are offline or can’t keep up.

    In my first demo, I did a poor man’s performance test. To be clear, this is not a good performance test. But I wanted to show that even a synchronous loop in Node.js could achieve well over a million messages per second. Here I pumped in 12 million messages and watched the stats using nats-top.
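
    The publishing side is just a tight loop; something along these lines (the subject name and payload are arbitrary) is all it takes with the same nats module:

    // Server connection
    var nats = require('../../node_modules/nats').connect();

    //blast out a pile of small messages as fast as the loop will go
    var total = 12000000;
    for (var i = 0; i < total; i++) {
        nats.publish('perf.test', 'stat-' + i);
    }

    //flush to make sure everything went out before exiting
    nats.flush(function() {
        console.log('published ' + total + ' messages');
        process.exit(0);
    });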

    2016-05-17-messaging09

    1.6 million messages per second, and barely using any CPU. Awesome.

    The next demo was a new type of pattern. In a microservices world, it’s important to locate services at runtime, not hard-code references to them at design time. Solutions like Consul are great, but if you have a performant message bus, you can actually use THAT as the right loosely coupled intermediary. Here, an app wants to look up a service endpoint, so it publishes a request and waits to hear back from whichever service instances are online.

    // Server connection
    var nats = require('../../node_modules/nats').connect();
    
    console.log('startup ...');
    
    nats.request('service2.endpoint', function(response) {
        console.log('service is online at endpoint: ' + response);
    });
    

    Each microservice then has a listener attached to NATS and replies if it gets an “are you online?” request.

    // Server connection
    var nats = require('../../node_modules/nats').connect();
    
    console.log('startup ...');
    
    //everyone listens
    nats.subscribe('service2.endpoint', function(request, replyTo) {
        nats.publish(replyTo, 'available at http://www.seroter.com/service2');
    });
    

    When I ran the requesting app, I got back a pair of responses, since both services answered. The client then chooses which instance to call. Or, I could put the service listeners into a “queue group”, which means that only one subscriber gets the request. Given that consumers aren’t part of the NATS routing table if they are offline, I can be confident that whoever responds is actually online.

    // Server connection
    var nats = require('../../node_modules/nats').connect();
    
    console.log('startup ...');
    
    //subscribe with queue groups, so that only one responds
    nats.subscribe('service2.endpoint', {'queue': 'availability'}, function(request, replyTo) {
        nats.publish(replyTo, 'available at http://www.seroter.com/service2');
    });
    

    It’s a cool pattern. It only works if you can trust your bus. Any significant delay introduced by the messaging bus, and your apps slow to a crawl.

    Summary

    I came away from INTEGRATE 2016 impressed with Microsoft’s innovative work with integration-oriented Azure services. Event Hubs, Logic Apps and the like are going to change how we process data and events in the cloud. For those who want to run their own engines – and at no commercial licensing cost – it’s exciting to explore the open source domain. Kafka, RabbitMQ, and NATS are each different, and may complement your existing integration strategy, but they’re each worth a deep look!

  • Where to host your integration bus

    2016.03.08integrate01

    RightScale recently announced the results of their annual “State of the Cloud” survey. You can find the report here, and my InfoQ.com story here. A lot of people participated in the survey, and the results showed that a majority of companies are growing their public cloud usage, but continuing to invest heavily in on-premises “cloud” environments. When I was reading this report, I was thinking about the implications for a company’s application integration strategy. As workloads continue to move to cloudy hosts and companies start to get addicted to the benefits of cloud (from the survey: “faster access to infrastructure”, “greater scalability”, “geographic reach”, “higher performance”), does that change what they think about running integration services? What are the options for a company wondering where to host their application/data integration engine, and what benefits and risks are associated with each choice?

    The options below should apply whether you’re doing real-time or batch integration, high throughput messaging or complex orchestration, synchronous or asynchronous communication.

    Option #1 – Use an Integration-as-a-Service engine in the public cloud

    It may make sense to use public cloud integration services to connect your apps. Or, introduce these as edge intake services that still funnel data to another bus further downstream.

    Benefits

    • Easy to scale up or down. One of the biggest perks of a cloud-based service is that you don’t have to do significant capacity planning up front. For messaging services like Amazon SQS or the Azure Service Bus, there’s very little you have to consider. For an integration service like SnapLogic, there are limits, but you can size up and down as needed. The key is that you can respond to bursts (or troughs) in usage by cutting your costs. No more over-provisioning just in case you might need it.
    • Multiple patterns available. You won’t see a glut of traditional ESB-like cloud integration services. Instead, you’ll find many high-throughput messaging (e.g. Google Pub/Sub) or stream processing services (e.g. Azure Stream Analytics) that take advantage of the elasticity of the cloud. However, if you’re doing bulk data movement, there are multiple viable services available (e.g. Talend Integration Cloud), and if you’re doing stateful integration, there are also services for that (e.g. Azure Logic Apps).
    • No upgrade projects. From my experience, IT never likes funding projects that upgrade foundational infrastructure. That’s why you have servers still running Windows Server 2003, or Oracle databases that are 14 versions behind. You always tell yourself that “NEXT year we’ll get that done!” One of the seductive aspects of cloud-based services is that you don’t deal with that any longer. There are no upgrades; new capabilities just show up. And for all these cloud integration services, that means always getting the latest and greatest as soon as it’s available.
    • Regular access to new innovations. Is there anything in tech more depressing than seeing all these flashy new features in a product that you use, and knowing that you are YEARS away from deploying it? Blech. The industry is changing so fast, that waiting 4 years for a refresh cycle is an eternity. If you’re using a cloud integration service, then you’re able to get new endpoint adapters, query semantics, storage enhancements and the like as soon as possible.
    • Connectivity to cloud hosted systems, partners. One of the key reasons you’d choose a cloud-based integration service is so that you’re closer to your cloudy workloads. Running your web log ingest process, partner supply chain, or master-data management jobs all right next to your cloud-hosted databases and web apps gives you better performance and simpler connectivity. Instead of navigating the 12 layers of firewall hell to expose your on-premises integration service to Internet endpoints, you’re right next door.
    • Distributed intake and consumption. Event and data sources are all over the place. Instead of trying to ship all that information to a centralized bus somewhere, it can make sense to do some intake at the edge. Cloud-based services let you spin up multiple endpoints in various geographies with ease, which may give you much more flexibility when taking in Internet-of-Things beacon data, orders from partners, or returning data from time-sensitive request/reply calls.
    • Lower operational cost. You MAY end up paying less, but of course you could also end up paying more. Depends on your throughput, storage, etc. But ideally, if you’re using a cloud integration service, you’re not paying the same type of software licensing and hardware costs as you would for an on-premises system.

    Risks

    • High latency with on-premises systems. Unless your company was formed within the last 18 months, I’d be surprised if you didn’t have SOME key systems sitting in a local facility. While latency may not matter for some asynchronous workloads, if you’re taking in telemetry data from devices and making real-time adjustments to applications, every millisecond counts. Depending on where your home office is, there could be a bit of distance between your cloud-based integration engine and the key systems it talks to.
    • Limited connectivity to on-premises systems (bi-directional). It’s usually not too challenging to get on-premises systems to reach out to the Internet (and push data to an endpoint), but it’s another matter to allow data to come *into* your on-premises systems from the Internet. Some integration services have solved this by putting agents on the local environment to facilitate secure communication, but realistically, it’ll be on you to extract data from cloud-based engines versus expecting them to push data into your data centers.
    • Experience data leakage if data security isn’t properly factored in. If the data never leaves your private network, it can be easy to be lazy about security. Encrypt in transit? Ok. Encrypt the data as well? Nah. If that casual approach to security isn’t tightened up when you start passing data through cloud integration services, you could find yourself in trouble. While your data may be protected from others accidentally seeing it, you may have made it easy for others within your own organization to extract or tap into data they didn’t have access to before.
    • Services are not as mature as software-based products, and focused mostly on messaging. It’s true that cloud-based solutions haven’t been around as long as the Tibcos, BizTalk Servers, and such. And, many cloud-based solutions focus less on traditional integration techniques (FTP! CSV files!) and more on Internet-scale data distribution.
    • Opaque operational interfaces make troubleshooting more difficult. We’re talking about as-a-Service products here, so by definition, you’re not running this yourself. That means you can’t check out the server logs, add tracing logic, or view the memory consumption of a particular service. Instead, you only have the interfaces exposed by the vendor. If troubleshooting data is limited, you have no other recourse.
    • Limited portability of the configuration between providers. Depending on the service you choose, there’s a level of lock-in that you have to accept. Your integration logic from one service can’t be imported into another. Frankly, the same goes for on-premises integration engines. Either way, your application/data integration platform is probably a key lock-in point regardless of where you host it.
    • Unpredictable availability and uptime. A key value proposition of cloud is high availability, but you have to take the provider’s word for it that they’ve architected as such. If your cloud integration bus is offline, so are you. There’s no one to yell at to get it back up and running. Likewise, any maintenance to the platforms happens at a time that works for the vendor, not for you. Ideally you never see downtime, but you absolutely have less control over it.
    • Unpredictable pricing on cost dimensions you may not have tracked before (throughput, storage). I’d doubt that most IT shops know their true cost of operations, but nonetheless, it’s possible to get sticker shock when you start paying based on consumption. Once you’ve sunk cost into an on-premises service, you may not care about message throughput or how much data you’re storing. You will care about things like that when using a pay-as-you-go cloud service.

     

    Option #2 – Run your integration engine in a public cloud environment

    If adopting an entirely managed public service isn’t for you, then you still may want the elastic foundation of cloud while running your preferred integration engine.

    Benefits

    • Run the engine of your choice. Like using Mule, BizTalk Server, or Apache Kafka and don’t want to give it up? Take that software and run it on public cloud Infrastructure-as-a-Service. No need to give up your preferred engine just because you want a more flexible host.
    • Configuration is portable from on-premises solution (if migrating versus setting this up brand new). If you’re “upgrading” from fixed virtual machines or bare metal boxes to an elastic cloud, the software stays the same. In many cases, you don’t have to rewrite much (besides some endpoint addresses) in order to slide into an environment where you can resize the infrastructure up and down much easier.
    • Scale up and down compute and storage. Probably the number one reason to move. Stop worrying about boxes that are too small (or large!) and running out of disk space. By moving from fixed on-premises environments to self-service cloud infrastructure, you can set an initial sizing and continue to right-size on a regular basis. About to beat the hell out of your RabbitMQ environment for a few days? Max out the capacity so that you can handle the load. Elasticity is possibly the most important reason to adopt cloud.
    • Stay close to cloud hosted systems. Your systems are probably becoming more distributed, not more centralized. If you’re seeing a clear trend towards moving to cloud applications, then it may make sense to relocate your integration bus to be closer to them. And if you’re worried about latency, you could choose to run smaller edge instances of your integration bus that feed data to a centralized one. You have much more flexibility to introduce such an architecture when capacity is available anywhere, on-demand.
    • Keep existing tools and skillsets around that engine. One challenge that you may have when adopting an integration-as-a-service product is the switching costs. Not only are you rebuilding your integration scenarios in a new product, but you’re also training up staff on an entirely new toolset. If you keep your preferred engine but move it to the public cloud, there are no new training costs.
    • Low level troubleshooting available. If problems pop up – and of course they will – you have access to all the local logs, services, and configurations that you did before. Integration solutions are notoriously tricky to debug given the myriad locations where something could have gone amiss. The more data, the better.
    • Experience easier integration scenarios with partners. You may love using BizTalk’s Trading Partner Management capabilities, but don’t like wrangling with network and security engineers to expose the right endpoints from your on-premises environment. If you’re running the same technology in the public cloud, you’ll have a simpler time securely exposing select endpoints and ports to key partners.

    Risks

    • Long distance from integrated systems. Like the risk in the section above, there’s concern that shifting your integration engine to the public cloud will mean taking it away from where all the apps are. Does the enhanced elasticity make up for the fact that your business data now has to leave on-premises systems and travel to a bus sitting miles away?
    • Connectivity to on-premises systems. If your cloud virtual machines can’t reach your on-premises systems, you’re going to have some awkward integration scenarios. This is where Infrastructure-as-a-Service can be a little more flexible than cloud integration services because it’s fairly easy to set up a persistent, secure tunnel between cloud IaaS networks and on-premises networks. Not so easy to do with cloud messaging services.
    • There’s a larger attack surface if the engine has public IP connectivity. You may LIKE that your on-premises integration bus is hard to reach! Would-be attackers must breach multiple zones in order to attack this central nervous system of your company. By moving your integration engine to the cloud and opening up ports for inbound access, you’re creating a tempting target for those wishing to tap into this information-rich environment.
    • Not getting any of the operational benefits that as-a-service products possess. One of the major downsides of this option is that you haven’t actually simplified much; you’re just hosting your software elsewhere. Instead of eliminating infrastructure headaches and focusing on connecting your systems, you’re still standing up (virtual) infrastructure, configuring networks, installing software, managing software updates, building highly available setups, and so on. You may be more elastic, but you haven’t reduced your operational burden.
    • Few built-in connectors to cloudy endpoints. If you’re using an integration service that comes with pre-built endpoint adapters, you may find that traditional software providers aren’t keeping up with “cloud born” providers. SnapLogic will always have more cloud connectivity than BizTalk Server, for example. You may not care about this if you’re dealing with messaging engines that require you to write producer/consumer code. But for those who like having pre-built connectors to systems (e.g. IFTTT), you may be disappointed with your existing software provider.
    • Availability and uptime, especially if the integration engine isn’t cloud-native. If you move your integration engine to cloud IaaS, it’s completely on you to ensure that you’ve got a highly available setup. Running ZeroMQ on a single cloud virtual machine isn’t going to magically provide a resilient back end. If you’re taking a traditional ESB product and running it in cloud VMs, you still likely can’t scale out as well as cloud-friendly distributed engines like Kafka or NATS.

     

    Option #3 – Run your integration engine on-premises

    Running an integration engine in the cloud may not be for you. Even if your applications are slowly (quickly?) moving to the cloud, you might want to keep your integration bus put.

    Benefits

    • Run the engine of your choice. No one can tell you what to do in your own house! Pick the ESB, messaging engine, or ETL tool that works for you.
    • Control the change and maintenance lifecycle. This applies to option #2 to some extent, but when you control the software to the metal, you can schedule maintenance at optimal times and upgrade the software on your own timetable. If you’ve got a sensitive Big Data pipeline and want to reboot Spark ONLY when things are quiet, then you can do that.
    • Close to all on-premises systems. Plenty of workloads are moving to public cloud, but it’s sure as heck not all of them, at least not right now. You may be seeing commodity services like CRM or HR quickly going to cloud services, but lots of mission critical apps still sit within your data centers. Depending on what your data sources are, you may have a few years before you’re motivated to give your integration engine a new address.
    • You can still reach out to Internet endpoints, while keeping inbound ports closed. If you’re running something like BizTalk Server, you can send data to cloud endpoints, and even receive data in (through the Service Bus) without exposing the service to the Internet. And if you’re using messaging engines where you write the endpoints, it may not really matter if the engine is on-site.
    • Can get some elasticity through private clouds. Don’t forget about private clouds! While some may think private clouds are dumb (because they don’t achieve the operational benefits or elasticity of a public cloud), the reality is that many companies have doubled down on them. If you take your preferred integration engine and slide it over to your private cloud, you may get some of the elasticity and self-service benefits that public cloud customers get.

    Risks

    • Difficult to keep up to date with latest versions. As the pace of innovation and disruption picks up, you may find it hard to keep your backbone infrastructure up to date. By continuing to own the lifecycle of your integration software, you run the risk of falling behind. That may not matter if you like the version of the software that you are on – or if you have gotten great at building out new instances of your engines and swapping consumers over to them – but it’s still something that can cause problems.
    • Subject to capacity limitations and slow scale up/out. Private clouds rarely have the same amount of hardware capacity that public clouds do. So even if you love dropping RabbitMQ into your private cloud, there may not be the storage or compute available when you need to quickly expand.
    • Few native connectors to cloudy endpoints. Sticking with traditional software may mean that you stay stuck on a legacy foundation instead of adopting a technology that’s more suited to connecting cloud endpoints or high-throughput producers.

     

    There’s no right or wrong answer here. Each company will have different reasons to choose an option above (or one that I didn’t even come up with!). If you’re interested in learning more about the latest advances in the messaging space, join me at the Integrate 2016 event (pre-registration here) in London on May 12-13. I’ll be doing a presentation on what’s new in the open source messaging space, and how increasingly popular integration patterns have changed our expectations of what an integration engine should be able to do.

  • Comparing Clouds: API Capabilities

    API access is quickly becoming the most important aspect of any cloud platform. How easily can you automate activities using programmatic interfaces? What hooks do you have to connect on-premises apps to cloud environments? So far in this long-running blog series, I’ve taken a look at how to provision, scale, and manage the cloud environments of five leading cloud providers. In this post, I’ll explore the virtual-machine-based API offerings of the same providers. Specifically, I’m assessing:

    • Login mechanism. How do you access the API? Is it easy for developers to quickly authenticate and start calling operations?
    • Request and response shape. Does the API use SOAP or REST? Are payloads XML, JSON, or both? Does a result set provide links to follow to additional resources?
    • Breadth of services. How comprehensive is the API? Does it include most of the capabilities of the overall cloud platform?
    • SDKs, tools, and documentation. What developer SDKs are available, and is there ample documentation for developers to leverage?
    • Unique attributes. What stands out about the API? Does it have any special capabilities or characteristics that make it stand apart?

    As an aside, there’s no “standard cloud API.” Each vendor has unique things they offer, and there’s no base interface that everyone conforms to. While that makes it more challenging to port configurations from one provider to the next, it highlights the value of using configuration management tools (and to a lesser extent, SDKs) to provide abstraction over a cloud endpoint.

    Let’s get moving, in alphabetical order.

    DISCLAIMER: I’m the VP of Product for CenturyLink’s cloud platform. Obviously my perspective is colored by that. However, I’ve taught four well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    Amazon Web Services

    Amazon EC2 is among the original cloud infrastructure providers, and has a mature API.

    Login mechanism

    For AWS, you don’t really “log in.” Every API request includes an Authorization header containing a hash of the request parameters, signed with your secret access key. AWS verifies this signature before executing the requested operation.

    A valid request to the API endpoint might look like this (notice the Authorization header):

    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    X-Amz-Date: 20150501T130210Z
    Host: ec2.amazonaws.com
    Authorization: AWS4-HMAC-SHA256 Credential=KEY/20150501/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=ced6826de92d2bdeed8f846f0bf508e8559e98e4b0194b84example54174deb456c
    
    [request payload]
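
    If you’re curious what produces that Signature value, the gist is an HMAC-SHA256 chain over your secret key, the date, region, and service, applied to a canonical form of the request. Here’s a rough Python sketch of just the key-derivation step (the canonical request and string-to-sign construction are omitted, and the key and string values below are placeholders); treat it as illustrative rather than a drop-in signer:

    import hashlib
    import hmac

    def sign(key: bytes, msg: str) -> bytes:
        # HMAC-SHA256 of msg using key
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    def signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
        # SigV4 key derivation: secret -> date -> region -> service -> "aws4_request"
        k_date = sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
        k_region = sign(k_date, region)
        k_service = sign(k_region, service)
        return sign(k_service, "aws4_request")

    # The string to sign is built from a hash of the canonical request (not shown here)
    string_to_sign = "..."  # placeholder
    key = signing_key("MY_SECRET_ACCESS_KEY", "20150501", "us-east-1", "ec2")
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    print(signature)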
    

    Request and response shape

    Amazon still supports a deprecated SOAP endpoint, but steers everyone to its HTTP services. To be clear, it’s not REST; while the API does use GET and POST, it typically throws a command and all the parameters into the URL. For instance, to retrieve a list of instances in your account, you’d issue a request to:

    https://ec2.amazonaws.com/?Action=DescribeInstances&AUTHPARAMS

    For cases where lots of parameters are required – for instance, to create a new EC2 instance – all the parameters are appended to the URL and covered by the signature in the Authorization header.

    https://ec2.amazonaws.com/?Action=RunInstances
    &ImageId=ami-60a54009
    &MaxCount=3
    &MinCount=1
    &KeyName=my-key-pair
    &Placement.AvailabilityZone=us-east-1d
    &AUTHPARAMS
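
    In practice, most developers let an SDK build and sign these query strings for them. As a rough sketch (assuming the boto3 library, the AWS SDK for Python, and reusing the placeholder AMI and key pair names from the URL above), the same RunInstances call might look like this:

    import boto3

    # The client handles query serialization and SigV4 request signing for you
    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-60a54009",          # placeholder AMI from the example above
        MinCount=1,
        MaxCount=3,
        KeyName="my-key-pair",
        Placement={"AvailabilityZone": "us-east-1d"},
    )
    print(response["Instances"][0]["InstanceId"])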
    

    Amazon APIs return XML. Developers get back a basic XML payload such as:

    <DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2014-10-01/">
      <requestId>fdcdcab1-ae5c-489e-9c33-4637c5dda355</requestId>
        <reservationSet>
          <item>
            <reservationId>r-1a2b3c4d</reservationId>
            <ownerId>123456789012</ownerId>
            <groupSet>
              <item>
                <groupId>sg-1a2b3c4d</groupId>
                <groupName>my-security-group</groupName>
              </item>
            </groupSet>
            <instancesSet>
              <item>
                <instanceId>i-1a2b3c4d</instanceId>
                <imageId>ami-1a2b3c4d</imageId>
    

    Breadth of services

    Each AWS service exposes an impressive array of operations. EC2 is no exception with well over 100. The API spans server provisioning and configuration, as well as network and storage setup.

    2015.07.30api01

    I’m hard pressed to find anything in the EC2 management UI that isn’t available in the API set.

    SDKs, tools, and documentation

    AWS is known for its comprehensive documentation that stays up-to-date. The EC2 API documentation includes a list of operations, a basic walkthrough of creating API requests, parameter descriptions, and information about permissions.

    SDKs give developers a quicker way to get going with an API, and AWS provides SDKs for Java, .NET, Node.js, PHP, Python, and Ruby. Developers can find these SDKs in package management systems like npm (Node.js) and NuGet (.NET).

    As you may expect, there are gobs of 3rd party tools that integrate with AWS. Whether it’s configuration management plugins for Chef or Ansible, or infrastructure provisioning tools like Terraform, you can expect to find AWS plugins.

    Unique attributes

    The AWS API is comprehensive with fine-grained operations. It also has a relatively unique security process (signature hashing) that may steer you towards the SDKs that shield you from the trickiness of correctly signing your request. Also, because EC2 is one of the first AWS services ever released, it’s using an older XML scheme. Newer services like DynamoDB or Kinesis offer a JSON syntax.

    Amazon offers push-based notification through CloudWatch + SNS, so developers can get an HTTP push message when things like Auto Scaling events fire, or a performance alarm gets triggered.

    CenturyLink Cloud

    Global telecommunications and technology company CenturyLink offers a public cloud in regions around the world. The API has evolved from a SOAP/HTTP model (v1) to a fully RESTful one (v2).

    Login mechanism

    To use the CenturyLink Cloud API, developers send their platform credentials to a “login” endpoint and get back a reusable bearer token if the credentials are valid. That token is required for any subsequent API calls.

    A request for token may look like:

    POST https://api.ctl.io/v2/authentication/login HTTP/1.1
    Host: api.ctl.io
    Content-Type: application/json
    Content-Length: 54
    
    {
      "username": "[username]",
      "password": "[password]"
    }

    A token (and role list) comes back with the API response, and developers use that token in the “Authorization” HTTP header for each subsequent API call.

    GET https://api.ctl.io/v2/datacenters/RLS1/WA1 HTTP/1.1
    Host: api.ctl.io
    Content-Type: application/json
    Content-Length: 0
    Authorization: Bearer [LONG TOKEN VALUE]
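
    As a rough Python sketch of that login-then-call flow (using the requests library, with placeholder credentials, and assuming the login response exposes the token in a bearerToken field):

    import requests

    API = "https://api.ctl.io"

    # Exchange platform credentials for a reusable bearer token
    login = requests.post(
        f"{API}/v2/authentication/login",
        json={"username": "[username]", "password": "[password]"},
    )
    login.raise_for_status()
    token = login.json()["bearerToken"]   # field name assumed here

    # Use the token on every subsequent call
    datacenter = requests.get(
        f"{API}/v2/datacenters/RLS1/WA1",
        headers={"Authorization": f"Bearer {token}"},
    )
    print(datacenter.json())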
    

    Request and response shape

    The v2 API uses JSON for the request and response format. The legacy API uses XML or JSON with either SOAP or HTTP (don’t call it REST) endpoints.

    To retrieve a single server in the v2 API, the developer sends a request to:

    GET https://api.ctl.io/v2/servers/{accountAlias}/{serverId}

    The JSON response for most any service is verbose, and includes a number of links to related resources. For instance, in the example response payload below, notice that the caller can follow links to the specific alert policies attached to a server, billing estimates, and more.

    {
      "id": "WA1ALIASWB01",
      "name": "WA1ALIASWB01",
      "description": "My web server",
      "groupId": "2a5c0b9662cf4fc8bf6180f139facdc0",
      "isTemplate": false,
      "locationId": "WA1",
      "osType": "Windows 2008 64-bit",
      "status": "active",
      "details": {
        "ipAddresses": [
          {
            "internal": "10.82.131.44"
          }
        ],
        "alertPolicies": [
          {
            "id": "15836e6219e84ac736d01d4e571bb950",
            "name": "Production Web Servers - RAM",
            "links": [
              {
                "rel": "self",
                "href": "/v2/alertPolicies/alias/15836e6219e84ac736d01d4e571bb950"
              },
              {
                "rel": "alertPolicyMap",
                "href": "/v2/servers/alias/WA1ALIASWB01/alertPolicies/15836e6219e84ac736d01d4e571bb950",
                "verbs": [
                  "DELETE"
                ]
              }
            ]
          }
        ],
        "cpu": 2,
        "diskCount": 1,
        "hostName": "WA1ALIASWB01.customdomain.com",
        "inMaintenanceMode": false,
        "memoryMB": 4096,
        "powerState": "started",
        "storageGB": 60,
        "disks":[
          {
            "id":"0:0",
            "sizeGB":60,
            "partitionPaths":[]
          }
        ],
        "partitions":[
          {
            "sizeGB":59.654,
            "path":"C:\\"
          }
        ],
        "snapshots": [
          {
            "name": "2014-05-16.23:45:52",
            "links": [
              {
                "rel": "self",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40"
              },
              {
                "rel": "delete",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40"
              },
              {
                "rel": "restore",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40/restore"
              }
            ]
          }
        ],
    },
      "type": "standard",
      "storageType": "standard",
      "changeInfo": {
        "createdDate": "2012-12-17T01:17:17Z",
        "createdBy": "user@domain.com",
        "modifiedDate": "2014-05-16T23:49:25Z",
        "modifiedBy": "user@domain.com"
      },
      "links": [
        {
          "rel": "self",
          "href": "/v2/servers/alias/WA1ALIASWB01",
          "id": "WA1ALIASWB01",
          "verbs": [
            "GET",
            "PATCH",
            "DELETE"
          ]
        },
        …{
          "rel": "group",
          "href": "/v2/groups/alias/2a5c0b9662cf4fc8bf6180f139facdc0",
          "id": "2a5c0b9662cf4fc8bf6180f139facdc0"
        },
        {
          "rel": "account",
          "href": "/v2/accounts/alias",
          "id": "alias"
        },
        {
          "rel": "billing",
          "href": "/v2/billing/alias/estimate-server/WA1ALIASWB01"
        },
        {
          "rel": "statistics",
          "href": "/v2/servers/alias/WA1ALIASWB01/statistics"
        },
        {
          "rel": "scheduledActivities",
          "href": "/v2/servers/alias/WA1ALIASWB01/scheduledActivities"
        },
        {
          "rel": "alertPolicyMappings",
          "href": "/v2/servers/alias/WA1ALIASWB01/alertPolicies",
          "verbs": [
            "POST"
          ]
        },  {
          "rel": "credentials",
          "href": "/v2/servers/alias/WA1ALIASWB01/credentials"
        },
    
      ]
    }
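
    Those rel/href pairs mean a client can discover related resources instead of hard-coding URLs. As a quick illustrative sketch (Python requests library; the account alias, server ID, and token are placeholders), following the billing link from that payload might look like:

    import requests

    API = "https://api.ctl.io"
    headers = {"Authorization": "Bearer [LONG TOKEN VALUE]"}   # token from the login call

    # Fetch a server, then follow its hypermedia links instead of hard-coding URLs
    server = requests.get(f"{API}/v2/servers/ALIAS/WA1ALIASWB01", headers=headers).json()
    billing_link = next(l for l in server["links"] if l["rel"] == "billing")
    estimate = requests.get(API + billing_link["href"], headers=headers)
    print(estimate.json())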

    Breadth of services

    CenturyLink provides APIs for a majority of the capabilities exposed in the management UI. Developers can create and manage servers, networks, firewall policies, load balancer pools, server policies, and more.

    2015.07.30api02

    SDKs, tools, and documentation

    CenturyLink recently launched a Developer Center to collect all the developer content in one place. It points to the Knowledge Base of articles, API documentation, and developer-centric blog. The API documentation is fairly detailed with descriptions of operations, payloads, and sample calls. Users can also watch brief video walkthroughs of major platform capabilities.

    There are open source SDKs for Java, .NET, Python, and PHP.  CenturyLink also offers an Ansible module, and integrates with multi-cloud manager tool vRealize from VMware.

    Unique attributes

    The CenturyLink API provides a few unique things. The platform has the concept of “grouping” servers together. Via the API, you can retrieve the servers in a group, or get the projected cost of a group, among other things. Also, collections of servers can be passed into operations, so a developer can reboot a set of boxes, or run a script against many boxes at once.

    Somewhat similar to AWS, CenturyLink offers push-based notifications via webhooks. Developers get a near real-time HTTP notification when servers, users, or accounts are created/changed/deleted, and also when monitoring alarms fire.

    DigitalOcean

    DigitalOcean heavily targets developers, so you’d expect a strong focus on their API. They have a v1 API (that’s deprecated and will shut down in November 2015), and a v2 API.

    Login mechanism

    DigitalOcean authenticates users via OAuth. In the management UI, developers create OAuth tokens that can be for read, or read/write. These token values are only shown a single time (for security reasons), so developers must make sure to save them in a secure place.

    2015.07.30api03

    Once you have this token, you can either send the bearer token in the HTTP header, or (though it’s not recommended) use it for HTTP basic authentication. A typical curl request looks like:

    curl -X $HTTP_METHOD -H "Authorization: Bearer $TOKEN" "https://api.digitalocean.com/v2/$OBJECT"

    Request and response shape

    The DigitalOcean API is RESTful with JSON payloads. Developers throw typical HTTP verbs (GET/DELETE/PUT/POST/HEAD) against the endpoints. Let’s say that I wanted to retrieve a specific droplet –  a “droplet” in DigitalOcean is equivalent to a virtual machine – via the API. I’d send a request to:

    https://api.digitalocean.com/v2/droplets/[dropletid]

    The response from such a request comes back as verbose JSON.

    {
      "droplet": {
        "id": 3164494,
        "name": "example.com",
        "memory": 512,
        "vcpus": 1,
        "disk": 20,
        "locked": false,
        "status": "active",
        "kernel": {
          "id": 2233,
          "name": "Ubuntu 14.04 x64 vmlinuz-3.13.0-37-generic",
          "version": "3.13.0-37-generic"
        },
        "created_at": "2014-11-14T16:36:31Z",
        "features": [
          "ipv6",
          "virtio"
        ],
        "backup_ids": [
    
        ],
        "snapshot_ids": [
          7938206
        ],
        "image": {
          "id": 6918990,
          "name": "14.04 x64",
          "distribution": "Ubuntu",
          "slug": "ubuntu-14-04-x64",
          "public": true,
          "regions": [
            "nyc1",
            "ams1",
            "sfo1",
            "nyc2",
            "ams2",
            "sgp1",
            "lon1",
            "nyc3",
            "ams3",
            "nyc3"
          ],
          "created_at": "2014-10-17T20:24:33Z",
          "type": "snapshot",
          "min_disk_size": 20
        },
        "size": {
        },
        "size_slug": "512mb",
        "networks": {
          "v4": [
            {
              "ip_address": "104.131.186.241",
              "netmask": "255.255.240.0",
              "gateway": "104.131.176.1",
              "type": "public"
            }
          ],
          "v6": [
            {
              "ip_address": "2604:A880:0800:0010:0000:0000:031D:2001",
              "netmask": 64,
              "gateway": "2604:A880:0800:0010:0000:0000:0000:0001",
              "type": "public"
            }
          ]
        },
        "region": {
          "name": "New York 3",
          "slug": "nyc3",
          "sizes": [
            "32gb",
            "16gb",
            "2gb",
            "1gb",
            "4gb",
            "8gb",
            "512mb",
            "64gb",
            "48gb"
          ],
          "features": [
            "virtio",
            "private_networking",
            "backups",
            "ipv6",
            "metadata"
          ],
          "available": true
        }
      }
    }
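
    The same request in Python (requests library, with a placeholder token and the droplet ID from the example) is a one-liner plus the bearer header:

    import requests

    TOKEN = "[your OAuth token]"          # generated in the control panel, shown once
    DROPLET_ID = 3164494                  # the droplet from the example above

    droplet = requests.get(
        f"https://api.digitalocean.com/v2/droplets/{DROPLET_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    print(droplet.json()["droplet"]["status"])   # e.g. "active"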
    

    Breadth of services

    DigitalOcean says that “all of the functionality that you are familiar with in the DigitalOcean control panel is also available through the API,” and that looks to be pretty accurate. DigitalOcean is known for their no-frills user experience, and with the exception of account management features, the API gives you control over most everything. Create droplets, create snapshots, move snapshots between regions, manage SSH keys, manage DNS records, and more.

    2015.07.30api04

    SDKs, tools, and documentation

    Developers can find lots of open source projects from DigitalOcean that favor Go and Ruby. There are a couple of official SDK libraries, and a whole host of other community supported ones. You’ll find ones for Ruby, Go, Python, .NET, Java, Node, and more.

    DigitalOcean does a great job at documentation (with samples included), and also has a vibrant set of community contributions that apply to virtually any (cloud) environment. The contributed list of tutorials is fantastic.

    Being so developer-centric, DigitalOcean can be found as a supported module in many 3rd party toolkits. You’ll find friendly extensions for Vagrant, Juju, SaltStack and much more.

    Unique attributes

    What stands out for me regarding DigitalOcean is the quality of their documentation, and complete developer focus. The API itself is fairly standard, but it’s presented in a way that’s easy to grok, and the ecosystem around the service is excellent.

    Google Compute Engine

    Google has lots of API-enabled services, and GCE is no exception.

    Login mechanism

    Google uses OAuth 2.0 and access tokens. Developers register their apps, define a scope, and request a short-lived access token. There are different flows depending on whether you’re working with web applications (with interactive user login) or service accounts (where user consent isn’t required).

    If you go the service account way, then you’ve got to generate a JSON Web Token (JWT) through a series of encoding and signing steps. The payload to GCE for getting a valid access token looks like:

    POST /oauth2/v3/token HTTP/1.1
    Host: www.googleapis.com
    Content-Type: application/x-www-form-urlencoded
    
    grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiI3NjEzMjY3O…
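
    Rather than hand-rolling that JWT, most people lean on a client library. Here’s a minimal sketch with the google-auth Python package (assuming you’ve downloaded a service account key file, and that the Compute Engine scope shown is the one you need):

    from google.auth.transport.requests import Request
    from google.oauth2 import service_account

    # Load the service account key file downloaded from the Google developer console
    credentials = service_account.Credentials.from_service_account_file(
        "my-service-account.json",   # placeholder path
        scopes=["https://www.googleapis.com/auth/compute"],
    )

    # This performs the JWT signing and token exchange shown above
    credentials.refresh(Request())
    print(credentials.token)   # short-lived OAuth 2.0 access token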
    

    Request and response shape

    The Google API is RESTful and passes JSON messages back and forth. Operations map to HTTP verbs, and URIs reflect logical resource paths (as much as the term “methods” made me shudder). If you want a list of the machine images available to a project, you’d send a request to:

    https://www.googleapis.com/compute/v1/projects/{project}/global/images

    The response comes back as JSON:

    {
      "kind": "compute#imageList",
      "selfLink": string,
      "id": string,
      "items": [
        {
          "kind": "compute#image",
          "selfLink": string,
          "id": unsigned long,
          "creationTimestamp": string,
          "name": string,
          "description": string,
          "sourceType": string,
          "rawDisk": {
            "source": string,
            "sha1Checksum": string,
            "containerType": string
          },
          "deprecated": {
            "state": string,
            "replacement": string,
            "deprecated": string,
            "obsolete": string,
            "deleted": string
          },
          "status": string,
          "archiveSizeBytes": long,
          "diskSizeGb": long,
          "sourceDisk": string,
          "sourceDiskId": string,
          "licenses": [
            string
          ]
        }
      ],
      "nextPageToken": string
    }
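
    Putting the two together, a quick sketch of listing images over the REST endpoint (Python requests library, with the access token from the previous step and a placeholder project ID) would be:

    import requests

    ACCESS_TOKEN = "[token from the OAuth flow above]"
    PROJECT = "my-project-id"   # placeholder project ID

    images = requests.get(
        f"https://www.googleapis.com/compute/v1/projects/{PROJECT}/global/images",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )
    for item in images.json().get("items", []):
        print(item["name"], item["status"])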
    

    Breadth of services

    The GCE API spans a lot of different capabilities that closely match what they offer in their management UI. There’s the base Compute API – this includes operations against servers, images, snapshots, disks, network, VPNs, and more – as well as beta APIs for Autoscalers and instance groups. There’s also an alpha API for user and account management.

    2015.07.30api05

    SDKs, tools, and documentation

    Google offers a serious set of client libraries. You’ll find libraries and dedicated documentation for Java, .NET, Go, Ruby, Objective C, Python and more.

    The documentation for GCE is solid. Not only will you find detailed API specifications, but also a set of useful tutorials for setting up platforms (e.g. LAMP stack) or workflows (e.g. Jenkins + Packer + Kubernetes) on GCE.

    Google lists out a lot of tools that natively integrate with the cloud service. The primary focus here is configuration management tools, with specific callouts for Chef, Puppet, Ansible, and SaltStack.

    Unique attributes

    GCE has a good user management API. They also have a useful batching capability where you can bundle together multiple related or unrelated calls into a single HTTP request. I’m also impressed by Google’s tools for trying out API calls ahead of time. There’s the Google-wide OAuth 2.0 playground where you can authorize and try out calls. Even better, for any API operation in the documentation, there’s a “try it” section at the bottom where you can call the endpoint and see it in action.

    2015.07.30api06

    Microsoft Azure

    Microsoft added virtual machines to its cloud portfolio a couple years ago, and has API-enabled most of their cloud services.

    Login mechanism

    One option for managing Azure components programmatically is via the Azure Resource Manager. Any action you perform on a resource requires the call to be authenticated with Azure Active Directory. To do this, you have to add your app to an Azure Active Directory tenant, set permissions for the app, and get a token used for authenticating requests.

    The documentation says that you can set this up with the Azure CLI or PowerShell commands (or the management UI). The same docs show a C# example of getting the JWT token back from the management endpoint.

    public static string GetAToken()
    {
      var authenticationContext = new AuthenticationContext("https://login.windows.net/{tenantId or tenant name}");
      var credential = new ClientCredential(clientId: "{application id}", clientSecret: "{application password}");
      var result = authenticationContext.AcquireToken(resource: "https://management.core.windows.net/", clientCredential:credential);
    
      if (result == null) {
        throw new InvalidOperationException("Failed to obtain the JWT token");
      }
    
      string token = result.AccessToken;
    
      return token;
    }
    

    Microsoft also offers a direct Service Management API for interacting with most Azure items. Here you can authenticate using Azure Active Directory or X.509 certificates.

    Request and response shape

    The Resource Manager API appears RESTful and works with JSON messages. In order to retrieve the details about a specific virtual machine, you send a request to:

    https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Compute/virtualMachines/{vm-name}?api-version={api-version}

    The response JSON is fairly basic, and doesn’t tell you much about related services (e.g. networks or load balancers).

    {
       "id":"/subscriptions/########-####-####-####-############/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{virtualMachineName}",
       "name":"virtualMachineName”,
      "   type":"Microsoft.Compute/virtualMachines",
       "location":"westus",
       "tags":{
          "department":"finance"
       },
       "properties":{
          "availabilitySet":{
             "id":"/subscriptions/########-####-####-####-############/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/availabilitySets/{availabilitySetName}"
          },
          "hardwareProfile":{
             "vmSize":"Standard_A0"
          },
          "storageProfile":{
             "imageReference":{
                "publisher":"MicrosoftWindowsServerEssentials",
                "offer":"WindowsServerEssentials",
                "sku":"WindowsServerEssentials",
                "version":"1.0.131018"
             },
             "osDisk":{
                "osType":"Windows",
                "name":"osName-osDisk",
                "vhd":{
                   "uri":"http://storageAccount.blob.core.windows.net/vhds/osDisk.vhd"
                },
                "caching":"ReadWrite",
                "createOption":"FromImage"
             },
             "dataDisks":[
    
             ]
          },
          "osProfile":{
             "computerName":"virtualMachineName",
             "adminUsername":"username",
             "adminPassword":"password",
             "customData":"",
             "windowsConfiguration":{
                "provisionVMAgent":true,
                "winRM": {
                   "listeners":[{
                   "protocol": "https",
                   "certificateUrl": "[parameters('certificateUrl')]"
                   }]
                },
                 "additionalUnattendContent":[
                    {
                       "pass":"oobesystem",
                       "component":"Microsoft-Windows-Shell-Setup",
                       "settingName":"FirstLogonCommands|AutoLogon",
                       "content":"<XML unattend content>"
                    }
                 ],
                 "enableAutomaticUpdates":true
              },
                "secrets":[
    
                ]
             },
             "networkProfile":{
                "networkInterfaces":[
                   {
                      "id":"/subscriptions/########-####-####-####-############/resourceGroups/CloudDep/providers/Microsoft.Network/networkInterfaces/myNic"
                   }
                ]
             },
             "provisioningState":"succeeded"
          }
       }
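
    For non-.NET callers the pattern is the same: pass that AAD token as a bearer header when calling the Resource Manager URL. Here’s a rough Python sketch (requests library; the subscription, resource group, VM name, and API version values are all placeholders):

    import requests

    TOKEN = "[JWT from Azure Active Directory]"   # e.g. acquired as in the C# snippet above
    SUBSCRIPTION = "[subscription-id]"
    RESOURCE_GROUP = "[resource-group-name]"
    VM_NAME = "[vm-name]"
    API_VERSION = "2015-06-15"                    # placeholder API version

    url = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
        f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Compute"
        f"/virtualMachines/{VM_NAME}?api-version={API_VERSION}"
    )
    vm = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}).json()
    print(vm["name"], vm["location"])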
    

    The Service Management API is a bit different. It’s also RESTful, but works with XML messages (although some of the other services like Autoscale seem to work with JSON). If you wanted to create a VM deployment, you’d send an HTTP POST request to:

    https://management.core.windows.net/<subscription-id>/services/hostedservices/<cloudservice-name>/deployments

    The result is an extremely verbose XML payload.

    Breadth of services

    In addition to an API for virtual machine management, Microsoft has REST APIs for virtual networks, load balancers, Traffic Manager, DNS, and more. The Service Management API appears to have a lot more functionality than the Resource Manager API.

    Microsoft is stuck with a two-portal user experience where the officially supported one (at https://manage.windowsazure.com) has different features and functions than the beta one (https://portal.azure.com). It’s been like this for quite a while, and hopefully they cut over to the new one soon.

    2015.07.30api07

    2015.07.30api08

    SDKs, tools, and documentation

    Microsoft provides lots of options on their SDK page. Developers can interact with the Azure API using .NET, Java, Node.js, PHP, Python, Ruby, and Mobile (iOS, Android, Windows Phone), and it appears that each one uses the Service Management APIs to interact with virtual machines. Frankly, the documentation around this is a bit confusing. The documentation about the virtual machines service is ok, and provides a handful of walkthroughs to get you started.

    The core API documentation exists for both the Service Management API, and the Azure Resource Manager API. For each set of documentation, you can view details of each API call. I’m not a fan of the navigation in Microsoft API docs. It’s not easy to see the breadth of API operations as the focus is on a single service at a time.

    Microsoft has a lot of support for virtual machines in the ecosystem, and touts integration with Chef, Ansible, and Docker.

    Unique attributes

    Besides being a little confusing (which APIs to use), the Azure API is pretty comprehensive (on the Service Management side). Somewhat uniquely, the Resource Manager API has a (beta) billing API with data about consumption and pricing.  While I’ve complained a bit here about Resource Manager and conflicting APIs, it’s actually a pretty useful thing. Developers can use the resource manager concept (and APIs) to group related resources and deliver access control and templating.

    Also, Microsoft bakes in support for Azure virtual machines in products like Azure Site Recovery.

    Summary

    The common thing you see across most cloud APIs is that they provide solid coverage of what the user can do in the vendor’s graphical UI. We also saw that more and more attention is being paid to SDKs and documentation to help developers get up and running. AWS has been in the market the longest, so you see maturity and breadth in their API, but also a heavier interface (authentication, XML payloads). CenturyLink and Google have good account management APIs, and Azure’s billing API is a welcome addition to their portfolio. Amazon, CenturyLink, and Google have fairly verbose API responses, and CenturyLink is the only one with a hypermedia approach of linking to related resources. Microsoft has a messier API story than I would have expected, and developers will be better off using SDKs!

    What do you think? Do you use the native APIs of cloud providers, or prefer to go through SDKs or brokers?

  • Recent Presentations (With Video!): Transitioning to PaaS and DevOps

    I just finished up speaking at a run of conferences (Cloud Foundry Summit, ALM Forum, and the OpenStack Summit) and the recorded presentations are all online.

    The Cloud Foundry Summit was excellent and you can find all the conference videos on the Cloud Foundry Youtube channel. I encourage you to watch some of the (brief) presentations by customers to hear how real people use application platforms to solve problems. My presentation (slides here) was about the enterprise transition to PaaS, and what large companies need to think about when introducing PaaS to their environment.

    At the ALM Forum and OpenStack Summit, I talked about the journey to a DevOps organization and used CenturyLink as a case study. I went through our core values, logistics (org chart, office layout, tool set), and then a week-in-the-life that highlighted the various meetings and activities we do.

    Companies making either journey will realize significant benefits, but not without proper planning. PaaS can be supremely disruptive to how applications are being delivered today, and DevOps may represent a fundamental shift in how technology services are positioned and prioritized at your company. While not directly related, PaaS and DevOps both address the emerging desire to treat I.T. as a competitive advantage.

    Enjoy!

  • Docker Q&A for Windows Users

    I’m a Windows guy. I’ve spent the better part of the past fifteen years on Windows laptops and servers. That fact probably hurts my geek cred, but hey, gotta be honest. Either way, it really seems that the coolest emerging technologies are happening in an open-source world on Linux. So how does someone from a Windows world make sense of something like Docker, for example? When I first dug into it a year ago, I had a bunch of questions. As I learned more – and started using it a bit – a number of things became clearer. Below is my take on what someone new to Docker might want to know. If for some reason you think one of my answers below is wrong, tell me!

    Q: Is Docker just another means of virtualization?

    A: Somewhat. It’s primarily a way to run lightweight containers that isolate processes. That process may be a web app, database, load balancer, or pretty much anything. Containers don’t do as much as a virtual machine, but that also makes them a lot easier to use (and destroy!). You may hear containers referred to as “operating system virtualization” which is fair since each container gets its own user space in the host OS.

     

    Q: How do Docker containers differ from a virtual machine?

    A: A virtual machine in Hyper-V or VMware virtualizes an entire guest operating system. Take a physical server, and share its resources among a bunch of virtualized servers running any operating system. With Docker, you’re isolating a process and its dependencies. A container shares the Linux kernel with other containers on the host machine. Instead of running a full copy of an OS like virtual machines do, containers pretty much consist of a directory! The container has everything you need for the application to run.

     

    Q: Does Docker == container?

    A: No. Docker refers to the tooling – the daemon and client – that builds, runs, and distributes containers; the container is the isolated environment that actually gets run.

     

    Q: Can I make something like Docker work natively on Windows?

    A: Not really. While Microsoft is promising some sort of Docker support in the next version of Windows Server (due in 2016), they’ll have to introduce some significant changes to have truly native Docker support. Docker is written in Go and relies on a few core Linux technologies:

    • Namespaces provide process isolation. Windows doesn’t really have something that maps directly to this.
    • Control Groups are used to set up resource limits and constraints to help containers responsibly use host resources. Windows doesn’t have a way to limit resource consumption by a particular service.
    • Union File Systems are file systems created by establishing layers. Docker uses these layers for container changes, and to establish a read/write file system. Not something you see in a Windows environment.
    • Container Format that combines the previous components. Default format is libcontainer, but LXC is also supported.

     

    Q: What’s in a Docker image?

    A: Images are read-only templates that are used to create Docker containers. You can create your own images, update existing ones, or download from a registry like the Docker Hub. Downloaded images are stored by Docker on the host and can be easily used to create new containers. When you change an image, a new layer is built and added. This makes it simpler to distribute changes without distributing the whole image.
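
    If you want to poke at images and containers programmatically, here’s a small sketch using the Docker SDK for Python (docker-py), which isn’t part of Docker itself; it assumes the Docker daemon is reachable from your environment:

    import docker

    # Connect to the Docker daemon using environment settings (DOCKER_HOST, etc.)
    client = docker.from_env()

    # Pull a read-only image from Docker Hub...
    image = client.images.pull("ubuntu:14.04")
    print(image.id)

    # ...then create and run a container from that image
    output = client.containers.run("ubuntu:14.04", "echo hello from a container", remove=True)
    print(output.decode())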

     

    Q: How big are Docker images?

    A: Base images could be as small as a few hundred megabytes, or as large as a few gigabytes. Image updates may be much smaller, but they also include any (new) dependencies, which can cause the overall container size to grow more than expected.

     

    Q: What’s the portability story with Docker?

    A: Unlike virtual machines that require a particular hypervisor and are often quite large, containers run anywhere that Linux runs, and can be quickly built from small images.

     

    Q: Does Docker run on Windows?

    A: Somewhat. Developers who want to run Docker on Windows have to install a simple VM to get the necessary Linux features. The Boot2Docker app gets this VM installed.

     

    Q: Can I run more than one process in a Docker container?

    A: Yes, it’s possible (via a supervisor process), but the Docker team really believes that you should have a single process per container.

     

    Q: Is a Dockerized app portable to any Linux host?

    A: Ideally, yes. For example, a developer can start up an Ubuntu Docker image on a Red Hat host machine. However, issues can still arise if tools are dependent on kernel features that don’t exist on the host.

     

    Q: How is data managed in a Dockerized app?

    A: When you exit a container, the read-write file layer goes away. If you save the container as a new image, then the data is retained. Docker encourages developers to use data volumes and data volume containers to persist information. There’s a good StackOverflow question on this topic. Other solutions like Flocker have popped up as well.
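
    As a small illustration (again using the Docker SDK for Python, which the answer above doesn’t require), mounting a host directory as a data volume so the data outlives the container could look roughly like this:

    import docker

    client = docker.from_env()

    # Write into /data inside the container; the file lands in /tmp/appdata on the host
    client.containers.run(
        "ubuntu:14.04",
        "bash -c 'echo persisted > /data/out.txt'",
        volumes={"/tmp/appdata": {"bind": "/data", "mode": "rw"}},
        remove=True,
    )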

     

    Q: Is there one flavor of Linux that runs Docker better than others?

    A: I don’t believe so. There are Docker-centric Linux distributions like CoreOS, but you can also easily run Docker on distros like SUSE and Ubuntu. Note that you should definitely be running a recent version of whatever distro you choose.

     

    Q: What type of software can run in a Docker container?

    A: Really, anything that runs on Linux should be able to fit in a container.

     

    Q: Do I need the cloud to run Docker?

    A: Definitely not. You can run Docker on virtual machines (locally or in the cloud), physical machines (locally or in the cloud), or in Docker-centric services like the Amazon EC2 Container Service. Even Microsoft’s own Azure has declared itself to be “Docker friendly.”

     

    Q: What related technologies should I care about?

    A: This whole “microservices with containers” revolution means that developers should learn a host of new things. Windows developers may not be as familiar with container deployment tools (e.g. fleet), container orchestration tools (e.g. Kubernetes, or Docker’s own services), or service discovery tools (like Zookeeper or Consul), but now’s a good time to start reading up!

     

    Q: Where is it still immature?

    A: This is a fast moving technology, and I’d bet that Docker will be “enterprise-ready” long before enterprises are ready to commit to it. Security is an emerging area, and architectural best practices are still forming.

     

    Q: Ok, you’ve convinced me to try it out. What’s an easy way to get started on Windows?

    A: Download Vagrant, stand up an Ubuntu image, and get Docker installed. Pull an app from the growing Docker Hub and walk through the tutorials provided by Docker. Or, start watching this great new Pluralsight course. Also consider trying out something like Panamax to easily create multi-container apps.

     

    Hope this helps demystify Docker a bit. I’m far from an expert with it, but it’s really one of those technologies that might be critical to know in the years ahead!

  • 8 Characteristics of our DevOps Organization

    What is the human impact of DevOps? I recently got this question from a viewer of my recent DevOps: The Big Picture course on Pluralsight.

    I prepared this course based on a lot of research and my own personal experience. I’ve been part of a DevOps culture for about two years with CenturyLink Cloud. Now, you might say “it’s nice that DevOps works in your crazy startup world, but I work for a big company where this radical thinking gets ignored.” While Tier 3 – my employer that was acquired by CenturyLink last Fall – was a small, rebel band of cloud lunatics, I now work at a ~$20 billion company with 40,000+ people. If DevOps can work here, it can work anywhere.

    Our cloud division does DevOps and we’re working with other teams to reproduce our model. How do we do it?

    1. Simple reporting structure. Pretty much everyone is one step away from our executive leadership. We avoid complicated fiefdoms that introduce friction and foster siloed thinking. How are we arranged? Something like this:
      2014.08.28devops1
      Business functions like marketing and finance are part of this structure as well. Obviously as teams continue to grow, they get carved up into disciplines, but the hierarchy remains as simple as possible.
    2. Few managers, all leaders. This builds on the above point. We don’t really have any pure “managers” in the cloud organization. Sure, there are people with direct reports. But that person’s job goes well beyond people management. Rather, everyone on EVERY team is empowered to act in the best interest of our product/service. Teams have leaders who keep the team focused while being a well-informed representative to the broader organization. “Managers” are encouraged to build organizations to control, while “leaders” are encouraged to solve problems and pursue efficiency.
    3. Development and Operations orgs are partners. This is probably the most important characteristic I see in our division. The leaders of Engineering (that contains development) and Service Engineering (that contains operations) are close collaborators who set an example for teamwork. There’s no “us versus them” tolerated, and issues that come up between the teams – and of course they do – are resolved quickly and decisively. Each VP knows the top priorities and pain points of the other. There’s legitimate empathy between the leaders and organizations.
    4. Teams are co-located. Our Cloud Development Center in Bellevue is the cloud headquarters. A majority of our Engineering resources not only work there, but physically sit together in big rooms with long tables. One of our developers can easily hit a support engineer with a Nerf bullet. Co-location makes our daily standups easier, problem resolution simpler, and builds camaraderie among the various teams that build and support our global cloud. Now, there are folks distributed around the globe that are part of this Engineering team. I’m remote (most of the time) and many of our 24×7 support engineers reside in different time zones. How do we make sure distributed team members still feel involved? Tools like Slack make a HUGE difference, as do regular standups and meetups.
    5. Everyone looks for automation opportunities. No one in this division likes doing things manually. We wear custom t-shirts that say “Run by Robots” for crying out loud! It’s in our DNA to automate everything. You cannot scale if you do not automate. Our support engineers use our API to create tools for themselves, developers have done an excellent job maturing our continuous integration and continuous delivery capability, and even product management builds things to streamline data analysis.
    6. All teams responsible for the service. Our Operations staff is not responsible for keeping our service online. Wait, what? Our whole cloud organization is responsible for keeping our service healthy and meeting business need. There’s very little “that’s not MY problem” in this division. Sure, our expert support folks are the ones doing 24×7 monitoring and optimization, but developers wear pagers and get the same notifications if there’s a blip or outage. Anyone experiencing an issue with the platform – whether it’s me doing a demo, or a finance person pulling reports – is expected to notify our NOC. We’re all measured on the success of our service. Our VP of Engineering doesn’t get a bonus for shipping code that doesn’t work in production, and our VP of Service Engineering doesn’t get kudos if he maintains 100% uptime by disallowing new features. Everyone buys into the mission of building a differentiating, feature-rich product with exceptional uptime and support. And everyone is measured by that criteria.
    7. Knowledge resides in team and lightweight documentation. I came from a company where I wrote beautiful design documentation that is probably never going to be looked at again. By having long-lived teams built around a product/service, the “knowledge base” is the team! People know how things work and how to handle problems because they’ve been working together with the same service for a long time. At the same time, we also maintain a documented public (and internal) Knowledge Base where processes, best practices, and exceptions are noted. Each internal KB article is simple and to the point. No fluff. What do I need to know? Anyone on the team can contribute to the Knowledge Base, and it’s teeming with super useful stuff that is actively used and kept up to date. How refreshing!
    8. We’re not perfect, or finished! There’s so much more we can do. Continuous improvement is never done. There are things we still have to get automated, further barriers to break down between team handoffs, and more. As our team grows, other problems will inevitably surface. What matters is our culture and how we approach these problems. Is it an excuse to build up a silo or blame others? Or is it an opportunity to revisit existing procedures and make them better?

    DevOps can mean a lot of things to a lot of people, but if you don’t have the organizational culture set up, it’s only a superficial implementation. It’s jarring to apply this to an existing organization, and I’m starting to witness that right now as we infect the rest of CenturyLink with our DevOps mindset. As that movement advances, I’ll let you know what we’ve learned along the way.

    How about you? How is your organization set up to “do DevOps”?