Category: CenturyLink Cloud

  • Who is really supposed to use the (multi)cloud GUI?

    How do YOU prefer to interact with infrastructure clouds? A growing number of people seem to prefer APIs, SDKs, and CLIs over any graphical UI. It’s easy to understand why: few GUIs offer the ability to create the repeatable, automated processes needed to use compute at scale. I just wrote up an InfoQ story about a big update to the AWS EC2 Run Command feature—spoiler: you can now execute commands against servers located ANYWHERE—and it got me thinking about how we interact with resources. In this post, I’ll try to figure out who cares about GUIs, and show off an example of EC2 Run Command in action.

    If you’re still stuck dealing with servers and haven’t yet upgraded to an IaaS-agnostic cloud-native platform, then you’re looking for ways to create a consistent experience. Surveys keep showing that teams are flocking to GUI-light, automation-centric software for configuration management (e.g. Chef, Ansible), resource provisioning (e.g. Terraform, AWS CloudFormation, Azure Resource Manager), and software deployment. As companies do “hybrid computing” and mix and match servers from different providers, they need to establish consistent practices for building and managing many servers. Is the answer to use the cloud provider’s native GUI or a GUI-centric “multi-cloud manager” tool? I don’t think so.

    Multi-cloud vendors are trying to put a useful layer of abstraction on top of non-commodity IaaS, but you end up with what AWS CEO Andy Jassy calls the “lowest common denominator.” Multi-cloud vendors struggle to keep up with the blistering release pace of public cloud vendors they support, and often neutralize the value of a given cloud by trying to create a common experience. No, the answer seems to be to use these GUIs for simple scenarios only, and rely primarily on APIs and automation that you can control.

    But SOMEONE is using these (multi)cloud GUIs! They must offer some value. So who is the real audience for the cloud provider portals, or multi-cloud products now offered by Cisco (Cliqr), IBM (Gravitant), and CenturyLink (ElasticBox)?

    • Business users. One clear area of value in cloud GUIs is for managers who want to dip in and see what’s been deployed, and finance personnel who are doing cost modeling and billing. The native portals offered by cloud providers are getting better at this, but it’s also been an area where multi-cloud brokers have invested heavily. I don’t want to ask the dev manager to write an app that pulls the AWS billing history. That seems … abusive. Use the GUI.
    • Infrequent tech users with simple tasks. Look, I only log into the AWS portal every month or so. It wouldn’t make a ton of sense for me to build out a whole provisioning and management pipeline to build a server every so often. Even dropping down to the CLI isn’t more productive in those cases (for me). Other people at your company may be frequent, power users and it makes sense for them to automate the heck out of their cloud. In my case, the GUI is (mostly) fine. Many of the cloud provider portals reflect this reality. Look at the Azure Portal. It is geared towards executing individual actions with a lot of visual flair. It is not a productivity interface, or something supportive of bulk activities. Same with most multi-cloud tools I’ve seen. Go build a server, perform an action or two. In those cases, rock on. Use the GUI.
    • Companies with only a few slow-changing servers. If you have 10-50 servers in the cloud, and you don’t turn them over very often, then it can make sense to use the native cloud GUI for a majority of your management. A multi-cloud broker would be overkill. Don’t prematurely optimize.

    I think AWS nailed its target use case with EC2 Run Command. When it first launched in October of 2015, it was for AWS Windows servers. Amazon now supports Windows and Linux, and servers inside or outside of AWS data centers. Run ad-hoc PowerShell or Linux scripts, install software, update the OS, you name it. Kick it off with the AWS Console, API, SDK, CLI, or via PowerShell extensions. And because it’s agent-based and pull-driven, AWS doesn’t have to know a thing about the cloud the server is hosted in. It’s a straightforward, configurable, automation-centric, and free way to do basic cross-cloud management.

    How’s it work? First, I created an EC2 “activation” which is used to generate a code to register the “managed instances.” When creating it, I also set up a security role in Identity and Access Management (IAM) which allows me to assign rights to people to issue commands.

    [Screenshot: creating the EC2 activation and its IAM role]

    Out of the activation, I received a code and ID that’s used to register a new server. With the activation in place, I built a pair of Windows servers in Microsoft Azure and CenturyLink Cloud. I logged into each server, and installed the AWS Tools for Windows PowerShell. Then, I pasted a simple series of commands into the Windows PowerShell for AWS window:

    # Create a working folder for the agent installer
    $dir = $env:TEMP + "\ssm"
    New-Item -ItemType directory -Path $dir
    cd $dir
    
    # Download the SSM agent installer from the region-specific S3 bucket
    (New-Object System.Net.WebClient).DownloadFile("https://amazon-ssm-us-east-1.s3.amazonaws.com/latest/windows_amd64/AmazonSSMAgentSetup.exe", $dir + "\AmazonSSMAgentSetup.exe")
    
    # Install silently, registering this box with the activation code and ID
    Start-Process .\AmazonSSMAgentSetup.exe -ArgumentList @("/q", "/log", "install.log", "CODE=<my code>", "ID=<my id>", "REGION=us-east-1") -Wait
    
    # Confirm registration and that the agent service is running
    Get-Content ($env:ProgramData + "\Amazon\SSM\InstanceData\registration")
    Get-Service -Name "AmazonSSMAgent"
    

    The commands simply download the agent software, install it as a Windows service, and register the box with AWS. Immediately after installing the agent on servers in other clouds, I saw them listed in the Amazon Console. Sweet.

    [Screenshot: the newly registered managed instances in the Amazon Console]

    Now the fun stuff. I can execute commands from existing Run Command documents (e.g. “install missing Windows updates”), run ad-hoc commands, find public documents written by others, or create my own documents.

    For instance, I could do a silly-simple “ipconfig” ad-hoc request against my two servers …

    [Screenshot: running an ad-hoc ipconfig command against both servers]

    … and I almost immediately received the resulting output. If I expected a ton of output from the command, I could log it all to S3 object storage.

    [Screenshot: command output returned in the console]

    As I pick documents to execute, the parameters change. In this case, choosing the “install application” document means that I provide a binary source and some parameters:

    [Screenshot: parameters for the “install application” document]

    I’ve shown off the UI here (ironically, I guess), but the real value is that I could easily create documents or execute commands from the AWS CLI or something like the Node SDK. What a great way to do hybrid, ad-hoc management! It’s not a complete solution and doesn’t replace config management or multi-cloud provisioning tools, but it’s a pretty handy way to manage a fleet of distributed servers.
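    To make that concrete, here's a minimal sketch of what assembling a Run Command call could look like with the Python SDK (boto3). The instance ID and bucket name are hypothetical placeholders, and the actual boto3 call is left commented out since it requires live AWS credentials.

```python
# Sketch: building the parameters for an ad-hoc Run Command call.
# The instance ID and bucket name below are hypothetical placeholders.

def build_run_command_request(instance_ids, commands, output_bucket=None):
    """Assemble the parameter dict that boto3's ssm.send_command() expects."""
    params = {
        "InstanceIds": instance_ids,
        # AWS-RunPowerShellScript is the built-in document for Windows targets
        "DocumentName": "AWS-RunPowerShellScript",
        "Parameters": {"commands": commands},
    }
    if output_bucket:
        # Optionally spool large command output to S3 instead of truncating it
        params["OutputS3BucketName"] = output_bucket
    return params

request = build_run_command_request(
    ["mi-0123456789abcdef0"],   # "managed instance" IDs start with mi-
    ["ipconfig"],
    output_bucket="my-runcommand-logs",
)

# With credentials configured, the actual call would be:
# import boto3
# response = boto3.client("ssm", region_name="us-east-1").send_command(**request)
```

The same shape works from the CLI (`aws ssm send-command`) since both front-ends hit the same API.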

    There’s definitely a place for GUIs when working with infrastructure clouds, but they really aren’t meant for power users. If you’re forcing your day-to-day operations/service team to work through a GUI-centric tool, you’re telling them that you don’t value their time. Rather, make sure that any vendor-provided software your operations team gets their hands on has an API. If not, don’t use it.

    What do you think? Are there other scenarios where using the GUI makes the most sense?

  • You don’t need a private cloud, you need isolation options (and maybe more control!)

    Private cloud is definitely still a “thing.” Survey after survey shows that companies are running apps in (on-premises) private clouds and cautiously embracing public cloud. But, it often seems that companies see this as a binary choice: wild-west public cloud, or fully dedicated private cloud. I just wrote up a report on Heroku Private Spaces, and this reinforces my belief that the future of IT is about offering increasingly sophisticated public cloud isolation options, NOT running infrastructure on-premises.

    Why do companies choose to run things in a multi-tenant public cloud like Azure, AWS, Heroku, or CenturyLink? Because they want to offload responsibility for things that aren’t their core competencies, they want the elasticity to consume app infrastructure on their own timelines and in any geography, they like the constant access to new features and functionality, and it gives their development teams more tools to get revenue-generating products to market quickly.

    How come everything doesn’t run in public clouds? Legit concerns exist about supportability for existing topologies and lack of controls that are “required” by audits. I put required in quotes because in many cases, the spirit of the control can be accomplished, even if the company-defined policies and procedures aren’t a perfect match. For many companies, the solution to these real or perceived concerns is often a private cloud.

    However, “private cloud” is often a misnomer. It’s at best a hyper-converged stack that provides an on-demand infrastructure service, but more often it’s a virtualization environment with some elementary self-service capabilities, no charge-back options, no PaaS-like runtimes, and single-location deployments. When companies say they want private clouds, what they OFTEN need is a range of isolation options. By isolation, I mean fewer and fewer dependencies on shared infrastructure. Why isolation? There’s a need to survive an audit that includes detailed network traffic reports, user access logs, and proof of limited access by service provider staff. Or, you have an application topology that doesn’t fit in the “vanilla” public cloud setup. Think complex networking routes or IP spaces, or even application performance requirements.

    To be sure, any public cloud today is already delivering isolation. Either your app (in the case of PaaS), or virtual infrastructure (in the case of IaaS) is walled off from other customers, even if they share a control plane. What is the isolation spectrum, and what’s in-between vanilla public cloud and on-premises hardware? I’ve made up a term (“Cloud Isolation Index”) and describe it below.

    [Diagram: the Cloud Isolation Index spectrum]

    Customer Isolation

    What is it?

    This is the default isolation that comes with public clouds today. Each customer has their own carved-out place in a multi-tenant environment. Customers typically share a control plane, underlying physical infrastructure, and in some cases, even the virtual infrastructure. Virtual infrastructure may be shared when you’re considering application services like database-as-a-service, messaging services, identity services, and more.

    How is it accomplished?

    This is often accomplished through a mix of hardware and software. The base hardware being used by a cloud provider may offer some inherent multi-tenancy, but most likely, the provider is relying on a software tier that isolates tenants. It’s often the software layer that orchestrates an isolated sandbox across physical compute, networking, storage, and customer metadata.

    What are the benefits and downsides?

    There are lots of reasons that this default isolation level is attractive. Getting started in these environments takes seconds. You have base assurances that you’re not co-mingling your business critical information in a risky way. It’s easier to manage your account or get support because there’s nothing funky going on.

    Downsides? You may not be able to satisfy all your audit and complexity concerns because your vanilla isolation doesn’t support customizations that could break other tenants. Public cloud also limits you to the locations that it’s running, so if you need a geography that’s not available from that provider, you’re out of luck.

    Service Isolation

    What is it?

    Take a service and wall it off from other users within a customer account. You may share a control plane, account management, and underlying physical infrastructure. You’re seeing a new crop of solutions here, and I like this trend. Heroku Private Spaces gives you apps and data in a network-isolated area of your account, and Microsoft Azure Service Bus Premium Messaging delivers resource isolation for your messaging workloads. “Reserved instances” in cloud infrastructure environments serve a similar role. It’s about taking a service or set of services and isolating them for security or performance reasons.

    How is it accomplished?

    It looks like Heroku Private Spaces works by using AWS VPC (see “environment isolation” below) and creating a private network for one or many apps targeted at a Space. Azure likely uses dedicated compute instances to run a messaging unit just for you. Dedicated or reserved services depend on network and (occasionally) compute isolation.

    What are the benefits and downsides?

    The benefits are clear. Instead of doing a coarse exercise (e.g. setting up dedicated private “cloud” infrastructure somewhere) because one component requires elevated isolation, carve up that app or set of services into a private area. By sharing a control plane with the “public” cloud components, you don’t increase your operational burden.

    Environment Isolation (Native)

    What is it?

    Use vendor-provided cloud features to carve up isolation domains within your customer account. Instead of “Customer Isolation,” where everything gets dumped into the vanilla account and everyone has access, here you thoughtfully design an environment and place apps in the right place. Most public clouds offer features to isolate workloads within a given account.

    How is it accomplished?

    Lots of ways to address this. In the CenturyLink Cloud, we offer things like account hierarchies, where customers set up different accounts with unique permissions and network boundaries. Our customers also use Bare Metal servers for dedicated workloads, role-based access controls to limit permissions, distinct network spaces with carefully crafted firewall policies, and more.

    Amazon offers services like Virtual Private Cloud (VPC), which creates a private section of AWS with controlled Internet access. Customers use security groups to control network traffic in and out of a VPC. Many clouds offer granular security permissions so that you can isolate permissions and, in some cases, access to specific workloads. You’ll also find cloud options for data encryption and other native data security features.

    Select private cloud environments also fit into this category. CenturyLink sells a Private Cloud which is fully federated with the public cloud, but on a completely dedicated hardware stack in any of 50+ locations around the world. Here, you have native isolation in a self-service environment, but it still requires a capital outlay.

    This is all typically accomplished using features that many clouds provide you out-of-the-box.

    What are the benefits and downsides?

    One huge benefit is that you can get many aspects of “private cloud” without actually making extensive commitments to dedicated infrastructure. Customers are seeking control and ways to wall-off sensitive workloads. By using inherent features of a global public cloud, you get greater assurances of protection without dramatically increasing your complexity/cost.

    Environment Isolation (Manufactured)

    What is it?

    Sometimes the native capabilities of a public cloud are insufficient for the isolation level that you need. But, one of the great aspects of cloud is the extensibility and in some cases, customization. You’re likely still sharing a control plane and some underlying physical infrastructure.

    How is it accomplished?

    You can often create an isolated environment through additional software, “hybrid” infrastructure, and even hack-y work-arounds.

    Most clouds offer a vast ecosystem of 3rd party open source and commercial appliances. Create isolated networks with an overlay solution, encrypt workloads at the host level, stand up self-managed database solutions, and much more. Look at something like Pivotal Cloud Foundry: don’t want the built-in isolation provided by a public PaaS provider? Run a dedicated PaaS in your account and create the level of isolation that your apps demand.

    You also have choices to weave environments together into a hybrid cloud. If you can’t place something directly in the cloud data center, then you can use things like Azure ExpressRoute or AWS Direct Connect to privately link to assets in remote data centers. Since CenturyLink is the 2nd largest colocation provider in the world, we often see customers put parts of their security stack or entirely different environments into our data center and do a direct connect to their cloud environment. In this way, you manufacture the isolation you need by connecting different components that reside in different isolation domains.

    Another area that comes up with regards to isolation is vendor access. It’s one thing to secure workloads to prevent others within your company from accessing them. It’s another to also prevent the service provider themselves from accessing them! You make this happen by using encryption (that you own the keys for), additional network overlays, or even changing the passwords on servers to something that the cloud management platform doesn’t know.

    What are the benefits and downsides?

    If public cloud vendors *didn’t* offer the option to manufacture your desired isolation level, you’d see a limit to what ended up going there. The benefit of this level is that you can target more sensitive or complex workloads at the public cloud and still have a level of assurance that you’ve got an advanced isolation level.

    The downside? You could end up with a very complicated configuration. If your cloud account no longer resembles its original state, you’ll find that your operational costs go up, and it might be more difficult to take advantage of new features being natively added to the cloud.

    Total Isolation

    What is it?

    This is the extreme end of the spectrum. Stand up an on-premises or hosted private cloud that doesn’t share a control plane or any infrastructure with another tenant.

    How is it accomplished?

    You accomplish this level of isolation by buying stuff. You typically make a significant commit to infrastructure for the privilege of running it yourself, or paying someone else to run it on your behalf. You spend time working with consultants to size and install an environment.

    What are the benefits and downsides?

    The benefits? You have complete control of an infrastructure environment, can use the hardware vendors you want, and can likely create any configuration you need to support your existing topologies. The downside? You’re probably not getting anywhere near the benefit of competitors who use the public cloud to scale faster, and in more places, than you’ll ever reach with owned infrastructure.

    I’m not sure I feel the same way as Cloud Opinion, but the point is well taken.

    Summary

    Isolation should be a feature, not a capital project.

    This isolation concept is still a work in progress for me, and probably needs refinement. Am I missing parts of the spectrum? Have I undersold fully dedicated private infrastructure? It seems that if we talked more about isolation levels, and less about public vs. private, we’d be having smarter conversations. Agree?

  • Comparing Clouds: API Capabilities

    API access is quickly becoming the most important aspect of any cloud platform. How easily can you automate activities using programmatic interfaces? What hooks do you have to connect on-premises apps to cloud environments? So far in this long-running blog series, I’ve taken a look at how to provision, scale, and manage the cloud environments of five leading cloud providers. In this post, I’ll explore the virtual-machine-based API offerings of the same providers. Specifically, I’m assessing:

    • Login mechanism. How do you access the API? Is it easy for developers to quickly authenticate and start calling operations?
    • Request and response shape. Does the API use SOAP or REST? Are payloads XML, JSON, or both? Does a result set provide links to follow to additional resources?
    • Breadth of services. How comprehensive is the API? Does it include most of the capabilities of the overall cloud platform?
    • SDKs, tools, and documentation. What developer SDKs are available, and is there ample documentation for developers to leverage?
    • Unique attributes. What stands out about the API? Does it have any special capabilities or characteristics that make it stand apart?

    As an aside, there’s no “standard cloud API.” Each vendor has unique things they offer, and there’s no base interface that everyone conforms to. While that makes it more challenging to port configurations from one provider to the next, it highlights the value of using configuration management tools (and, to a lesser extent, SDKs) to provide abstraction over a cloud endpoint.

    Let’s get moving, in alphabetical order.

    DISCLAIMER: I’m the VP of Product for CenturyLink’s cloud platform. Obviously my perspective is colored by that. However, I’ve taught four well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    Amazon Web Services

    Amazon EC2 is among the original cloud infrastructure providers, and has a mature API.

    Login mechanism

    For AWS, you don’t really “log in.” Every API request includes an HTTP header made up of the hashed request parameters signed with your private key. This signature is verified by AWS before executing the requested operation.
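    The Signature Version 4 scheme chains HMAC-SHA256 operations to derive a per-day, per-region, per-service signing key from your secret key, and then signs the request with it. A minimal Python sketch of that derivation:

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service):
    """SigV4 key derivation chain: date -> region -> service -> 'aws4_request'."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(secret_key, date_stamp, region, service, string_to_sign):
    """Produce the hex signature that goes into the Authorization header."""
    key = derive_signing_key(secret_key, date_stamp, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Getting the canonical request and string-to-sign exactly right is fiddly, which is a big part of why most people reach for an SDK instead of hand-rolling this.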

    A valid request to the API endpoint might look like this (notice the Authorization header):

    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    X-Amz-Date: 20150501T130210Z
    Host: ec2.amazonaws.com
    Authorization: AWS4-HMAC-SHA256 Credential=KEY/20150501/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=ced6826de92d2bdeed8f846f0bf508e8559e98e4b0194b84example54174deb456c
    
    [request payload]
    

    Request and response shape

    Amazon still supports a deprecated SOAP endpoint, but steers everyone to its HTTP services. To be clear, it’s not REST; while the API does use GET and POST, it typically throws a command and all the parameters into the URL. For instance, to retrieve a list of instances in your account, you’d issue a request to:

    https://ec2.amazonaws.com/?Action=DescribeInstances&AUTHPARAMS

    For cases where lots of parameters are required – for instance, to create a new EC2 instance – all the parameters are added to the URL and covered by the signature in the Authorization header.

    https://ec2.amazonaws.com/?Action=RunInstances
    &ImageId=ami-60a54009
    &MaxCount=3
    &MinCount=1
    &KeyName=my-key-pair
    &Placement.AvailabilityZone=us-east-1d
    &AUTHPARAMS
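    Assembling a request like that is just URL encoding. A quick sketch (the auth parameters are omitted, since they come from the signing step):

```python
from urllib.parse import urlencode


def build_ec2_query(action, **params):
    """Build an EC2 Query API URL; sorted params match SigV4's canonical ordering.

    The real request also needs the signed auth parameters appended.
    """
    query = {"Action": action, **params}
    return "https://ec2.amazonaws.com/?" + urlencode(sorted(query.items()))


url = build_ec2_query(
    "RunInstances",
    ImageId="ami-60a54009",
    MaxCount=3,
    MinCount=1,
    KeyName="my-key-pair",
    **{"Placement.AvailabilityZone": "us-east-1d"},
)
```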
    

    Amazon APIs return XML. Developers get back a basic XML payload such as:

    <DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2014-10-01/">
      <requestId>fdcdcab1-ae5c-489e-9c33-4637c5dda355</requestId>
        <reservationSet>
          <item>
        <reservationId>r-1a2b3c4d</reservationId>
            <ownerId>123456789012</ownerId>
            <groupSet>
              <item>
                <groupId>sg-1a2b3c4d</groupId>
                <groupName>my-security-group</groupName>
              </item>
            </groupSet>
            <instancesSet>
              <item>
                <instanceId>i-1a2b3c4d</instanceId>
                <imageId>ami-1a2b3c4d</imageId>
    

    Breadth of services

    Each AWS service exposes an impressive array of operations. EC2 is no exception with well over 100. The API spans server provisioning and configuration, as well as network and storage setup.

    [Screenshot: a sampling of EC2 API operations]

    I’m hard pressed to find anything in the EC2 management UI that isn’t available in the API set.

    SDKs, tools, and documentation

    AWS is known for its comprehensive documentation that stays up-to-date. The EC2 API documentation includes a list of operations, a basic walkthrough of creating API requests, parameter descriptions, and information about permissions.

    SDKs give developers a quicker way to get going with an API, and AWS provides SDKs for Java, .NET, Node.js, PHP, Python, and Ruby. Developers can find these SDKs in package management systems like npm (Node.js) and NuGet (.NET).

    As you may expect, there are gobs of 3rd party tools that integrate with AWS. Whether it’s configuration management plugins for Chef or Ansible, or build automation tools like Terraform, you can expect to find AWS plugins.

    Unique attributes

    The AWS API is comprehensive with fine-grained operations. It also has a relatively unique security process (signature hashing) that may steer you towards the SDKs that shield you from the trickiness of correctly signing your request. Also, because EC2 is one of the first AWS services ever released, it uses an older XML schema. Newer services like DynamoDB or Kinesis offer a JSON syntax.

    Amazon offers push-based notification through CloudWatch + SNS, so developers can get an HTTP push message when things like Auto Scaling events fire, or a performance alarm gets triggered.

    CenturyLink Cloud

    Global telecommunications and technology company CenturyLink offers a public cloud in regions around the world. The API has evolved from a SOAP/HTTP model (v1) to a fully RESTful one (v2).

    Login mechanism

    To use the CenturyLink Cloud API, developers send their platform credentials to a “login” endpoint and get back a reusable bearer token if the credentials are valid. That token is required for any subsequent API calls.

    A request for a token may look like:

    POST https://api.ctl.io/v2/authentication/login HTTP/1.1
    Host: api.ctl.io
    Content-Type: application/json
    Content-Length: 54
    
    {
      "username": "[username]",
      "password": "[password]"
    }

    A token (and role list) comes back in the API response, and developers use that token in the “Authorization” HTTP header for each subsequent API call.

    GET https://api.ctl.io/v2/datacenters/RLS1/WA1 HTTP/1.1
    Host: api.ctl.io
    Content-Type: application/json
    Content-Length: 0
    Authorization: Bearer [LONG TOKEN VALUE]
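    In Python, that exchange could be sketched like this with the standard library. Nothing here actually goes over the wire; `urllib` is only assembling the two requests, and with real credentials you'd send them with `urllib.request.urlopen(...)` and read the token out of the login response's JSON body.

```python
import json
import urllib.request


def build_login_request(username, password):
    """POST credentials to the v2 login endpoint to exchange them for a bearer token."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        "https://api.ctl.io/v2/authentication/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_api_request(path, token):
    """Attach the bearer token from the login response to a subsequent API call."""
    return urllib.request.Request(
        "https://api.ctl.io" + path,
        headers={"Authorization": "Bearer " + token},
    )
```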
    

    Request and response shape

    The v2 API uses JSON for the request and response format. The legacy API uses XML or JSON with either SOAP or HTTP (don’t call it REST) endpoints.

    To retrieve a single server in the v2 API, the developer sends a request to:

    GET https://api.ctl.io/v2/servers/{accountAlias}/{serverId}

    The JSON response for most any service is verbose, and includes a number of links to related resources. For instance, in the example response payload below, notice that the caller can follow links to the specific alert policies attached to a server, billing estimates, and more.

    {
      "id": "WA1ALIASWB01",
      "name": "WA1ALIASWB01",
      "description": "My web server",
      "groupId": "2a5c0b9662cf4fc8bf6180f139facdc0",
      "isTemplate": false,
      "locationId": "WA1",
      "osType": "Windows 2008 64-bit",
      "status": "active",
      "details": {
        "ipAddresses": [
          {
            "internal": "10.82.131.44"
          }
        ],
        "alertPolicies": [
          {
            "id": "15836e6219e84ac736d01d4e571bb950",
            "name": "Production Web Servers - RAM",
            "links": [
              {
                "rel": "self",
                "href": "/v2/alertPolicies/alias/15836e6219e84ac736d01d4e571bb950"
              },
              {
                "rel": "alertPolicyMap",
                "href": "/v2/servers/alias/WA1ALIASWB01/alertPolicies/15836e6219e84ac736d01d4e571bb950",
                "verbs": [
                  "DELETE"
                ]
              }
            ]
              }
            ],
        "cpu": 2,
        "diskCount": 1,
        "hostName": "WA1ALIASWB01.customdomain.com",
        "inMaintenanceMode": false,
        "memoryMB": 4096,
        "powerState": "started",
        "storageGB": 60,
        "disks":[
          {
            "id":"0:0",
            "sizeGB":60,
            "partitionPaths":[]
          }
        ],
        "partitions":[
          {
            "sizeGB":59.654,
            "path":"C:\\"
          }
        ],
        "snapshots": [
          {
            "name": "2014-05-16.23:45:52",
            "links": [
              {
                "rel": "self",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40"
              },
              {
                "rel": "delete",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40"
              },
              {
                "rel": "restore",
                "href": "/v2/servers/alias/WA1ALIASWB01/snapshots/40/restore"
              }
            ]
          }
        ],
    },
      "type": "standard",
      "storageType": "standard",
      "changeInfo": {
        "createdDate": "2012-12-17T01:17:17Z",
        "createdBy": "user@domain.com",
        "modifiedDate": "2014-05-16T23:49:25Z",
        "modifiedBy": "user@domain.com"
      },
      "links": [
        {
          "rel": "self",
          "href": "/v2/servers/alias/WA1ALIASWB01",
          "id": "WA1ALIASWB01",
          "verbs": [
            "GET",
            "PATCH",
            "DELETE"
          ]
        },
        …{
          "rel": "group",
          "href": "/v2/groups/alias/2a5c0b9662cf4fc8bf6180f139facdc0",
          "id": "2a5c0b9662cf4fc8bf6180f139facdc0"
        },
        {
          "rel": "account",
          "href": "/v2/accounts/alias",
          "id": "alias"
        },
        {
          "rel": "billing",
          "href": "/v2/billing/alias/estimate-server/WA1ALIASWB01"
        },
        {
          "rel": "statistics",
          "href": "/v2/servers/alias/WA1ALIASWB01/statistics"
        },
        {
          "rel": "scheduledActivities",
          "href": "/v2/servers/alias/WA1ALIASWB01/scheduledActivities"
        },
        {
          "rel": "alertPolicyMappings",
          "href": "/v2/servers/alias/WA1ALIASWB01/alertPolicies",
          "verbs": [
            "POST"
          ]
        },  {
          "rel": "credentials",
          "href": "/v2/servers/alias/WA1ALIASWB01/credentials"
        },
    
      ]
    }
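    Because every payload carries these links, a client can discover related resources instead of hard-coding URL patterns. A small helper, shown here against a trimmed version of the server payload above:

```python
def find_link(resource, rel):
    """Return the href of the first link with the given rel, or None if absent."""
    for link in resource.get("links", []):
        if link.get("rel") == rel:
            return link["href"]
    return None


# A trimmed version of the server payload shown above
server = {
    "id": "WA1ALIASWB01",
    "links": [
        {"rel": "self", "href": "/v2/servers/alias/WA1ALIASWB01"},
        {"rel": "billing", "href": "/v2/billing/alias/estimate-server/WA1ALIASWB01"},
        {"rel": "credentials", "href": "/v2/servers/alias/WA1ALIASWB01/credentials"},
    ],
}

billing_href = find_link(server, "billing")
# → "/v2/billing/alias/estimate-server/WA1ALIASWB01"
```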

    Breadth of services

    CenturyLink provides APIs for a majority of the capabilities exposed in the management UI. Developers can create and manage servers, networks, firewall policies, load balancer pools, server policies, and more.

    [Screenshot: a sampling of CenturyLink Cloud API operations]

    SDKs, tools, and documentation

    CenturyLink recently launched a Developer Center to collect all the developer content in one place. It points to the Knowledge Base of articles, API documentation, and developer-centric blog. The API documentation is fairly detailed with descriptions of operations, payloads, and sample calls. Users can also watch brief video walkthroughs of major platform capabilities.

    There are open source SDKs for Java, .NET, Python, and PHP. CenturyLink also offers an Ansible module, and integrates with VMware’s vRealize multi-cloud management tool.

    Unique attributes

    The CenturyLink API provides a few unique things. The platform has the concept of “grouping” servers together. Via the API, you can retrieve the servers in a group, or get the projected cost of a group, among other things. Also, collections of servers can be passed into operations, so a developer can reboot a set of boxes, or run a script against many boxes at once.

    Somewhat similar to AWS, CenturyLink offers push-based notifications via webhooks. Developers get a near real-time HTTP notification when servers, users, or accounts are created/changed/deleted, and also when monitoring alarms fire.

    DigitalOcean

    DigitalOcean heavily targets developers, so you’d expect a strong focus on their API. They have a v1 API (deprecated, and shutting down in November 2015) and a v2 API.

    Login mechanism

    DigitalOcean authenticates users via OAuth. In the management UI, developers create OAuth tokens that can be scoped for read, or read/write, access. These token values are only shown a single time (for security reasons), so developers must make sure to save them in a secure place.

    2015.07.30api03

    Once you have this token, you can either send the bearer token in the HTTP header, or (though it’s not recommended) use it in an HTTP basic authentication scenario. A typical curl request looks like:

    curl -X $HTTP_METHOD -H "Authorization: Bearer $TOKEN" "https://api.digitalocean.com/v2/$OBJECT"

    Request and response shape

    The DigitalOcean API is RESTful with JSON payloads. Developers throw typical HTTP verbs (GET/DELETE/PUT/POST/HEAD) against the endpoints. Let’s say that I wanted to retrieve a specific droplet – a “droplet” in DigitalOcean is equivalent to a virtual machine – via the API. I’d send a request to:

    https://api.digitalocean.com/v2/droplets/[dropletid]

    The response from such a request comes back as verbose JSON.

    {
      "droplet": {
        "id": 3164494,
        "name": "example.com",
        "memory": 512,
        "vcpus": 1,
        "disk": 20,
        "locked": false,
        "status": "active",
        "kernel": {
          "id": 2233,
          "name": "Ubuntu 14.04 x64 vmlinuz-3.13.0-37-generic",
          "version": "3.13.0-37-generic"
        },
        "created_at": "2014-11-14T16:36:31Z",
        "features": [
          "ipv6",
          "virtio"
        ],
        "backup_ids": [
    
        ],
        "snapshot_ids": [
          7938206
        ],
        "image": {
          "id": 6918990,
          "name": "14.04 x64",
          "distribution": "Ubuntu",
          "slug": "ubuntu-14-04-x64",
          "public": true,
          "regions": [
            "nyc1",
            "ams1",
            "sfo1",
            "nyc2",
            "ams2",
            "sgp1",
            "lon1",
            "nyc3",
            "ams3",
            "nyc3"
          ],
          "created_at": "2014-10-17T20:24:33Z",
          "type": "snapshot",
          "min_disk_size": 20
        },
        "size": {
        },
        "size_slug": "512mb",
        "networks": {
          "v4": [
            {
              "ip_address": "104.131.186.241",
              "netmask": "255.255.240.0",
              "gateway": "104.131.176.1",
              "type": "public"
            }
          ],
          "v6": [
            {
              "ip_address": "2604:A880:0800:0010:0000:0000:031D:2001",
              "netmask": 64,
              "gateway": "2604:A880:0800:0010:0000:0000:0000:0001",
              "type": "public"
            }
          ]
        },
        "region": {
          "name": "New York 3",
          "slug": "nyc3",
          "sizes": [
            "32gb",
            "16gb",
            "2gb",
            "1gb",
            "4gb",
            "8gb",
            "512mb",
            "64gb",
            "48gb"
          ],
          "features": [
            "virtio",
            "private_networking",
            "backups",
            "ipv6",
            "metadata"
          ],
          "available": true
        }
      }
    }
    
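    Consuming that verbose payload from code is straightforward. Here is a sketch that pulls a few useful fields out of the droplet response, using a trimmed copy of the sample above:

```javascript
// Trimmed copy of the droplet response shown above, keeping only the
// fields this sketch reads.
var response = {
  droplet: {
    id: 3164494,
    name: "example.com",
    status: "active",
    region: { slug: "nyc3" },
    networks: {
      v4: [{ ip_address: "104.131.186.241", type: "public" }]
    }
  }
};

// Summarize a droplet: name, status, region, and its public IPv4 address.
function summarize(body) {
  var d = body.droplet;
  var publicV4 = d.networks.v4.filter(function (n) {
    return n.type === "public";
  })[0];
  return d.name + " (" + d.status + ", " + d.region.slug + "): " +
    (publicV4 ? publicV4.ip_address : "no public IPv4");
}

console.log(summarize(response));
// example.com (active, nyc3): 104.131.186.241
```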

    Breadth of services

    DigitalOcean says that “all of the functionality that you are familiar with in the DigitalOcean control panel is also available through the API,” and that looks to be pretty accurate. DigitalOcean is known for their no-frills user experience, and with the exception of account management features, the API gives you control over most everything. Create droplets, create snapshots, move snapshots between regions, manage SSH keys, manage DNS records, and more.
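    Creating a droplet, for instance, is a single POST to `/v2/droplets` with a small JSON body. The field names below follow the DigitalOcean v2 docs; the values are examples pulled from the sample response earlier in this post:

```javascript
// Build the JSON body for POST https://api.digitalocean.com/v2/droplets.
function dropletRequest(name, region, size, image) {
  return JSON.stringify({ name: name, region: region, size: size, image: image });
}

var body = dropletRequest("example.com", "nyc3", "512mb", "ubuntu-14-04-x64");
console.log(body);

// Pair it with the bearer-token header from the curl example earlier:
//   curl -X POST -H "Authorization: Bearer $TOKEN" \
//        -H "Content-Type: application/json" -d "$BODY" \
//        https://api.digitalocean.com/v2/droplets
```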

    2015.07.30api04

    SDKs, tools, and documentation

    Developers can find lots of open source projects from DigitalOcean that favor Go and Ruby. There are a couple of official SDK libraries, and a whole host of community-supported ones. You’ll find ones for Ruby, Go, Python, .NET, Java, Node, and more.

    DigitalOcean does a great job at documentation (with samples included), and also has a vibrant set of community contributions that apply to virtually any (cloud) environment. The contributed list of tutorials is fantastic.

    Being so developer-centric, DigitalOcean can be found as a supported module in many 3rd party toolkits. You’ll find friendly extensions for Vagrant, Juju, SaltStack and much more.

    Unique attributes

    What stands out for me regarding DigitalOcean is the quality of their documentation and their complete developer focus. The API itself is fairly standard, but it’s presented in a way that’s easy to grok, and the ecosystem around the service is excellent.

    Google Compute Engine

    Google has lots of API-enabled services, and GCE is no exception.

    Login mechanism

    Google uses OAuth 2.0 and access tokens. Developers register their apps, define a scope, and request a short-lived access token. There are different flows depending on whether you’re working with web applications (with interactive user login) or service accounts (consent not required).

    If you go the service account route, then you’ve got to generate a JSON Web Token (JWT) through a series of encoding and signing steps. The payload to Google for getting a valid access token looks like:

    POST /oauth2/v3/token HTTP/1.1
    Host: www.googleapis.com
    Content-Type: application/x-www-form-urlencoded
    
    grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer&assertion=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiI3NjEzMjY3O…
    

    Request and response shape

    The Google API is RESTful and passes JSON messages back and forth. Operations map to HTTP verbs, and URIs reflect logical resource paths (even if Google’s docs calling these “methods” made me shudder). If you want a list of the images in a project, you’d send a request to:

    https://www.googleapis.com/compute/v1/projects/[project]/global/images

    The response comes back as JSON:

    {
      "kind": "compute#imageList",
      "selfLink": string,
      "id": string,
      "items": [
        {
          "kind": "compute#image",
          "selfLink": string,
          "id": unsigned long,
          "creationTimestamp": string,
          "name": string,
          "description": string,
          "sourceType": string,
          "rawDisk": {
            "source": string,
            "sha1Checksum": string,
            "containerType": string
          },
          "deprecated": {
            "state": string,
            "replacement": string,
            "deprecated": string,
            "obsolete": string,
            "deleted": string
          },
          "status": string,
          "archiveSizeBytes": long,
          "diskSizeGb": long,
          "sourceDisk": string,
          "sourceDiskId": string,
          "licenses": [
            string
          ]
        }
      ],
      "nextPageToken": string
    }
    
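    The `nextPageToken` field is how Google pages large listings: pass it back as the `pageToken` query parameter and repeat until the field is absent. A sketch of that loop, with `fetchPage` standing in for the real HTTPS call to the images endpoint:

```javascript
// Two fake pages, shaped like the imageList response above.
var pages = [
  { items: [{ name: 'img-a' }, { name: 'img-b' }], nextPageToken: 'p2' },
  { items: [{ name: 'img-c' }] } // no nextPageToken: last page
];

// Stand-in for the real request; token would become ?pageToken=... on the URL.
function fetchPage(token) {
  return token === 'p2' ? pages[1] : pages[0];
}

// Accumulate every item by chasing nextPageToken until it disappears.
function listAllImages() {
  var all = [], token = null;
  do {
    var page = fetchPage(token);
    all = all.concat(page.items);
    token = page.nextPageToken;
  } while (token);
  return all.map(function (i) { return i.name; });
}

console.log(listAllImages()); // [ 'img-a', 'img-b', 'img-c' ]
```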

    Breadth of services

    The GCE API spans a lot of different capabilities that closely match what they offer in their management UI. There’s the base Compute API – this includes operations against servers, images, snapshots, disks, network, VPNs, and more – as well as beta APIs for Autoscalers and instance groups. There’s also an alpha API for user and account management.

    2015.07.30api05

    SDKs, tools, and documentation

    Google offers a serious set of client libraries. You’ll find libraries and dedicated documentation for Java, .NET, Go, Ruby, Objective C, Python and more.

    The documentation for GCE is solid. Not only will you find detailed API specifications, but also a set of useful tutorials for setting up platforms (e.g. LAMP stack) or workflows (e.g. Jenkins + Packer + Kubernetes) on GCE.

    Google lists out a lot of tools that natively integrate with the cloud service. The primary focus here is configuration management tools, with specific callouts for Chef, Puppet, Ansible, and SaltStack.

    Unique attributes

    GCE has a good user management API. They also have a useful batching capability where you can bundle together multiple related or unrelated calls into a single HTTP request. I’m also impressed by Google’s tools for trying out API calls ahead of time. There’s the Google-wide OAuth 2.0 playground where you can authorize and try out calls. Even better, for any API operation in the documentation, there’s a “try it” section at the bottom where you can call the endpoint and see it in action.

    2015.07.30api06

    Microsoft Azure

    Microsoft added virtual machines to its cloud portfolio a couple years ago, and has API-enabled most of their cloud services.

    Login mechanism

    One option for managing Azure components programmatically is via the Azure Resource Manager. Any action you perform on a resource requires the call to be authenticated with Azure Active Directory. To do this, you have to add your app to an Azure Active Directory tenant, set permissions for the app, and get a token used for authenticating requests.

    The documentation says that you can set this up with the Azure CLI or PowerShell commands (or the management UI). The same docs show a C# example of getting the JWT token back from the management endpoint.

    public static string GetAToken()
    {
      var authenticationContext = new AuthenticationContext("https://login.windows.net/{tenantId or tenant name}");
      var credential = new ClientCredential(clientId: "{application id}", clientSecret: "{application password}");
      var result = authenticationContext.AcquireToken(resource: "https://management.core.windows.net/", clientCredential:credential);
    
      if (result == null) {
        throw new InvalidOperationException("Failed to obtain the JWT token");
      }
    
      string token = result.AccessToken;
    
      return token;
    }
    

    Microsoft also offers a direct Service Management API for interacting with most Azure items. Here you can authenticate using Azure Active Directory or X.509 certificates.

    Request and response shape

    The Resource Manager API appears RESTful and works with JSON messages. In order to retrieve the details about a specific virtual machine, you send a request to:

    https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Compute/virtualMachines/{vm-name}?api-version={api-version}

    The response JSON is fairly basic, and doesn’t tell you much about related services (e.g. networks or load balancers).

    {
       "id":"/subscriptions/########-####-####-####-############/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{virtualMachineName}",
       "name":"virtualMachineName",
       "type":"Microsoft.Compute/virtualMachines",
       "location":"westus",
       "tags":{
          "department":"finance"
       },
       "properties":{
          "availabilitySet":{
             "id":"/subscriptions/########-####-####-####-############/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/availabilitySets/{availabilitySetName}"
          },
          "hardwareProfile":{
             "vmSize":"Standard_A0"
          },
          "storageProfile":{
             "imageReference":{
                "publisher":"MicrosoftWindowsServerEssentials",
                "offer":"WindowsServerEssentials",
                "sku":"WindowsServerEssentials",
                "version":"1.0.131018"
             },
             "osDisk":{
                "osType":"Windows",
                "name":"osName-osDisk",
                "vhd":{
                   "uri":"http://storageAccount.blob.core.windows.net/vhds/osDisk.vhd"
                },
                "caching":"ReadWrite",
                "createOption":"FromImage"
             },
             "dataDisks":[

             ]
          },
          "osProfile":{
             "computerName":"virtualMachineName",
             "adminUsername":"username",
             "adminPassword":"password",
             "customData":"",
             "windowsConfiguration":{
                "provisionVMAgent":true,
                "winRM":{
                   "listeners":[{
                      "protocol":"https",
                      "certificateUrl":"[parameters('certificateUrl')]"
                   }]
                },
                "additionalUnattendContent":[
                   {
                      "pass":"oobesystem",
                      "component":"Microsoft-Windows-Shell-Setup",
                      "settingName":"FirstLogonCommands|AutoLogon",
                      "content":"<XML unattend content>"
                   }
                ],
                "enableAutomaticUpdates":true
             },
             "secrets":[

             ]
          },
          "networkProfile":{
             "networkInterfaces":[
                {
                   "id":"/subscriptions/########-####-####-####-############/resourceGroups/CloudDep/providers/Microsoft.Network/networkInterfaces/myNic"
                }
             ]
          },
          "provisioningState":"succeeded"
       }
    }
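    Since every Resource Manager call follows the same URL template shown earlier, filling it in from code is a one-liner. The segments below come from that URL; the parameter values are examples:

```javascript
// Fill in the Resource Manager virtual machine URL template.
function vmUrl(subscriptionId, resourceGroup, vmName, apiVersion) {
  return 'https://management.azure.com/subscriptions/' + subscriptionId +
    '/resourceGroups/' + resourceGroup +
    '/providers/Microsoft.Compute/virtualMachines/' + vmName +
    '?api-version=' + apiVersion;
}

console.log(vmUrl('1111-2222', 'myGroup', 'myVm', '2015-06-15'));
```

Send a GET to that URL with the bearer token from the C# sample in an `Authorization` header and you get the JSON above back.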

    The Service Management API is a bit different. It’s also RESTful, but works with XML messages (although some of the other services like Autoscale seem to work with JSON). If you wanted to create a VM deployment, you’d send an HTTP POST request to:

    https://management.core.windows.net/<subscription-id>/services/hostedservices/<cloudservice-name>/deployments

    The result is an extremely verbose XML payload.

    Breadth of services

    In addition to an API for virtual machine management, Microsoft has REST APIs for virtual networks, load balancers, Traffic Manager, DNS, and more. The Service Management API appears to have a lot more functionality than the Resource Manager API.

    Microsoft is stuck with a two-portal situation where the officially supported portal (at https://manage.windowsazure.com) has different features and functions than the beta one (https://portal.azure.com). It’s been like this for quite a while, and hopefully they cut over to the new one soon.

    2015.07.30api07

    2015.07.30api08

    SDKs, tools, and documentation

    Microsoft provides lots of options on their SDK page. Developers can interact with the Azure API using .NET, Java, Node.js, PHP, Python, Ruby, and mobile platforms (iOS, Android, Windows Phone), and it appears that each one uses the Service Management APIs to interact with virtual machines. Frankly, the documentation around this is a bit confusing. The virtual machines documentation itself is OK, and provides a handful of walkthroughs to get you started.

    The core API documentation exists for both the Service Management API and the Azure Resource Manager API. For each set of documentation, you can view details of each API call. I’m not a fan of the navigation in Microsoft API docs; it’s not easy to see the breadth of API operations since the focus is on a single service at a time.

    Microsoft has a lot of support for virtual machines in the ecosystem, and touts integration with Chef, Ansible, and Docker.

    Unique attributes

    Besides being a little confusing (which APIs to use), the Azure API is pretty comprehensive (on the Service Management side). Somewhat uniquely, the Resource Manager API has a (beta) billing API with data about consumption and pricing. While I’ve complained a bit here about Resource Manager and conflicting APIs, it’s actually a pretty useful thing. Developers can use the resource manager concept (and APIs) to group related resources and deliver access control and templating.
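    That templating takes the form of a JSON deployment template. A minimal skeleton, expressed here as a JavaScript object for illustration (the real artifact is a JSON file; the schema URL follows the ARM template docs of this era, so double-check it against current documentation):

```javascript
// Bones of a Resource Manager deployment template.
var template = {
  $schema: 'https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#',
  contentVersion: '1.0.0.0',
  parameters: {},  // values supplied at deployment time
  variables: {},   // computed values reused across resources
  resources: [],   // each entry: { type, name, apiVersion, location, properties }
  outputs: {}      // values returned after deployment
};

console.log(Object.keys(template).length); // 6 top-level sections
```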

    Also, Microsoft bakes in support for Azure virtual machines in products like Azure Site Recovery.

    Summary

    The common thread across most cloud APIs is that they provide solid coverage of the features available in the vendor’s graphical UI. We also saw that more and more attention is being paid to SDKs and documentation to help developers get up and running. AWS has been in the market the longest, so you see maturity and breadth in their API, but also a heavier interface (authentication, XML payloads). CenturyLink and Google have good account management APIs, and Azure’s billing API is a welcome addition to their portfolio. Amazon, CenturyLink, and Google have fairly verbose API responses, and CenturyLink is the only one with a hypermedia approach of linking to related resources. Microsoft has a messier API story than I would have expected, and developers will be better off using SDKs!

    What do you think? Do you use the native APIs of cloud providers, or prefer to go through SDKs or brokers?

  • Recent Presentations (With Video!): Transitioning to PaaS and DevOps

    I just finished up speaking at a run of conferences (Cloud Foundry Summit, ALM Forum, and the OpenStack Summit), and the recorded presentations are all online.

    The Cloud Foundry Summit was excellent and you can find all the conference videos on the Cloud Foundry YouTube channel. I encourage you to watch some of the (brief) presentations by customers to hear how real people use application platforms to solve problems. My presentation (slides here) was about the enterprise transition to PaaS, and what large companies need to think about when introducing PaaS to their environment.

    At the ALM Forum and OpenStack Summit, I talked about the journey to a DevOps organization and used CenturyLink as a case study. I went through our core values, logistics (org chart, office layout, tool set), and then a week-in-the-life that highlighted the various meetings and activities we do.

    Companies making either journey will realize significant benefits, but not without proper planning. PaaS can be supremely disruptive to how applications are being delivered today, and DevOps may represent a fundamental shift in how technology services are positioned and prioritized at your company. While not directly related, PaaS and DevOps both address the emerging desire to treat I.T. as a competitive advantage.

    Enjoy!

  • Using Multiple NoSQL Database Models with Orchestrate and Node.js

    Databases aren’t a solved problem. Dozens of new options have sprouted up over the past five years as developers look for ways to effectively handle emerging application patterns. But how do you choose which one to use, especially when your data demands might span the competencies of an individual database engine?

    NoSQL choices abound. Do you need something that stores key-value information for quick access? How about stashing JSON documents that represent fluid data structures? What should you do for Internet-of-Things scenarios or fast moving log data where “time” is a first class citizen? How should you handle dynamic relationships that require a social graph approach? Where should you store and use geo-spatial data? And don’t forget about the need to search all that data! If you need all of the aforementioned characteristics, then you typically need to stand up and independently manage multiple database technologies and weave them together at the app layer. Orchestrate.io does it differently, and yesterday, CenturyLink acquired them.

    What does Orchestrate do? It’s a fast, hosted, managed, multi-model database fabric that is accessible through a single REST API. Developers only pay for API calls, and have access to a flexible key-value store that works with time-ordered events, geospatial data, and graph relationships. It runs an impressive tech stack under the covers, but all that the developers have to deal with is the web interface and API layer. In this blog post, I’ll walk through a simple sample app that I built in Node.js.

    First off, go sign up for an Orchestrate account, which comes with a nice free tier. Do it now; I’ll wait.

    Once in the Orchestrate management dashboard, I created an application. This is really just a container for data collections for a given deployment region. In my case, I chose one of the four brand new CenturyLink regions that we lit up last night. Psst, it’s faster on CenturyLink Cloud than on AWS.

    2015.04.21orchestrate01

    For a given app, I get an API key used to authenticate myself in code.  Let’s get to work (note that you can find my full project in this GitHub repo). I built a Node.js app, but you can use any one of their SDKs (Node, Ruby, Python, Java and Go with community-built options for .NET and PHP) or, their native API. To use the Node SDK, I added the package via npm.

    2015.04.21orchestrate02

    In this app, I’ll store basketball player profiles, relationships between players, and a time-ordered game log. I spun up a Node Express project (using Visual Studio, which I’m getting used to), and added some code to “warm up” my collection by adding some records, some event data, and some relationships. You can also query/add/update/delete via the Orchestrate management dashboard, but this was a more repeatable choice.

    var express = require('express');
    var router = express.Router();
    
    //Orchestrate API token
    var token = "<key>";
    var orchestrate = require('orchestrate');
    //location reference
    orchestrate.ApiEndPoint = "<endpoint>";
    var db = orchestrate(token);
    
    /* Warmup the collection. */
    router.get('/', function (req, res) {
    
        //create key/value records
        db.put('Players', "10001", {
            "name": "Blake Griffin",
            "team": "Los Angeles Clippers",
            "position": "power forward",
            "birthdate": "03/16/89",
            "college": "Oklahoma",
            "careerppg": "21.5",
            "careerrpg": "9.7"
        })
        .then(function (result1) {
            console.log("record added");
        });
    
        //create key/value records
        db.put('Players', "10002", {
            "name": "DeAndre Jordan",
            "team": "Los Angeles Clippers",
            "position": "center",
            "birthdate": "07/21/88",
            "college": "Texas A&M",
            "careerppg": "8.0",
            "careerrpg": "9.0"
        })
        .then(function (result2) {
            console.log("record added");
        });
    
        //create key/value records
        db.put('Players', "10003", {
            "name": "Matt Barnes",
            "team": "Los Angeles Clippers",
            "position": "small forward",
            "birthdate": "03/09/80",
            "college": "UCLA",
            "careerppg": "8.1",
            "careerrpg": "4.5",
            "teams": [
                "Los Angeles Clippers",
                "Sacramento Kings",
                "Golden State Warriors"
            ]
        })
        .then(function (result3) {
            console.log("record added");
        });
    
        //create event
        db.newEventBuilder()
        .from('Players', '10001')
        .type('gamelog')
        .time(1429531200)
        .data({ "opponent": "San Antonio Spurs", "minutes": "43", "points": "26", "rebounds": "12" })
        .create()
        .then(function (result4) {
            console.log("event added");
        });
    
        //create event
        db.newEventBuilder()
        .from('Players', '10001')
        .type('gamelog')
        .time(1429012800)
        .data({ "opponent": "Phoenix Suns", "minutes": "29", "points": "20", "rebounds": "8" })
        .create()
        .then(function (result5) {
            console.log("event added");
        });
    
        //create graph relationship
        db.newGraphBuilder()
        .create()
        .from('Players', '10001')
        .related('currentteammate')
        .to('Players', '10002')
        .then(function (result6) {
            console.log("graph item added");
        });
    
        //create graph relationship
        db.newGraphBuilder()
        .create()
        .from('Players', '10001')
        .related('currentteammate')
        .to('Players', '10003')
        .then(function (result7) {
            console.log("graph item added");
        });
    
        res.send('warmed up');
    });
    
    module.exports = router;
    

    After running the app and hitting the endpoint, I saw a new collection (“Players”) created in Orchestrate.

    2015.04.21orchestrate03

    It’s always nice to confirm that things worked, so I switched to the Orchestrate UI where I queried my key/value store for one of the keys I added above. Sure enough, the item for Los Angeles Clippers superstar Blake Griffin comes back. Success.

    2015.04.21orchestrate04

    Querying data items from code is super easy. Retrieve all the players in the collection? Couple lines of code.

    /* GET all player listing. */
    router.get('/', function (req, res) {
    
        db.list("Players")
        .then(function (result) {
            playerlist = result.body.results;
            //console.log(playerlist);
    
            res.render('players', { title: 'Players List', plist: playerlist });
        })
        .fail(function (err) {
            console.log(err);
        });
    });
    

    When the app runs, I get a list of players. Thanks to my Jade template, I’ve also got a hyperlink to the “details” page.

    2015.04.21orchestrate05

    The player details include their full record info, any relationships to other players, and a log of games played. As you can see in the code below, it’s extremely straightforward to pull each of these entity types.

    /* GET one player profile. */
    router.get('/:playerid', function (req, res) {
    
        console.log(req.params.playerid);
    
        //get player object
        db.get("Players", req.params.playerid)
        .then(function (result) {
            player = result.body;
            //console.log(player);
    
            //get graph of relationships
            db.newGraphReader()
            .get()
            .from('Players', req.params.playerid)
            .related('currentteammate')
            .then(function (relres) {
                playerlist = relres.body;
                //console.log(playerlist);
    
                //get time-series events
                db.newEventReader()
                .from('Players', req.params.playerid)
                .type('gamelog')
                .list()
                .then(function (evtres) {
                    gamelog = evtres.body;
                    //console.log(gamelog);
    
                    res.render('player', { title: 'Player Profile', profile: player, plist: playerlist, elist: gamelog });
                });
            });
        })
        .fail(function (err) {
            console.log(err);
        });
    });
    

    2015.04.21orchestrate06

    I didn’t bake any true “search” capability into my app, but it’s one of the most powerful parts of the database service. Devs search through collections with a Lucene-powered engine with a robust set of operators. Do fuzzy search, deep full-text search of nested objects, grouping, proximity search, geo queries, aggregations, complex sorting, and more. Which players ever played for the Sacramento Kings? It’s easy to search nested objects.
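    The same nested-object query can be issued from code. A sketch, assuming the `db` client created earlier with `orchestrate(token)` and the SDK’s `search` call; the query string uses the Lucene-style syntax the service exposes:

```javascript
// Build a Lucene phrase query against the nested "teams" array.
function teamQuery(team) {
  return 'teams:"' + team + '"';
}

// With a live client, the search would look like:
// db.search('Players', teamQuery('Sacramento Kings'))
//   .then(function (result) {
//       console.log(result.body.results.map(function (r) {
//           return r.value.name;
//       }));
//   });

console.log(teamQuery('Sacramento Kings')); // teams:"Sacramento Kings"
```

Against the warm-up data above, that query would match Matt Barnes, whose `teams` array includes the Kings.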

    2015.04.21orchestrate07

    Summary

    I don’t usually flog my own company’s technology on this blog, but this is fun, important stuff. Developers are faced with a growing set of technologies they need to know in order to build efficient mobile experiences, data-intensive systems, and scalable cloud apps. Something like Orchestrate flips the script by giving developers a quick on-ramp to a suite of database engines that solve multiple problems.

    Take it for a spin, and give me feedback for where we should go with it next!

  • Node.js and Visual Studio: From Zero to a Cloud Foundry Deployment in 2 Minutes

    Microsoft just released the 1.0 version of their Node.js Tools for Visual Studio. This gives Windows developers a pretty great environment for building clean, feature rich Node.js applications. It’s easy to take an application built in Visual Studio and deploy to Microsoft’s own cloud, so how about pushing apps elsewhere? In this post, I’ll show you how quick and easy it is to set up a Node.js application in Visual Studio, and push to a Cloud Foundry endpoint.

    As a prerequisite, take 30 seconds to download and install the Cloud Foundry CLI, and sign up for a friendly Cloud Foundry provider. And, install the new Node.js Tools for Visual Studio.

    Creating and Deploying an Express Application

    Let’s start with the lightning round, and then I’ll go back and show off some of the features of this toolkit.

    Step 1 – Open Visual Studio and create the Node.js Express project (25 seconds)

    The Node.js Tools for Visual Studio add a few things to the Visual Studio experience. One of those is the option to select Node project types. This makes it super easy to spin up an example Express application.

    2015.03.27node01

    The sample project selected above includes a package.json file with all the dependencies. Visual Studio goes out and reconciles them all by downloading the packages via npm.

    2015.03.27node02

    The project loads up quickly, and you can see a standard Express skeleton project. Note that Visual Studio doesn’t throw any useless cruft into the solution. It’s all just basic Node stuff. That’s nice.

    2015.03.27node03


    Step 2 – Open Command Prompt and target the AppFog v2 environment (30 seconds)

    This Express app is boring, but technically a complete, deployable project. Let’s do that. I didn’t see an obvious way to get my Cloud Foundry-aware command line within the Visual Studio shell itself, but it’s crazy easy to open one that’s pointed at our project directory. Right-clicking the Visual Studio project gives the option to open a command prompt at the root of the project.

    2015.03.27node04

    Before deploying, I made sure I had the credentials and target API for my Cloud Foundry endpoint. In this case, I’m using the not-yet-released AppFog v2 from CenturyLink. “Sneak peek alert!”

    2015.03.27node05

    After logging into my endpoint via the Cloud Foundry CLI, I was ready to push the app.


    Step 3 – Push application (81 seconds)

    With a simple “cf push” command, I was off and running. App deployment times vary based on project size and complexity. If I had deleted (or excluded) all the local packages, then Cloud Foundry would have simply downloaded them all server-side, thus accelerating the upload. I was lazy, and just sent all my files up to the platform fabric directly.
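    Those push settings can also be captured in a manifest.yml checked in next to the project, so every push is repeatable. A minimal sketch; the app name, memory limit, and start command here are examples and depend on your project:

```yaml
---
applications:
- name: my-express-app
  memory: 256M
  instances: 1
  command: node ./bin/www
```

Adding `node_modules` to a `.cfignore` file is how you get the faster server-side dependency download described above.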

    2015.03.27node11

    After a short time, my app is staged, deployed, and started.


    Step 4 – Verify!

    The best part of any demo: seeing the results!

    2015.03.27node06

    You don’t really need to know anything about Node (or Cloud Foundry!) to test this out. It’s never been easier to sample new technology.

    Exploring Additional Node Capabilities in Visual Studio

    Let’s retrace our steps a bit and see what else these Node tools put into Visual Studio. First, you’ll see a host of options when creating a new Visual Studio project. There are empty Node.js projects (that pretty much just include a package.json and app.js file), Express applications, and even Azure-flavored ones.

    2015.03.27node13

    The toolkit includes IntelliSense for Node.js developers, which is nice. In the case below, it knows what methods are available on common object types.

    2015.03.27node10

    For installing packages, there are two main options. First, you can use the Node.js Interactive Window to issue commands. One such command is .npm, which lets me install anything from the npm registry, such as the awesome socket.io package.

    2015.03.27node11

    One downside of using this mechanism is that the package.json file isn’t automatically updated. However, Visual Studio helpfully reminds you of that.

    2015.03.27node12
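    If you’d rather keep package.json in sync yourself from a regular command prompt, the plain npm CLI handles it with the --save flag:

    ```shell
    # Install socket.io and record it as a dependency in package.json.
    # (--save is required on the npm versions of this era; newer npm
    # versions record dependencies by default.)
    npm install socket.io --save
    ```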

    The “preferred” way to install packages seems to be the GUI, where you can easily browse and select your chosen package. There are options here to choose a version, pick a dependency type, and add to the package.json file. Pretty handy.

    2015.03.27node08

    The Toolkit also makes it easy to quickly spin up and debug an app. Pressing F5 starts up the Node server and opens a browser window that points at that server and the relevant port.

    2015.03.27node09

    Summary

    I’ve used a few different Node.js development environments on Windows, and Visual Studio is quickly becoming a top-tier choice. Whether you’re just starting out with Node or are a seasoned developer, Visual Studio seems to have a nice mix of helpful capabilities while not getting in the way too much.

  • Comparing Clouds: “Day 2” Management Operations

    So far in this blog series, I’ve taken a look at how to provision and scale servers using five leading cloud providers. Now, I want to dig into support for “Day 2 operations” like troubleshooting, reactive or proactive maintenance, billing, backup/restore, auditing, and more. In this blog post, we’ll look at how to manage (long-lived) running instances at each provider and see what capabilities exist to help teams manage at scale. For each provider, I’ll assess instance management, fleet management, and account management.

    There might be a few reasons you don’t care a lot about the native operational support capabilities in your cloud of choice. For instance:

    • You rely on configuration management solutions for steady-state. Fair enough. If your organization relies on great tools like Ansible, Chef or CFEngine, then you already have a consistent way to manage a fleet of servers and avoid configuration drift.
    • You use “immutable servers.” In this model, you never worry about patching or updating running machines. Whenever something has to change, you deploy a new instance of a gold image. This simplifies many aspects of cloud management.
    • You leverage “managed” servers in the cloud. If you work with a provider that manages your cloud servers for you, then on the surface, there is less need for access to robust management services.
    • You’re running a small fleet of servers. If you only have a dozen or so cloud servers, then management may not be the most important thing on your mind.
    • You leverage a multi-cloud management tool. As companies chase the “multi-cloud” dream, they leverage tools like RightScale, vRealize, and others to provide a single experience across a cloud portfolio.

    However, I contend that the built-in operational capabilities of a particular cloud are still relevant for a variety of reasons, including:

    • Deployments and upgrades. It’s wonderful if you use a continuous deployment tool to publish application changes, but cloud capabilities still come into play. How do you open up access to cloud servers and push code to them? Can you disable operational alarms while servers are in an upgrading state? Is it easy to snapshot a machine, perform an update, and roll back if necessary? There’s no one way to do application deployments, so your cloud environment’s feature set may still play an important role.
    • Urgent operational issues. Experiencing a distributed denial of service attack? Need to push an urgent patch to one hundred servers? Trying to resolve a performance issue with a single machine? Automation and visibility provided by the cloud vendor can help.
    • Handle steady and rapid scale. There’s a good chance that your cloud footprint is growing. More environments, more instances, more scenarios. How does your cloud make it straightforward to isolate cloud instances by function or geography? A proper configuration management tool goes a long way to making this possible, but cloud-native functionality will be important as well.
    • Audit trails. Users may interact with the cloud platform via a native UI, third party UI, or API. Unless you have a robust log aggregation solution that pulls data from each system that fronts the cloud, it’s useful to have the system of record (usually the cloud itself) capture information centrally.
    • UI as a window to the API. Many cloud consumers don’t ever see the user interface provided by the cloud vendor. Rather, they only use the available API to provision and manage cloud resources. We’ll look at each cloud provider’s API in a future post, but the user interface often reveals the feature set exposed by the API. Even if you are an API-only user, seeing how the Operations experience is put together in a user interface can help you see how the vendor approaches operational stories.

    Let’s get going in alphabetical order.

    DISCLAIMER: I’m the product owner for the CenturyLink Cloud. Obviously my perspective is colored by that. However, I’ve taught three well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    Amazon Web Services

    Instance Management

    Users can do a lot of things with each particular AWS instance. I can create copies (“Launch more like this”), convert to a template, issue power operations, set and apply tags, and much more.

    2014.12.19cloud01

    AWS has a super-rich monitoring system called CloudWatch that captures all sorts of metrics and is capable of sending alarms.

    2014.12.19cloud02
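    The same alarms can be defined outside the console via the AWS CLI. Here’s a hedged sketch that alerts on sustained high CPU for a single instance; the alarm name, instance ID, and SNS topic ARN are placeholders:

    ```shell
    # Fire an alarm when average CPU on one instance exceeds 80%
    # for two consecutive 5-minute periods, notifying an SNS topic.
    aws cloudwatch put-metric-alarm \
      --alarm-name "high-cpu-web01" \
      --namespace AWS/EC2 \
      --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=i-0abc123 \
      --statistic Average \
      --period 300 \
      --evaluation-periods 2 \
      --threshold 80 \
      --comparison-operator GreaterThanThreshold \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts
    ```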

     

    Fleet Management

    AWS shows all your servers in a flat, paginated list.

    2014.12.19cloud04

    You can filter the list based on tag/attribute/keyword associated with the server(s). Amazon also JUST announced Resource Grouping to make it easier to organize assets.

    2014.12.19cloud06
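    The same tag-based filtering is available from the AWS CLI, which is handy for scripting against a subset of the fleet. A sketch, with an illustrative tag key and value:

    ```shell
    # List instance IDs for running servers tagged Environment=prod.
    aws ec2 describe-instances \
      --filters "Name=tag:Environment,Values=prod" \
                "Name=instance-state-name,Values=running" \
      --query "Reservations[].Instances[].InstanceId" \
      --output text
    ```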

    When you’ve selected a set of servers in the list, you can do things like issue power operations in bulk.

    2014.12.19cloud03

    Monitoring also works this way. However, Auto Scaling does not work against ad hoc collections of servers like this.

    2014.12.19cloud05

    It’d be negligent of me to talk about management at scale in AWS without talking about Elastic Beanstalk and OpsWorks. Beanstalk puts an AWS-specific wrapper around an “application” that may be composed of multiple individual servers. A Beanstalk application may have a load balancer, and be part of an Auto Scaling group. It’s also a construct for doing rolling deployments. Once a Beanstalk app is up and running, the user can manage the fleet as a unit.

    Once you have a Beanstalk application, you can terminate and restart the entire environment.

    2014.12.19cloud07

    There are still individual servers shown in the EC2 console, but Beanstalk makes it simpler to manage related assets.

    OpsWorks is a relatively new offering used to define and deploy “stacks” comprised of application layers. Developers can associate Chef recipes with multiple stages of the lifecycle. You can also run recipes manually at any time.

    2014.12.19cloud08

    Account Management

    AWS doesn’t offer any “aggregate” views that roll up your consumption across all regions. The dashboards are service specific, and are shown on a region-by-region basis. AWS accounts are autonomous, and you don’t share anything between them. Within an account, users can do a lot of things. For instance, the Identity and Access Management service lets you define customized groups of users with very specific permission sets.

    2014.12.19cloud09

    AWS has also gotten better at showing detailed usage reports.

    2014.12.19cloud10

    The invoice details are still a bit generic and don’t easily tie back to a given server.

    2014.12.19cloud11

    There are a host of other AWS services that make account management easier. These include CloudTrail for API audit logs and SNS for push notifications.

    CenturyLink Cloud

    Instance Management

    For an individual virtual server in CenturyLink Cloud, the user has a lot of management options. It’s pretty easy to resize, clone, archive, and issue power commands.

    2014.12.19cloud12

    Doing a deployment but want to be able to revert any changes? The platform supports virtual machine snapshots for creating restore points.

    2014.12.19cloud14

    Each server details page shows a few monitoring metrics.

    2014.12.19cloud13

    Users can also bind usage alert and vertical autoscale policies to a server.

     

    Fleet Management

    CenturyLink Cloud has you organize servers into collections called “Groups.” These Groups – which behave similarly to a nested file structure – are management units.

    2014.12.19cloud15

    Users can issue bulk power operations against all or some of the servers in a Group. Additionally, you can set “scheduled tasks” on a Group. For instance, power off all the servers in a Group every Friday night, and turn them back on Monday morning.

    2014.12.19cloud16

    You can also choose pre-loaded or dynamic actions to perform against the servers in a Group. These packages could be software (e.g. new antivirus client) or scripts (e.g. shut off a firewall port) that run against any or all of the servers at once.

    2014.12.19cloud17

     

    The CenturyLink Cloud also provides an aggregated view across data centers. In this view, it’s fairly straightforward to see active alarms (notice the red on the offending server, group, and data center), and navigate the fleet of resources.

    2014.12.19cloud18

    Finally, the platform offers a “Global Search” where users can search for servers located in any data center.

    2014.12.19cloud48

     

    Account Management

    Within CenturyLink Cloud, there’s a concept of an account hierarchy. Accounts can be nested within one another. Networks and other settings can be inherited (or separated), and user permissions cascade down.

    2014.12.19cloud19

    Throughout the system, users can see the month-to-date and projected cost of their cloud consumption. The invoice data itself shows costs on a per server, and per Group basis. This is handy for chargeback situations where teams pay for specific servers or entire environments.

    2014.12.19cloud20

    CenturyLink Cloud offers role-based access controls for a variety of personas. These apply to a given account, and any sub-accounts beneath it.

    2014.12.19cloud21

    The CenturyLink Cloud has other account administration features like push-based notifications (“webhooks”) and a comprehensive audit trail.

    Digital Ocean

    Instance Management

    Digital Ocean specializes in simplicity targeted at developers, but their experience still serves up a nice feature set. From the server view, you can issue power operations, resize the machine, create snapshots, change the server name, and more.

    2014.12.19cloud22

    There are a host of editable settings that touch on networking, Linux Kernel, and recovery processes.

    2014.12.19cloud23

    Digital Ocean gives developers a handful of metrics that clearly show bandwidth consumption and resource utilization.

    2014.12.19cloud24

    There’s a handy audit trail below each server that clearly identifies what operations were performed and how long they took.

    2014.12.19cloud26

    Fleet Management

    Digital Ocean focuses on the developer audience and API users. Their UI console doesn’t really have a concept of managing a fleet of servers. There’s no option to select multiple servers, sort columns, or perform bulk activities.

    2014.12.19cloud25

    Account Management

    The account management experience is fairly lightweight at Digital Ocean. You can view account resources like snapshots and backups.

    2014.12.19cloud27

    It’s easy to create new SSH keys for accessing servers.

    2014.12.19cloud28

     

    The invoice experience is simple but clear. You can see current charges, and how much each individual server cost.

    2014.12.19cloud29

    The account history shows a simple audit trail.

    2014.12.19cloud30

     

    Google Compute Engine

    Instance Management

    The Google Compute Engine offers a nice set of per-server management options. You can connect to a server via SSH, reboot it, clone it, and delete it. There is also a set of monitoring statistics clearly shown at the top of each server’s details.

    2014.12.19cloud31

    Additionally, you can change settings for storage, network, and tags.

    2014.12.19cloud32

     

    Fleet Management

    The only thing you really do with a set of Google Compute Engine servers is delete them.

    2014.12.19cloud34

     

    Google Compute Engine offers instance groups for organizing virtual resources. They can all be based on the same template and work together in an autoscale fashion, or you can put different types of servers into an instance group.

    2014.12.19cloud33

    An instance group is really just a simple construct. You don’t manage the items as a group, and if you delete the group, the servers remain. It’s simply a way to organize assets.

    2014.12.19cloud35

    Account Management

    Google Compute Engine offers a few different types of management roles including owner, editor, and viewer.

    2014.12.19cloud36

    What’s nice is that you can also have separate billing managers.  Other billing capabilities include downloading usage history, and reviewing fairly detailed invoices.

    2014.12.19cloud37

    I don’t yet see an audit trail capability, so I assume that you have to track activities some other way.

    Microsoft Azure

    Instance Management

    Microsoft is in transition between its legacy production portal and its new blade-oriented portal. For the classic portal, Microsoft crams a lot of useful details into each server’s “details” page.

    2014.12.19cloud38

    The preview portal provides even more information, in a more … unique … format.

    2014.12.19cloud39

    In either environment, Azure makes it easy to add disks, change virtual machine size, and issue power ops.

    Microsoft gives users a useful set of monitoring metrics on each server.

    2014.12.19cloud40

    The new portal also has better cost transparency than the classic one.

    2014.12.19cloud41

    Fleet Management

    There are no bulk actions in the existing portal, besides filtering which Azure subscription to show, and sorting columns. Like AWS, Azure shows a flat list of servers in your account.

    2014.12.19cloud42

    The preview portal has the same experience, but without any column sorting.

    2014.12.19cloud43

    Account Management

    Microsoft Azure users have a wide array of account settings to work with. It’s easy to see current consumption and how close to the limits you are.

    2014.12.19cloud44

    The management service gives you an audit log.

    2014.12.19cloud45

    The new portal gives users the ability to set a handful of account roles for each server. I don’t see a way to apply these roles globally, but it’s a start!

    2014.12.19cloud46

    The pricing information is better in the preview portal, although the costs are still fairly coarse and not on a per-machine basis.

    2014.12.19cloud47

     

    Summary

    Each of these providers has a unique take on server management. Whether your virtual servers typically live for three hours or three years, the provider’s management capabilities will come into play. Think about what your development and operations staff need to be successful, and take an active role in planning how Day 2 operations in your cloud will work. Consider things like bulk management, audit trails, and security controls when crafting your strategy!

  • Comparing Clouds : IaaS Scalability Options

    In my first post of this series, I looked at the provisioning experience of five leading cloud Infrastructure-as-a-Service providers. No two were alike, as each offered a unique take.

    Elasticity is an oft-cited reason for using the cloud, so scalability is a key way to assess the suitability of a given cloud to your workloads. Like before, I’ll assess Google Compute Engine, Microsoft Azure, AWS, CenturyLink Cloud, and Digital Ocean. Each cloud will be evaluated based on the ability to scale vertically (i.e. add/remove instance capacity) and horizontally (i.e. add/remove instances) either manually or automatically.

    Let’s get going in alphabetical order.

    DISCLAIMER: I’m the product owner for the CenturyLink Cloud. Obviously my perspective is colored by that. However, I’ve taught three well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    Amazon Web Services

    How do you scale vertically?

    In reality, AWS treats individual virtual servers as immutable. There are some complex resizing rules, and local (instance store) storage cannot be resized. Resizing an AWS instance also results in all new public and private IP addresses. Honestly, you’re really building a new server when you choose to resize.

    If you want to add CPU/memory capacity to a virtual machine, you must stop it first. Also, you cannot resize an instance to a type that uses a different virtualization type, so you may want to carefully plan for this. Note that stopping an AWS VM means that anything on the ephemeral storage is destroyed.

    2014.11.19cloud01

    Once the VM is stopped, it’s easy to switch to a new instance type. Note that you have to be familiar with the instance types (e.g. size and cost) as you aren’t given any visual indicator of what you’re signing up for. Once you choose a new instance type, simply start up the instance.

    2014.11.19cloud02

    Want to add storage to an existing AWS instance? You don’t do that from the “instances” view in their Console, but instead, create an EBS volume separately and attach it later.

    2014.11.19cloud03

    Attaching is easy, but you do have to remember your instance name.

    2014.11.19cloud04

    By changing instance type, and adding EBS volumes, teams can vertically scale their resources.
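    Both moves can be scripted with the AWS CLI. A hedged sketch, where the instance ID, volume ID, instance type, device name, and zone are all placeholders:

    ```shell
    # 1. Change instance type (requires a stopped instance).
    aws ec2 stop-instances --instance-ids i-0abc123
    aws ec2 wait instance-stopped --instance-ids i-0abc123
    aws ec2 modify-instance-attribute --instance-id i-0abc123 \
      --instance-type "{\"Value\": \"m3.large\"}"
    aws ec2 start-instances --instance-ids i-0abc123

    # 2. Add storage by creating an EBS volume and attaching it.
    aws ec2 create-volume --size 100 --availability-zone us-east-1a
    aws ec2 attach-volume --volume-id vol-0def456 \
      --instance-id i-0abc123 --device /dev/sdf
    ```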

    How do you scale horizontally?

    AWS strongly encourages customers to build horizontally-scalable apps, and their rich Auto Scaling service supports that. Auto Scaling works by adding (or removing) virtual resources from a pool based on policies.

    2014.11.19cloud05

    When creating an Auto Scaling policy, you first choose the machine image profile (the instance type and template to add to the Auto Scale group), and then define the Auto Scale group. These details include which availability zone(s) to add servers to, how many servers to start with, and which load balancer pool to use.

    2014.11.19cloud06

    With those details in place, the user then sets up the scaling policy (if they wish) which controls when to scale out and when to scale in. One can use Auto Scale to keep the group at a fixed size (and turn up instances if one goes away), or keep the pool size fluid based on usage metrics or schedule.

    2014.11.19cloud07

    Amazon has a very nice horizontal scaling solution that works automatically, or manually. Users are free to set up infrastructure Auto Scale groups, or, use AWS-only services like Elastic Beanstalk to wrap up Auto Scale in an application-centric package.
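    For the API-minded, the same setup can be sketched with the AWS CLI. The AMI, names, sizes, and counts here are illustrative, and a real deployment would wire the policy to a CloudWatch alarm:

    ```shell
    # Launch configuration: what each new instance looks like.
    aws autoscaling create-launch-configuration \
      --launch-configuration-name web-lc \
      --image-id ami-0abc123 --instance-type m3.medium

    # The group: where instances go and how many to keep.
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name web-asg \
      --launch-configuration-name web-lc \
      --availability-zones us-east-1a us-east-1b \
      --min-size 2 --max-size 10 --desired-capacity 2

    # A scale-out policy: add one instance when triggered.
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name web-asg \
      --policy-name scale-out \
      --scaling-adjustment 1 --adjustment-type ChangeInCapacity
    ```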

    CenturyLink Cloud

    How do you scale vertically?

    CenturyLink Cloud offers a few ways to add new capacity to existing virtual servers.

    First off, users can resize running servers by adding/removing vCPUs and memory, and growing storage. When adding capacity, the new resources are typically added without requiring a power cycle on the server and there’s no data loss associated with a server resize. Also, note that when you look at dialing resources up and down, the projected impact on cost is reflected.

    2014.11.19cloud08

    Users add more storage to a given server by resizing any existing drives (including root) and by adding entirely new volumes.

    2014.11.19cloud10

    If the cloud workload has spiky CPU consumption, then the user can set up a vertical Autoscale policy that adds and removes CPU capacity. When creating these per-server policies, users choose a CPU min/max range, how long to collect metrics before scaling, and how long to wait before another scale event (“cool down period”). Because scaling down (removing vCPUs) requires a reboot, the user is asked for a time window when it’s ok to cycle the server.

    2014.11.19cloud09

     

    How do you scale horizontally?

    Like any cloud, CenturyLink Cloud makes it easy to manually add new servers to a fleet. Over the summer, CenturyLink added a Horizontal Autoscale service that powers servers on and off based on CPU and memory consumption thresholds. These policies – defined once and available in any region – call out minimum sizing, monitoring period threshold, cool down period, scale out increment, scale in increment, and CPU/RAM utilization thresholds.

    2014.11.19cloud11

    Unlike other public clouds, CenturyLink organizes servers by “groups.” Horizontal Autoscale policies are applied at the Group level, and are bound to a load balancer pool when applied. When a scale event occurs, the servers are powered on and off within seconds. Parked servers only incur cost for storage and OS licensing (if applicable), but there still is a cost to this model that doesn’t exist in the AWS-like model of instantiating and tearing down servers each time.

    2014.11.19cloud12

    CenturyLink Cloud provides a few ways to quickly scale vertically (manually or automatically without rebooting), and now, horizontally. While the autoscaling capability isn’t as feature-rich as what AWS offers, the platform recognizes the fact that workloads have different scale vectors and benefit from capacity being added up or out.

    Digital Ocean

    How do you scale vertically?

    Digital Ocean offers a pair of ways to scale a droplet (virtual instance).

    First, users can do a “Fast-Resize” which quickly increases or decreases CPU and memory. A droplet must be powered off to resize.

    2014.11.19cloud13

    After shutting the droplet down and choosing a new droplet size, the additional capacity is added in seconds.

    2014.11.19cloud15

    Once a droplet is sized up, it’s easy to power it off and size it down again.

    2014.11.19cloud16

    If you want to change your disk size as well, Digital Ocean offers a “Migrate-Resize” model where you first take a snapshot of your (powered off) droplet.

    2014.11.19cloud17

    Then, you create an entirely new droplet, but choose that snapshot as the “base.” This way, you end up with a new (larger) droplet with all the data from the original one.

    2014.11.19cloud18

     

    How do you scale horizontally?

    You do it manually. There are no automated techniques for adding more machines when a usage threshold is exceeded. They do tout their API as a way to detect scale conditions and quickly clone droplets to add more to a running fleet.
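    That API is a plain REST interface, so “clone and add” is a single HTTP call. A hedged sketch against the v2 API, where the token, droplet name, and snapshot image ID are placeholders:

    ```shell
    # Create an additional droplet from an existing snapshot via the
    # Digital Ocean v2 API (DO_TOKEN holds a personal access token).
    curl -X POST "https://api.digitalocean.com/v2/droplets" \
      -H "Authorization: Bearer $DO_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"name":"web-02","region":"nyc3","size":"1gb","image":12345678}'
    ```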

    Digital Ocean is known for its ease, performance, and simplicity. There isn’t the level of sophistication and automation you find elsewhere, but the scaling experience is very straightforward.

    Google Compute Engine

    How do you scale vertically?

    Google lets you add more storage to a running virtual machine. Persistent disks can be shared among many machines, although only one machine at a time can have read/write permission.

    2014.11.19cloud20

    Interestingly, Google Compute Engine doesn’t support an upgrade/downgrade to different instance types, so there’s no way to add/remove CPU or memory from a machine. They recommend creating a new virtual machine and attaching the persistent disks from the original one. So, “more storage” is the only vertical scaling capability currently offered here.
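    That recommended workaround can be sketched with gcloud. The names, zone, and machine type are placeholders, and the CLI syntax has evolved since this era:

    ```shell
    # Detach the persistent disk from the old VM...
    gcloud compute instances detach-disk old-vm \
      --disk data-disk --zone us-central1-a

    # ...and attach it to a new, larger instance at creation time.
    gcloud compute instances create new-vm \
      --machine-type n1-standard-4 --zone us-central1-a \
      --disk name=data-disk,mode=rw
    ```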

    How do you scale horizontally?

    Up until a week ago, Google didn’t have an auto scaling solution. That changed, and now the Compute Engine Autoscaler is in beta.

    First, you need to set up an instance template for use by the Autoscaler. This is the same data you provide when creating an actual running instance. In this case, it’s template-ized for future use.

    2014.11.19cloud21

    Then, create an instance group that lets you collectively manage a group of resources. Here’s the view of it, before I chose to set “Autoscaling” to “On.”

    2014.11.19cloud22

    Turning Autoscaling on results in new settings popping up. Specifically, the autoscale trigger (choices: CPU usage, HTTP load balancer usage, monitoring metric), the usage threshold, instance min/max, and cool-down period.

    2014.11.19cloud23

    You can use this with HTTP or network load balanced instance groups to load balance multiple app tiers independently.
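    Autoscaling can also be switched on from the command line. This sketch uses current gcloud syntax (the beta-era commands differed), with an illustrative group name and thresholds:

    ```shell
    # Enable CPU-based autoscaling on a managed instance group.
    gcloud compute instance-groups managed set-autoscaling web-group \
      --zone us-central1-a \
      --min-num-replicas 2 --max-num-replicas 10 \
      --target-cpu-utilization 0.75 \
      --cool-down-period 90
    ```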

    Google doesn’t offer much in the way of vertical resizing, but the horizontal auto scaling story is quickly catching up to the rest.

    Microsoft Azure

    How do you scale vertically?

    Microsoft provides a handful of vertical scaling options. For a virtual server instance, a user can change the instance type in order to get more/less CPU and memory. It appears from my testing that this typically requires a reboot of the server.

    2014.11.19cloud24

    Azure users can also add new, empty disks to a given server. It doesn’t appear as if you can resize existing disks.

    2014.11.19cloud25

    How do you scale horizontally?

    Microsoft, like all clouds, makes it easy to add more virtual instances manually. They also have a horizontal auto scale capability. First, you must put servers into an “availability set” together. This is accomplished by first putting them into the same “cloud service” in Azure. In the screenshot below, seroterscale is the name of my cloud service, and both instances are part of the same availability set.

    2014.11.19cloud26

    Somewhat annoyingly, all these machines have to be the exact same size (which is the requirement in some other clouds too, minus CenturyLink). So after I resized my second server, I was able to muck with the auto scale settings. Note that Azure auto scale also works by enabling/disabling existing virtual instances versus creating or destroying instances.

    2014.11.19cloud27

    Notice that you have two choices. First, you can scale based on a schedule.

    2014.11.19cloud28

    Either by schedule or by metric, you specify how many instances to turn on/off based on the upper/lower CPU threshold. It’s also possible to scale based on the queue depth of a Service Bus queue.

    2014.11.19cloud29

    Microsoft gives you a few good options for bumping up the resources on existing machines, while also enabling more servers in the fleet to offset planned or unplanned demand.

    Summary

    As with my assessment of cloud provisioning experiences, each cloud provider’s scaling story mirrors their view of the world. Amazon has a broad, sophisticated, and complex feature set, and their manual and Auto Scaling capabilities reflect that. CenturyLink Cloud focuses on greenfield and legacy workloads, and thus has a scaling story that’s focused on supporting both modern scale-out systems as well as traditional systems that prefer to scale up. Digital Ocean is all about fast acquisition of resources and an API centric management story, and their basic scaling options demonstrate that. Google focuses a lot on quickly getting lots of immutable resources, and their limited vertical scaling shows that. Their new horizontal scaling service complements their perspective. Finally, Microsoft’s experience for vertical scaling mirrors AWS, while their horizontal scaling is a bit complicated, but functional.

    Unless you’re only working with modern applications, it’s likely your scaling needs will differ by application. Hopefully this look across providers gave you a sense for the different capabilities out there, and what you might want to keep in mind when designing your systems!

  • Comparing Clouds: IaaS Provisioning Experience

    There is no perfect cloud platform. Shocking, I know. Organizations choose the cloud that best fits their values and needs. Many factors go into those choices, and it can depend on who is evaluating the options. A CIO may care most about the vendor’s total product portfolio, strategic direction, and ability to fit into the organization’s IT environment. A developer may look at which cloud offers the ability to compose and deploy the most scalable, feature-rich applications. An Ops engineer may care about which cloud gives them the best way to design and manage a robust, durable environment. In this series of blog posts, I’m going to look at five leading cloud platforms (Microsoft Azure, Google Compute Engine, AWS, Digital Ocean, and CenturyLink Cloud) and briefly assess the experience they offer to those building and managing their cloud portfolio. In this first post, I’ll examine the infrastructure provisioning experience of each provider.

    DISCLAIMER: I’m the product owner for the CenturyLink Cloud. Obviously my perspective is colored by that. However, I’ve taught three well-received courses on AWS, use Microsoft Azure often as part of my Microsoft MVP status, and spend my day studying the cloud market and playing with cloud technology. While I’m not unbiased, I’m also realistic and can recognize strengths and weaknesses of many vendors in the space.

    I’m going to assess each vendor across three major criteria: how do you provision resources, what key options are available, and what stands out in the experience.

    Microsoft Azure

    Microsoft added an IaaS service last year. Their portfolio of cloud services is impressive as they continue to add unique capabilities.

    How do you provision resources?

    Nearly all Azure resources are provisioned from the same Portal (except for a few new services that are only available in their next generation Preview Portal). Servers can be built via API as well. Users can select from a range of Windows and Linux templates (but no Red Hat Linux). Microsoft also offers some templates loaded with Microsoft software like SharePoint, Dynamics, and BizTalk Server.

    2014.10.19provision01

    When building a server, users can set the server’s name and select from a handful of pre-defined instance sizes.

    2014.10.19provision02

    Finally, the user sets the virtual machine configuration attributes and access ports.

    2014.10.19provision03

    What key options are available?

    Microsoft makes it fairly easy to reference to custom-built virtual machine image templates when building new servers.

    2014.10.19provision04

    Microsoft lets you set up or reference a “cloud service” in order to set up a load balanced pool.

    2014.10.19provision06

    Finally, there’s an option to spread the server across fault domains via “availability sets” and set up ports for public access.

    2014.10.19provision07

    What stands out?

    Microsoft offers a “Quick Create” option where users can spin up VMs by providing just a couple of basic values.

    2014.10.19provision08

    There are lots of VM instance sizes, but no sense of the cost while you’re walking through the provisioning process.

    2014.10.19provision09

    Developers can choose from any open source image hosted in the VM Depot. This gives users a fairly easy way to deploy a variety of open source platforms onto Azure.

    2014.10.19provision05

    Google Compute Engine

    Google also added an IaaS product to their portfolio last year. They don’t appear to be investing much in the UI experience, but their commitment to fast provisioning of robust servers is undeniable.

    How do you provision resources?

    Servers are provisioned from the same console used to deploy most any Google cloud service. Of course, you can also provision servers via the REST API.

    2014.10.19provision10

    By default, users see a basic server provisioning page.

    2014.10.19provision11

    The user chooses a location for their server, what instance size to use, the base OS image, which network to join, and whether to provide a public IP address.

    2014.10.19provision12
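    Those same choices — zone, instance size, base image, network, and public IP — map directly to the body of a Compute Engine `instances.insert` call. Here’s a sketch following the v1 REST API field names; the zone and image family are placeholders, and the actual call (commented out) needs credentials.

    ```python
    # Sketch: the instance body for a Compute Engine v1 instances.insert call.
    # Zone and image family are placeholder values.
    def build_instance_body(name, zone):
        return {
            "name": name,
            "machineType": f"zones/{zone}/machineTypes/n1-standard-1",  # instance size
            "disks": [{                                  # boot disk with the base OS image
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": "projects/debian-cloud/global/images/family/debian-12"
                },
            }],
            "networkInterfaces": [{
                "network": "global/networks/default",    # which network to join
                "accessConfigs": [                       # grants an ephemeral public IP
                    {"type": "ONE_TO_ONE_NAT", "name": "External NAT"}
                ],
            }],
        }

    # With google-api-python-client and credentials configured:
    # from googleapiclient import discovery
    # compute = discovery.build("compute", "v1")
    # compute.instances().insert(project="my-project", zone="us-central1-a",
    #     body=build_instance_body("demo-vm", "us-central1-a")).execute()
    ```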

    What key options are available?

    Google lets you pick your boot disk (standard or SSD type).

    2014.10.19provision13

    Users have the choice of a few “availability options.” This includes an automatic VM restart for non-user initiated actions (e.g. hardware failure), and the choice to migrate or terminate VMs when host maintenance occurs.

    2014.10.19provision14

    Google lets you choose which other Google services a cloud VM can access.

    2014.10.19provision15

    What stands out?

    Google does a nice job of letting you opt in to specific behavior. For instance, you choose whether to allow HTTP/HTTPS traffic, whether to use fixed or ephemeral public IPs, how host failures and maintenance should be handled, and which other services can be accessed. Google gives a lot of say to the user, and it’s very clear what each option does. While there are some things you may have to look up to understand (e.g. “what exactly is their concept of a ‘network’?”), the user experience is very straightforward: easy enough for a newbie and powerful enough for a pro.

    Another thing that stands out here is the relatively sparse set of built-in OS options. You get a decent variety of Linux flavors, but no Ubuntu. And no Windows.

    2014.10.19provision16

    Amazon Web Services

    Amazon EC2 is the original IaaS, and AWS has since added tons of additional application services to their catalog.

    How do you provision resources?

    AWS gives you both a web console and API to provision resources. Provisioning in the UI starts by asking the user to choose a base machine image. You can pick from a set of “quick start” images, browse a massive catalog, or use a custom-built one.

    2014.10.19provision17

    Once the user chooses the base template, they select from a giant list of instance types. Like the above providers, this instance type list contains a mix of different sizes and performance levels.

    2014.10.19provision18

    At this stage, you CAN “review and launch” and skip the more advanced configuration. But, we’ll keep going. The next step gives you options for how many instances to spin up and, optionally, where to place them in a virtual private space.

    2014.10.19provision19

    Next you can add storage volumes to the instance, set metadata tags on the instance, and finally configure which security group to apply. Security groups act like a firewall policy.

    2014.10.19provision20

    What key options are available?

    The broader question might be what is NOT available! Amazon gives users a broad set of image templates to pick from. That’s very nice for those who want to stand up pre-configured boxes with software ready to go. EC2 instance sizes represent a key decision point, as you have 30+ different choices. Each one serves a different purpose.

    AWS offers some instance configurations that are very important to the user. Identity and Access Management (IAM) roles are nice because they let the server run with a certain set of credentials. This way, the developer doesn’t have to embed credentials on the server itself when accessing other AWS services. The local storage in EC2 is ephemeral, so the “shutdown behavior” option is important: if you stop a box, you retain storage; if you terminate it, any local storage is destroyed.

    2014.10.19provision21
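    In boto3, AWS’s Python SDK, those choices show up as parameters to EC2’s `run_instances` call. This sketch only builds the parameter dictionary; the AMI ID, instance profile ARN, and security group ID are placeholders.

    ```python
    # Sketch: parameters for boto3's EC2 run_instances, covering the options above.
    # AMI ID, instance profile ARN, and security group ID are placeholders.
    def build_run_instances_params(ami_id, instance_profile_arn, sg_id):
        return {
            "ImageId": ami_id,                 # base machine image
            "InstanceType": "t3.micro",        # one of the 30+ instance types
            "MinCount": 1,
            "MaxCount": 3,                     # spin up several identical instances at once
            "SecurityGroupIds": [sg_id],       # firewall-like inbound policy
            # Run with a role's credentials instead of embedding keys on the box:
            "IamInstanceProfile": {"Arn": instance_profile_arn},
            # "stop" preserves attached volumes on shutdown; "terminate" destroys them:
            "InstanceInitiatedShutdownBehavior": "stop",
        }

    # With boto3 installed and credentials configured:
    # import boto3
    # ec2 = boto3.client("ec2")
    # ec2.run_instances(**build_run_instances_params(
    #     "ami-12345678",
    #     "arn:aws:iam::123456789012:instance-profile/app-role",
    #     "sg-0abc"))
    ```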

    Security groups (shown above) are ridiculously important as they control inbound traffic. A casual policy gives you a large attack surface.
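    A tight policy is mostly about the ingress rules you attach. Here’s a sketch of a deliberately narrow rule set in the shape boto3’s `authorize_security_group_ingress` expects; the office CIDR and group ID are placeholders.

    ```python
    # Sketch: a deliberately tight ingress rule set. Opening only HTTPS to the
    # world and SSH to a known range keeps the attack surface small.
    def build_ingress_rules(office_cidr):
        return [
            {   # HTTPS open to the internet
                "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
            },
            {   # SSH restricted to a known address range, never 0.0.0.0/0
                "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                "IpRanges": [{"CidrIp": office_cidr}],
            },
        ]

    # With boto3:
    # ec2.authorize_security_group_ingress(
    #     GroupId="sg-0abc",
    #     IpPermissions=build_ingress_rules("203.0.113.0/24"))
    ```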

    What stands out?

    It’s hard to ignore the complexity of the EC2 provisioning process. It’s very powerful, but there are a LOT of decisions to make and opportunities to go sideways. Users need to be smart and consider their choices carefully (although admittedly, many instance-level settings can be changed after the fact if a mistake is made).

    The AWS community catalog has 34,000+ machine images, and the official marketplace has nearly 2000 machine images. Pretty epic.

    2014.10.19provision23

    Amazon makes it easy to spin up many instances of the same type. Very handy when building large clusters of identical machines.

    2014.10.19provision22

    Digital Ocean

    Digital Ocean is a fast-growing, successful provider of virtual infrastructure.

    How do you provision resources?

    Droplets (the Digital Ocean equivalent of a virtual machine) are provisioned via web console and API. For the web console, it’s a very straightforward process that’s completed in a single page. There are 9 possible options (of which 3 require approval to use) for Droplet sizing.

    2014.10.19provision24

    The user then chooses where to run the Droplet, and which image to use. That’s about it!
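    The API is just as lean: a Droplet is a single `POST /v2/droplets` with a name, region, size, and image. This sketch builds the payload only; the API token is a placeholder, and the size/image slugs shown are examples of the current naming.

    ```python
    # Sketch: the payload for Digital Ocean's documented POST /v2/droplets call.
    # Region, size, and image slugs here are example values.
    def build_droplet_payload(name):
        return {
            "name": name,
            "region": "nyc3",             # where to run the Droplet
            "size": "s-1vcpu-1gb",        # one of the fixed Droplet sizes
            "image": "ubuntu-22-04-x64",  # base Linux distribution image
        }

    # With the requests library and an API token:
    # import requests
    # resp = requests.post(
    #     "https://api.digitalocean.com/v2/droplets",
    #     headers={"Authorization": "Bearer <API_TOKEN>"},
    #     json=build_droplet_payload("demo-droplet"),
    # )
    ```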

    What key options are available?

    Hidden beneath this simple façade are some useful options. First, Digital Ocean makes it easy to choose a location and see what extended options are available in each. The descriptions for each “available setting” are a bit light, so it’s up to the user to figure out the implications of each.

    2014.10.19provision25

    Digital Ocean only supports Linux, but they offer a good list of distributions, and even some ready-to-go application environments.

    2014.10.19provision26

    What stands out?

    Digital Ocean thrives on simplicity and clear pricing. Developers can fly through this process when creating servers, and the cost of each Droplet is obvious.

    2014.10.19provision27

    CenturyLink Cloud

    CenturyLink – a global telecommunications company with 50+ data centers and $20 billion in annual revenue – has used acquisitions to build out its cloud portfolio, starting with Savvis in 2011 and continuing with AppFog and Tier 3 in 2013.

    How do you provision resources?

    Like everyone else, CenturyLink Cloud provides both a web and API channel for creating virtual servers. The process starts in the web console by selecting a data center to deploy to, and which collection of servers (called a “group”) to add this to.

    2014.10.19provision28

    Next, the user chooses whether to make the server “managed” or not. A managed server is secured, administered, and monitored by CenturyLink engineers, while still giving the user full access to the virtual server. There are just two server “types” in the CenturyLink Cloud: standard servers with SAN-backed storage, or Hyperscale servers with local SSD storage. If the user chooses a Hyperscale server, they can then select an anti-affinity policy. The user then selects an operating system (or customized template), and will see the projected price show up on the left-hand side.

    2014.10.19provision29

    The user then chooses the size of the server and which network to put it on.

    What key options are available?

    Unlike the other clouds highlighted here, the CenturyLink Cloud doesn’t have the concept of “instance sizes.” Instead, users choose the exact amount of CPU, memory, and storage to add to a server. For CPU, users can also choose vertical Autoscale policies that scale a server up and down based on CPU consumption.

    2014.10.19provision30
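    That exact-sizing model carries through to the API as well. The sketch below assumes the shape of the CenturyLink Cloud v2 Create Server call (`POST /v2/servers/{accountAlias}`); the group ID, template name, and endpoint details are assumptions for illustration, and only the request body is built here.

    ```python
    # Sketch (assumed API shape): body for a CenturyLink Cloud v2 Create Server
    # request, illustrating exact sizing. Group ID and template are placeholders.
    def build_clc_server_request(name, group_id):
        return {
            "name": name,
            "groupId": group_id,                        # which server group to join
            "sourceServerId": "CENTOS-7-64-TEMPLATE",   # OS template (placeholder name)
            "cpu": 3,           # exact CPU count -- no fixed instance sizes
            "memoryGB": 5,      # exact memory, not a predefined tier
            "type": "standard", # or "hyperscale" for local SSD storage
        }

    # The call itself would be something like (endpoint assumed):
    # import requests
    # requests.post("https://api.ctl.io/v2/servers/<accountAlias>",
    #               headers={"Authorization": "Bearer <token>"},
    #               json=build_clc_server_request("web01", "<groupId>"))
    ```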

    Like a few other clouds, CenturyLink offers a tagging ability. These “custom fields” can store data that describes the server.

    2014.10.19provision31

    It’s easy to forget to delete a temporary server, so the platform offers the ability to set a time-to-live. The server gets deleted on the date selected.

    2014.10.19provision32

    What stands out?

    In this assessment, only Digital Ocean and CenturyLink actually have price transparency. It’s nice to actually know what you’re spending.

    2014.10.19provision33

    CenturyLink’s flexible sizing is convenient for those who don’t want to fit their app or workload into a fixed instance size. Similar to Digital Ocean, CenturyLink doesn’t offer 19 different types of servers to choose from. Every server has the same performance profile.

    Summary

    Each cloud offers their own unique way of creating virtual assets. There’s great power in offering rich, sophisticated provisioning controls, but there’s also benefit to delivering a slimmed down, focused provisioning experience. There are many commonalities between these services, but each one has a unique value proposition. In my subsequent posts in this series, I’ll look at the post-provisioning management experience, APIs, and more.

  • 8 Characteristics of our DevOps Organization

    What is the human impact of DevOps? I recently got this question from a viewer of my recent DevOps: The Big Picture course on Pluralsight.

    I prepared this course based on a lot of research and my own personal experience. I’ve been part of a DevOps culture for about two years with CenturyLink Cloud. Now, you might say “it’s nice that DevOps works in your crazy startup world, but I work for a big company where this radical thinking gets ignored.” While Tier 3 – my employer that was acquired by CenturyLink last Fall – was a small, rebel band of cloud lunatics, I now work at a ~$20 billion company with 40,000+ people. If DevOps can work here, it can work anywhere.

    Our cloud division does DevOps and we’re working with other teams to reproduce our model. How do we do it?

    1. Simple reporting structure. Pretty much everyone is one step away from our executive leadership. We avoid complicated fiefdoms that introduce friction and foster siloed thinking. How are we arranged? Something like this:
      2014.08.28devops1
      Business functions like marketing and finance are part of this structure as well. Obviously as teams continue to grow, they get carved up into disciplines, but the hierarchy remains as simple as possible.
    2. Few managers, all leaders. This builds on the above point. We don’t really have any pure “managers” in the cloud organization. Sure, there are people with direct reports. But that person’s job goes well beyond people management. Rather, everyone on EVERY team is empowered to act in the best interest of our product/service. Teams have leaders who keep the team focused while being a well-informed representative to the broader organization. “Managers” are encouraged to build organizations to control, while “leaders” are encouraged to solve problems and pursue efficiency.
    3. Development and Operations orgs are partners. This is probably the most important characteristic I see in our division. The leaders of Engineering (that contains development) and Service Engineering (that contains operations) are close collaborators who set an example for teamwork. There’s no “us versus them” tolerated, and issues that come up between the teams – and of course they do – are resolved quickly and decisively. Each VP knows the top priorities and pain points of the other. There’s legitimate empathy between the leaders and organizations.
    4. Teams are co-located. Our Cloud Development Center in Bellevue is the cloud headquarters. A majority of our Engineering resources not only work there, but physically sit together in big rooms with long tables. One of our developers can easily hit a support engineer with a Nerf bullet. Co-location makes our daily standups easier, problem resolution simpler, and builds camaraderie among the various teams that build and support our global cloud. Now, there are folks distributed around the globe that are part of this Engineering team. I’m remote (most of the time) and many of our 24×7 support engineers reside in different time zones. How do we make sure distributed team members still feel involved? Tools like Slack make a HUGE difference, as do regular standups and meetups.
    5. Everyone looks for automation opportunities. No one in this division likes doing things manually. We wear custom t-shirts that say “Run by Robots” for crying out loud! It’s in our DNA to automate everything. You cannot scale if you do not automate. Our support engineers use our API to create tools for themselves, developers have done an excellent job maturing our continuous integration and continuous delivery capability, and even product management builds things to streamline data analysis.
    6. All teams responsible for the service. Our Operations staff is not solely responsible for keeping our service online. Wait, what? Our whole cloud organization is responsible for keeping our service healthy and meeting business need. There’s very little “that’s not MY problem” in this division. Sure, our expert support folks are the ones doing 24×7 monitoring and optimization, but developers wear pagers and get the same notifications if there’s a blip or outage. Anyone experiencing an issue with the platform – whether it’s me doing a demo, or a finance person pulling reports – is expected to notify our NOC. We’re all measured on the success of our service. Our VP of Engineering doesn’t get a bonus for shipping code that doesn’t work in production, and our VP of Service Engineering doesn’t get kudos if he maintains 100% uptime by disallowing new features. Everyone buys into the mission of building a differentiating, feature-rich product with exceptional uptime and support. And everyone is measured by those criteria.
    7. Knowledge resides in the team and lightweight documentation. I came from a company where I wrote beautiful design documentation that is probably never going to be looked at again. By having long-lived teams built around a product/service, the “knowledge base” is the team! People know how things work and how to handle problems because they’ve been working together with the same service for a long time. At the same time, we also maintain a documented public (and internal) Knowledge Base where processes, best practices, and exceptions are noted. Each internal KB article is simple and to the point. No fluff. What do I need to know? Anyone on the team can contribute to the Knowledge Base, and it’s teeming with super useful stuff that is actively used and kept up to date. How refreshing!
    8. We’re not perfect, or finished! There’s so much more we can do. Continuous improvement is never done. There are things we still have to get automated, further barriers to break down between team handoffs, and more. As our team grows, other problems will inevitably surface. What matters is our culture and how we approach these problems. Is it an excuse to build up a silo or blame others? Or is it an opportunity to revisit existing procedures and make them better?

    DevOps can mean a lot of things to a lot of people, but if you don’t have the organizational culture set up, it’s only a superficial implementation. It’s jarring to apply this to an existing organization, and I’m starting to witness that right now as we infect the rest of CenturyLink with our DevOps mindset. As that movement advances, I’ll let you know what we’ve learned along the way.

    How about you? How is your organization set up to “do DevOps”?