One of the defining characteristics of the public cloud is choice. Need to host an app? On-premises, your choices were a virtual machine or a virtual machine. Now? A public cloud like Microsoft Azure offers nearly a dozen options. You’ve got a spectrum of choices ranging from complete control over the stack (e.g. Azure Virtual Machines) to opinionated runtimes (e.g. Azure Functions). While some of these options, like Azure App Service, cater to custom software, none of them caters to a specific language or framework. Until today.
At Pivotal’s SpringOne Platform conference, Microsoft took the wraps off Azure Spring Cloud. This first-party managed service sits atop Azure Kubernetes Service and helps Java developers run cloud-native apps. How? It runs managed instances of Spring Cloud components like a Config Server and Service Registry. It builds secure container images using Cloud Native Buildpacks and kpack (both Pivotal-sponsored OSS projects). It offers secure binding to other Azure services like Cosmos DB. It has integrated load balancing, log streaming, monitoring, and distributed tracing. And it delivers rolling upgrades and blue-green deployments so that it’s easy to continuously deliver changes. At the moment, Azure Spring Cloud is available as a private preview, so I thought I’d give you a quick tour.
First off, since it’s a first-party service, I can provision instances via the az command line tool, or the Azure Portal. From the Azure Portal, it’s quite simple. You just provide an Azure subscription, target region, and resource name.
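For the CLI route, creating the service instance is a one-liner along these lines (the instance name is hypothetical, and the exact flags of the preview az spring-cloud extension may differ):

az spring-cloud create -n seroter-spring -g demos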
Once my instance was created, I accessed the unique dashboard for Azure Spring Cloud. I saw some of the standard things that are part of every service (e.g. activity log, access control), as well as the service-specific things like Apps, Config Server, Deployments, and more.
A Spring Cloud Config Server pulls and aggregates name-value pairs from various sources and serves them up to your application via a single interface. In a typical architecture, you have to host that somewhere. For Azure Spring Cloud, it’s a managed service. So, all I need to do is point this instance to a repository holding my configuration files, and I’m set.
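To illustrate, that repository just holds ordinary Spring Cloud Config files. A hypothetical application.properties at the root of the repo could supply the value my sample app reads later (file name and value are my own example, not from the actual demo):

# application.properties in the Git-backed config repository (illustrative)
company=Definitely Real Company, Inc.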
There’s no “special” way to build Spring Boot apps that run here. They’re just … apps. So I went to start.spring.io to generate my project scaffolding. Here, I chose dependencies for web, eureka (service registry), config client, actuator (health monitoring), and zipkin + sleuth (distributed tracing). Click here to generate an identical project.
My sample code is basic here. I just expose a REST endpoint, and pull a property value from the attached configuration store.
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@SpringBootApplication
public class AzureBasicAppApplication {

    public static void main(String[] args) {
        SpringApplication.run(AzureBasicAppApplication.class, args);
    }

    // pulled from the attached Spring Cloud Config Server, with a fallback default
    @Value("${company:Not configured by a Spring Cloud Server}")
    private String company;

    @GetMapping("/hello")
    public String Hello() {
        return "hello, from " + company;
    }
}
To deploy, first I create an application from the CLI or Portal. “Creating” the application doesn’t deploy it, as that’s a separate step.
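From the CLI, that creation step looks something like this (I had defaults set for the service instance and resource group; flag names are from my memory of the preview extension):

az spring-cloud app create -n azure-app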
With that created, I packaged my Spring Boot app into a JAR file, and deployed it via the az CLI.
az spring-cloud app deploy -n azure-app --jar-path azure-basic-app-0.0.1-SNAPSHOT.jar
What happened next? Azure Spring Cloud created a container image, stored it in an Azure Container Registry, and deployed it to AKS. And you don’t need to care about any of that, as you can’t access the registry or AKS! It’s plumbing that forms a platform. After a few moments, I saw my running instance, and the service registry showed that my instance was UP.
We’re dealing with containers here, so scaling is fast and easy. The “scale” section lets me scale up with more RAM and CPU, or out with more instances.
Cloud native, 12-factor apps should treat backing services like attached resources. Azure Spring Cloud embodies this by letting me set up service bindings. Here, I set up a linkage to another Azure service, and at runtime, its credentials and connection string are injected into the app’s configuration. All of this is handled auto-magically by the Spring Boot starters from Azure.
Logging data goes into Azure Monitor. You can set up Log Analytics for analysis, or pump out to a third party Application Performance Monitoring tool.
So you have logging, you have built-in monitoring, and you ALSO get distributed tracing. For microservices, this helps you inspect the call graph and see where your performance bottlenecks are. The pic below is from an example app built by Julien at Microsoft.
Finally, I can do blue/green deploy. This means that I deploy a second instance of the app (via the az CLI, it’s another “deployment”), can independently test it, and then choose to swap traffic over to that instance when I’m ready. If something goes wrong, I can switch back.
So far, it’s pretty impressive. This is one of the first industry examples of turning Kubernetes into an actual application platform. There’s more functionality planned as the service moves to public preview, beta, and general availability. I’m happy to see Microsoft make such a big bet on Spring, and even happier that developers have a premium option for running Java apps in the public cloud.
First, I went to the Confluent Cloud site and clicked the “try free” button. I was prompted to create an account.
I was asked for credit card info to account for overages above the free tier (and after the free credits expire in 3 months), which I provided. With that, I was in.
First, I was prompted to create a cluster. I do what I’m told.
Here, I was asked to provide a cluster name, and choose a public cloud provider. For each cloud, I was shown a set of available regions. Helpfully, the right side also showed me the prices, limits, billing cycle, and SLA. Transparency, FTW!
While that was spinning up, I followed the instructions to install the Confluent Cloud CLI so that I could also geek-out at the command line. I like that the example CLI commands in the docs actually reflect the values of my environment (e.g. cluster name). Nice touch.
Within maybe a minute, my Kafka cluster was running. That’s pretty awesome. I chose to create a new topic with 6 partitions. I’m able to choose up to 60 partitions for a topic, and define other settings like data retention period, max size on disk, and cleanup policy.
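If you prefer the CLI for that step, creating a topic is a single command (the topic name here is just an example of mine):

ccloud kafka topic create readings --partitions 6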
Before building an app to publish data to Confluent Cloud, I needed an API key and secret. I could create this via the CLI, or the dashboard. I generated the key via the dashboard, saved it (since I can’t see it again after generating), and saw the example Java client configuration updated with those values. Handy, especially because I’m going to talk to Kafka via Spring Cloud Stream!
Now I needed an app that would send messages to Apache Kafka in the Confluent Cloud. I chose Spring Boot because I make good decisions. Thanks to the Spring Cloud Stream project, it’s super-easy to interact with Apache Kafka without having to be an expert in the tech itself. I went to start.spring.io to generate a project. If you click this link, you can download an identical project configuration.
I opened up this project and added the minimum code and configuration necessary to gab with Apache Kafka in Confluent Cloud. I wanted to be able to submit an HTTP request and see that message published out. That required one annotation to create a REST controller, and one annotation to indicate that this app is a source to the stream. I then have a “Source” variable that is autowired, which means it’s inflated by Spring Boot at runtime. Finally, I have a single operation that responds to an HTTP post command and writes the payload to the message stream. That’s it!
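I haven’t pasted my exact class here, but a minimal sketch of what I just described looks roughly like this (class name, route, and payload handling are illustrative):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController                 // annotation #1: expose a REST endpoint
@EnableBinding(Source.class)    // annotation #2: this app is a source to the stream
@SpringBootApplication
public class StreamSourceApplication {

    // inflated by Spring Boot at runtime
    @Autowired
    private Source source;

    public static void main(String[] args) {
        SpringApplication.run(StreamSourceApplication.class, args);
    }

    // respond to an HTTP POST and write the payload to the message stream
    @PostMapping("/messages")
    public String publish(@RequestBody String payload) {
        source.output().send(MessageBuilder.withPayload(payload).build());
        return "sent";
    }
}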
The final piece? The application configuration. In the application.properties file, I set the handful of parameters, mostly around the target cluster, topic name, and credentials.
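My file looked something along these lines (the topic name, bootstrap server, and exact security property names are my assumptions; the API key and secret come from the step above):

spring.cloud.stream.bindings.output.destination=readings
spring.cloud.stream.kafka.binder.brokers=pkc-xxxxx.us-west-2.aws.confluent.cloud:9092
spring.cloud.stream.kafka.binder.configuration.security.protocol=SASL_SSL
spring.cloud.stream.kafka.binder.configuration.sasl.mechanism=PLAIN
spring.cloud.stream.kafka.binder.configuration.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";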
I started up the app, confirmed that it connected (via application logs), and opened Postman to issue an HTTP POST command.
After switching back to the Confluent Cloud dashboard, I saw my messages pop up.
You can probably repeat this whole demo in about 10 minutes. As you can imagine, there’s a LOT more you can do with Apache Kafka than what I showed you. If you want an environment to learn Apache Kafka in depth, it’s now a no-brainer to spin up a free account in Confluent Cloud. And if you want to use a legit managed Apache Kafka for production in any cloud, this seems like a good bet as well.
Design software that solves someone’s “job to be done”, build it, package it, ship it, collect feedback, learn, and repeat. That’s the dream, right? For many, shipping software is not fun. It’s downright awful. Too many tickets, too many handoffs, and too many hours waiting. Continuous integration and delivery offer some relief, as you keep producing tested, production-ready artifacts. Survey data shows that we’re not all adopting this paradigm as fast as we should. I figured I’d do my part by preparing and delivering a new video training course about Concourse.
I’ve been playing a lot with Concourse recently, and published a 3-part blog series on using it to push .NET Core apps to Kubernetes. It’s an easy-to-use CI system with declarative pipelines and stateless servers. Concourse runs jobs on Windows or Linux, and works with any programming language you use.
My new hands-on Pluralsight course is ~90 minutes long, and gives you everything you need to get comfortable with the platform. It’s made up of three modules. The first module looks at key concepts, the Concourse architecture, and user roles, and we set up our local environment for development.
The second module digs deep into the primitives of Concourse: tasks, jobs and resources. I explain how to configure each, and then we go hands on with each. There are aspects that took me a while to understand, so I worked hard to explain these well!
The third and final module looks at pipeline lifecycle management and building manageable pipelines. We explore troubleshooting and more.
Believe it or not, this is my 20th course with Pluralsight. Over these past 8 years, I’ve switched job roles many times, but I’ve always enjoyed learning new things and sharing that information with others. Pluralsight makes that possible for me. I hope you enjoy this new course, and most importantly, start doing CI/CD for more of your workloads!
So far in this blog series, we’ve set up our local machine and cloud environment, and built the initial portion of a continuous delivery pipeline. That pipeline, built using the popular OSS tool Concourse, pulls source code from GitHub, generates a Docker image that’s stored in Azure Container Registry, and produces a tarball that’s stashed in Azure Blob Storage. What’s left? Deploying our container image to Azure Kubernetes Service (AKS). Let’s go.
Generating AKS credentials
Back in blog post one, we set up a basic AKS cluster. For Concourse to talk to AKS, we need credentials!
From within the Azure Portal, I started up an instance of the Cloud Shell. This is a hosted Bash environment with lots of pre-loaded tools. From here, I used the AKS CLI to get the administrator credentials for my cluster.
az aks get-credentials --name seroter-k8s-cluster --resource-group demos --admin
This command generated a configuration file with URLs, users, certificates, and tokens.
I copied this file locally for use later in my pipeline.
Creating a role-binding for permission to deploy
The administrative user doesn’t automatically have rights to do much in the default cluster namespace. Without explicitly allowing permissions, you’ll get some gnarly “does not have access” errors when doing most anything. Enter role-based access controls. I created a new rolebinding named “admin” with admin rights in the cluster, and mapped to the existing clusterAdmin user.
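The command was along these lines (I’m showing a cluster-wide binding; the user name should match the clusterAdmin user in your kubeconfig):

kubectl create clusterrolebinding admin --clusterrole=cluster-admin --user=clusterAdmin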
Now I knew that Concourse could effectively interact with my Kubernetes cluster.
Giving AKS access to Azure Container Registry
Right now, Azure Container Registry (ACR) doesn’t support an anonymous access strategy. Everything happens via authenticated users. The Kubernetes cluster needs access to its container registry, so I followed these instructions to connect ACR to AKS. Pretty easy!
Creating Kubernetes deployment and service definitions
Concourse is going to apply a Kubernetes deployment to create pods of containers in the cluster. Then, Concourse will apply a Kubernetes service to expose my pod with a routable endpoint.
I created a pair of configurations and added them to the ci folder of my source code.
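For reference, here’s a representative sketch of those two files (the registry path, labels, and port are my assumptions; the deployment name matches the one I query later):

# ci/deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: myregistry.azurecr.io/seroter-api-k8s:latest
        ports:
        - containerPort: 80

# ci/service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  type: LoadBalancer
  selector:
    app: demo-app
  ports:
  - port: 80
    targetPort: 80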
I now had all the artifacts necessary to finish up the Concourse pipeline.
Adding Kubernetes resource definitions to the Concourse pipeline
First, I added a new resource type to the Concourse pipeline. Because Kubernetes isn’t a baked-in resource type, we need to pull in a community definition. No problem. This one’s pretty popular. It’s important that the Kubernetes client and server are on the same Kubernetes version, so I set the tag to match my AKS version.
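The relevant pipeline snippet looked roughly like this (I’m assuming the popular zlabjp/kubernetes-resource image; the server name and placeholder values are illustrative):

resource_types:
- name: kubernetes
  type: docker-image
  source:
    repository: zlabjp/kubernetes-resource
    tag: "1.13"

resources:
- name: azure-kubernetes-service
  type: kubernetes
  source:
    server: https://seroter-k8s-cluster-dns-xxxx.hcp.eastus.azmk8s.io:443
    namespace: default
    token: ((aks-admin-token))
    certificate_authority: |
      -----BEGIN CERTIFICATE-----
      [...]
      -----END CERTIFICATE-----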
There are a few key things to note here. First, the “server” refers to the cluster DNS server name in the credentials file. The “token” refers to the token associated with the clusterAdmin user. For me, it’s the last “user” called out in the credentials file. Finally, let’s talk about the certificate authority. This value comes from the “certificate-authority-data” entry associated with the cluster DNS server. HOWEVER, this value is base64 encoded, and I needed a decoded value. So, I decoded it, and embedded it as you see above.
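The deployment job itself looked something like the following (resource and job names are illustrative, and the patch payload shows the idea rather than my exact command):

jobs:
- name: deploy to AKS
  plan:
  - get: azure-container-registry
    trigger: true
    passed: [containerize app]
  - get: source-code
  - get: version
  - put: azure-kubernetes-service
    params:
      kubectl: apply -f source-code/ci/deployment.yaml -f source-code/ci/service.yaml
  - put: azure-kubernetes-service
    params:
      # set the image tag to the semantic version pulled in by the version resource
      kubectl: |
        patch deployment demo-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"demo-app","image":"myregistry.azurecr.io/seroter-api-k8s:0.1.1"}]}}}}'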
Let’s unpack this. First, I “get” the Azure Container Registry resource. When it changes (because it gets a new version of the container), it triggers this job. It only fires if the “containerize app” job passes first. Then I get the source code (so that I can grab the deployment.yaml and service.yaml files I put in the ci folder), and I get the semantic version.
Next I “put” to the AKS resource, twice. In essence, this resource executes kubectl commands. The first command does a kubectl apply for both the deployment and service. On the first run, it provisions the pod and exposes it via a service. However, because the container image tag in the deployment file is set to “latest”, Kubernetes actually won’t retrieve new images with that tag after I apply a deployment. So, I “patched” the deployment in a second “put” step and set the deployment’s image tag to the semantic version. This triggers a pod refresh!
Deploy and run the Concourse pipeline
I deployed the pipeline as a new revision with this command:
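It’s the same set-pipeline pattern from the earlier posts (the file and pipeline names here are illustrative):

fly -t rs set-pipeline -c pipeline.yml -p azure-k8s-final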
I unpaused the pipeline and watched it start up. It quickly reached and completed the “deploy to AKS” stage.
But did it actually work? I jumped back into the Azure Cloud Shell to check it out. First, I ran a kubectl get pods command. Then, a kubectl get services command. The first showed our running pod, and the second showed the external IP assigned to my pod.
I also issued a request to that URL in the browser, and got back my ASP.NET Core API results.
Also to prove that my “patch” command worked, I ran the kubectl get deployment demo-app --output=yaml command to see which container image my deployment referenced. As you can see below, it no longer references “latest” but rather, a semantic version number.
With all of these settings, I now have a pipeline that “just works” whenever I update my ASP.NET Core source code. It tests the code, packages it up, and deploys it to AKS in seconds. I’ve added all the pipelines we created here to GitHub so that you can easily try this all out.
Whatever CI/CD tool you use, invest in automating your path to production.
Let’s continuously deliver an ASP.NET Core app to Kubernetes using Concourse. In part one of this blog series, I showed you how to set up your environment to follow along with me. It’s easy; just set up Azure Container Registry, Azure Storage, Azure Kubernetes Service, and Concourse. In this post, we’ll start our pipeline by pulling source code, running unit tests, generating a container image that’s stored in Azure Container Registry, and generating a tarball for Azure Blob Storage.
We’re building this pipeline with Concourse. Concourse has three core primitives: tasks, jobs, and resources. Tasks form jobs, jobs form pipelines, and state is stored in resources. Concourse is essentially stateless, meaning there are no artifacts on the server after a build. You also don’t register any plugins or extensions. Rather, the pipeline is executed in containers that go away after the pipeline finishes. Any state — be it source code or Docker images — resides in durable resources, not Concourse itself.
Let’s start building a pipeline.
Pulling source code
A Concourse pipeline is defined in YAML. Concourse ships with a handful of “known” resource types including Amazon S3, git, and Cloud Foundry. There are dozens and dozens of community ones, and it’s not hard to build your own. Because my source code is stored in GitHub, I can use the out-of-the-box resource type for git.
At the top of my pipeline, I declared that resource.
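The declaration is just a few lines (I’m assuming the master branch):

resources:
- name: source-code
  type: git
  source:
    uri: https://github.com/rseroter/seroter-api-k8s
    branch: master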
I gave the resource a name (“source-code”) and identified where the code lives. That’s it! Note that when you deploy a pipeline, Concourse produces containers that “check” resources on a schedule for any changes that should trigger a pipeline.
Running unit tests
Next up? Build a working version of a pipeline that does something. Specifically, it should execute unit tests. That means we need to define a job.
A job has a build plan. That build plan contains any of three things: get steps (to retrieve a resource), put steps (to push something to a resource), and task steps (to run a script). Our job below has one get step (to retrieve source code), and one task (to execute the xUnit tests).
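Here’s a sketch of that job, assuming the Microsoft-provided .NET Core SDK image and tests runnable from the repo root:

jobs:
- name: run unit tests
  plan:
  - get: source-code
    trigger: true
  - task: run tests
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: mcr.microsoft.com/dotnet/core/sdk
          tag: "2.2"
      inputs:
      - name: source-code
      run:
        path: sh
        args:
        - -exc
        - |
          cd source-code
          dotnet test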
Let’s break it down. First, my “plan” gets the source-code resource. And because I set “trigger: true” Concourse will kick off this job whenever it detects a change in the source code.
Next, my build plan has a “task” step. Tasks run in containers, so you need to choose a base image that runs the user-defined script. I chose the Microsoft-provided .NET Core image so that I’d be confident it had all the necessary .NET tooling installed. Note that my task has an “input.” Since tasks are like functions, they have inputs and outputs. Anything I input into the task is mounted into the container and is available to any scripts. So, by making the source-code an input, my shell script can party on the source code retrieved by Concourse.
Finally, I embedded a short script that invokes the “dotnet test” command. If I were being responsible, I’d refactor this embedded script into an external file and reference that file. But hey, this is easier to read.
This is now a valid pipeline. In the previous post, I had you install the fly CLI to interact with Concourse. From the fly CLI, I deploy pipelines with the following command:
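The command looks like this (the config file name is illustrative; the target and pipeline name match what I describe next):

fly -t rs set-pipeline -c pipeline.yml -p azure-k8s-rev1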
That command says to use the “rs” target (which points to a given Concourse instance), use the YAML file holding the pipeline, and name this pipeline azure-k8s-rev1. It deployed instantly, and looked like this in the Concourse web dashboard.
After unpausing the pipeline so that it came alive, I saw the “run unit tests” job start running. It’s easy to view what a job is doing, and I saw that it loaded the container image from Microsoft, mounted the source code, ran my script and turned “green” because all my tests passed.
Nice! I had a working pipeline. Now to generate a container image.
Producing and publishing a container image
A pipeline that just runs tests is kinda weird. I need to do something when tests pass. In my case, I wanted to generate a Docker image. Another of the built-in Concourse resource types is “docker-image” which generates a container image and puts it into a registry. Here’s the resource definition that worked with Azure Container Registry:
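(Reconstructed as a sketch; the registry and repository names are placeholders.)

- name: azure-container-registry
  type: docker-image
  source:
    repository: myregistry.azurecr.io/seroter-api-k8s
    username: ((acr-username))
    password: ((acr-password))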
Where do you get those Azure Container Registry values? From the Azure Portal, they’re visible under “Access keys.” I grabbed the Username and one of the passwords.
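The job that builds and pushes the image looked roughly like this (job and resource names are illustrative):

- name: containerize app
  plan:
  - get: source-code
    trigger: true
    passed: [run unit tests]
  - put: azure-container-registry
    params:
      build: source-code
      dockerfile: source-code/Dockerfile
      tag_as_latest: true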
What’s this job doing? Notice that I “get” the source code again. I also set a “passed” attribute meaning this will only run if the unit test step completes successfully. This is how you start chaining jobs together into a pipeline! Then I “put” into the registry. Recall from the first blog post that I generated a Dockerfile from within Visual Studio for Mac, and here, I point to it. The resource does a “docker build” with that Dockerfile, tags the resulting image as the “latest” one, and pushes to the registry.
I manually triggered the “run unit tests” job, and after it completed, the “containerize app” job ran. When that was finished, I checked Azure Container Registry and saw a new repository with an image in it.
Generating and storing a tarball
Not every platform wants to run containers. BLASPHEMY! BURN THE HERETIC! Calm down. Some platforms happily take your source code and run it. So our pipeline should also generate a single artifact with all the published ASP.NET Core files.
I wanted to store this blob in Azure Storage. Since Azure Storage isn’t a built-in Concourse resource type, I needed to reference a community one. No problem finding one. For non-core resources, you have to declare the resource type in the pipeline YAML.
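I used the community azure-blobstore resource; the declarations looked approximately like this (the image name and regexp are my assumptions, while the account and container names come from the first post):

resource_types:
- name: azure-blobstore
  type: docker-image
  source:
    repository: pcfabr/azure-blobstore-resource

resources:
- name: azure-blob
  type: azure-blobstore
  source:
    storage_account_name: seroterbuilds
    storage_account_key: ((storage-account-key))
    container: coreapp
    regexp: seroter-api-k8s-(.*).tar.gz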
Here the “type” matches the resource type name I set earlier. Then I set the credentials (retrieved from the “Access keys” section in the Azure Portal), container name (pre-created in the first blog post), and the name of the file to upload. Regex is supported here too.
Finally, I added a new job that takes source code, runs a “publish” command, and creates a tarball from the result.
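A sketch of that job follows (the project path, publish flags, and tarball name are assumptions):

- name: package app
  plan:
  - get: source-code
    trigger: true
    passed: [run unit tests]
  - task: publish app
    config:
      platform: linux
      image_resource:
        type: docker-image
        source:
          repository: mcr.microsoft.com/dotnet/core/sdk
          tag: "2.2"
      inputs:
      - name: source-code
      outputs:
      - name: artifact-repo
      run:
        path: sh
        args:
        - -exc
        - |
          dotnet publish source-code/seroter-api-k8s -c Release -o out
          tar -czf artifact-repo/seroter-api-k8s-0.1.0.tar.gz -C source-code/seroter-api-k8s/out .
  - put: azure-blob
    params:
      file: artifact-repo/seroter-api-k8s-*.tar.gz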
Note that this job is also triggered when unit tests succeed. But it’s not connected to the containerization job, so it runs in parallel. Also note that in addition to an input, I also have outputs defined on the task. This generates folders that are visible to subsequent steps in the job. I dropped the tarball into the “artifact-repo” folder, and then “put” that file into Azure Blob Storage.
Now this pipeline’s looking pretty hot. Notice that I have parallel jobs that fire after I run unit tests.
I once again triggered the unit test job, and watched the subsequent jobs fire. After the pipeline finished, I had another updated container image in Azure Container Registry and a file sitting in Azure Storage.
Adding semantic version to the container image
I could stop there and push to Kubernetes (next post!), but I wanted to do one more thing. I don’t like publishing Docker images with the “latest” tag. I want a real version number. It makes sense for many reasons, not the least of which is that Kubernetes won’t pick up changes to a container if the tag doesn’t change! Fortunately, Concourse has a default resource type for semantic versioning.
There are a few backing stores for the version number. Since Concourse is stateless, we need to keep the version value outside of Concourse itself. I chose a git backend. Specifically, I added a branch named “version” to my GitHub repo, and added a single file (no extension) named “version”. I started the version at 0.1.0.
Then, I ensured that my GitHub account had an SSH key associated with it. I needed this so that Concourse could write changes to this version file sitting in GitHub.
I added a new resource to my pipeline definition, referencing the built-in semver resource type.
- name: version
  type: semver
  source:
    driver: git
    uri: git@github.com:rseroter/seroter-api-k8s.git
    branch: version
    file: version
    private_key: |
      -----BEGIN OPENSSH PRIVATE KEY-----
      [...]
      -----END OPENSSH PRIVATE KEY-----
In that resource definition, I pointed at the repo URI, branch, file name, and embedded the private key for my account.
Next, I updated the existing “containerization” job to get the version resource, use it, and then update it.
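The updated job looked something like this (names are carried over from the earlier sketches):

- name: containerize app
  plan:
  - get: source-code
    trigger: true
    passed: [run unit tests]
  - get: version
    params: {bump: minor}
  - put: azure-container-registry
    params:
      build: source-code
      dockerfile: source-code/Dockerfile
      tag_file: version/version
      tag_as_latest: true
  - put: version
    params: {file: version/version}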
First, I added another “get” for version. Notice that its parameter increments the number by one minor version. Then, see that the “put” for the container registry uses “version/version” as the tag file. This ensures our Docker image is tagged with the semantic version number. Finally, notice I “put” the incremented version file back into GitHub after using it successfully.
I deployed a fourth revision of this pipeline using this command:
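Same fly pattern as before (the config file name and revision label are illustrative):

fly -t rs set-pipeline -c pipeline.yml -p azure-k8s-rev4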
You see the pipeline, post-execution, below. The “version” resource comes into and out of the “containerize app” job.
With the pipeline done, I saw that the “version” value in GitHub was incremented by the pipeline, and most importantly, our Docker image has a version tag.
In this blog post, we saw how to gradually build up a pipeline that retrieves source and prepares it for downstream deployment. Concourse is fun and easy to use, and its extensibility made it straightforward to deal with managed Azure services. In the final blog post of this series, we’ll take pipeline-generated Docker image and deploy it to Azure Kubernetes Service.
Isn’t it frustrating to build great software and helplessly watch as it waits to get deployed? We don’t just want to build software in small batches, we want to ship it in small batches. This helps us learn faster, and gives our users a non-stop stream of new value.
I’m a big fan of Concourse. It’s a continuous integration platform that reflects modern cloud-native values: it’s open source, container-native, stateless, and developer-friendly. And all pipeline definitions are declarative (via YAML) and easily source controlled. I wanted to learn how to build a Concourse pipeline that unit tests an ASP.NET Core app, packages it up and stashes a tarball in Azure Storage, creates a Docker container and stores it in Azure Container Registry, and then deploys the app to Azure Kubernetes Service. In this three part blog series, we’ll do just that! Here’s the final pipeline:
This first post looks at everything I did to set up the scenario.
My ASP.NET Core web app
I used Visual Studio for Mac to build a new ASP.NET Core Web API. I added NuGet package dependencies to xunit and xunit.runner.visualstudio. The API controller is super basic, with three operations.
using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

[Route("api/[controller]")]
[ApiController]
public class ValuesController : ControllerBase
{
    [HttpGet]
    public ActionResult<IEnumerable<string>> Get()
    {
        return new string[] { "value1", "value2" };
    }

    [HttpGet("{id}")]
    public string Get(int id)
    {
        return "value1";
    }

    [HttpGet("{id}/status")]
    public string GetOrderStatus(int id)
    {
        if (id > 0 && id <= 20)
        {
            return "shipped";
        }
        else
        {
            return "processing";
        }
    }
}
I also added a Testing class for unit tests.
using Xunit;

public class TestClass
{
    private ValuesController _vc;

    public TestClass()
    {
        _vc = new ValuesController();
    }

    [Fact]
    public void Test1()
    {
        Assert.Equal("value1", _vc.Get(1));
    }

    [Theory]
    [InlineData(1)]
    [InlineData(3)]
    [InlineData(9)]
    public void Test2(int value)
    {
        Assert.Equal("shipped", _vc.GetOrderStatus(value));
    }
}
Next, I right-clicked my project and added “Docker Support.”
What this does is add a Docker Compose project to the solution, and a Dockerfile to the project. Due to relative paths and such, if you try and “docker build” from directly within the project directory containing the Dockerfile, Docker gets angry. It’s meant to be invoked from the parent directory with a path to the project’s directory, like:
docker build -f seroter-api-k8s/Dockerfile .
I wasn’t sure if my pipeline could handle that nuance when containerizing my app, so I just went ahead and moved the generated Dockerfile to the parent directory like in the screenshot below. From here, I could just execute the docker build command.
Provisioning an Azure Container Registry
Where should we store our pipeline-created container images? You’ve got lots of options. You could use the Docker Hub, self-managed OSS projects like VMware’s Harbor, or cloud-specific services like Azure Container Registry. Since I’m trying to use all-things Azure, I chose the latter.
It’s easy to set up an ACR. Once I provided the couple parameters via the Azure Dashboard, I had a running, managed container registry.
Provisioning an Azure Storage blob
Container images are great. We may also want the raw published .NET project package for archival purposes, or to deploy to non-container runtimes. I chose Azure Storage for this purpose.
I created a blob storage account named seroterbuilds, and then a single blob container named coreapp. This isn’t a Docker container, but just a logical construct to hold blobs.
Creating an Azure Kubernetes Cluster
It’s not hard to find a way to run Kubernetes. I think my hair stylist sells a distribution. You can certainly spin up your own vanilla server environment from the OSS bits. Or run it on your desktop with minikube. Or run an enterprise-grade version anywhere with something like VMware PKS. Or run it via managed service with something like Azure Kubernetes Service (AKS).
AKS is easy to set up, and I provided the version (1.13.9), node pool size, service principal for authentication, and basic HTTP routing for hosted containers. My 3-node cluster was up and running in a few minutes.
Starting up a Concourse environment
Finally, Concourse. If you visit the Concourse website, there’s a link to a Docker Compose file you can download and start up via docker-compose up. This starts up the database, worker, and web node components needed to host pipelines.
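If you want to try it, the quickstart amounts to two commands (check the Concourse docs for the current location of the compose file):

curl -O https://concourse-ci.org/docker-compose.yml
docker-compose up -d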
Once Concourse is up and running, the web-based Dashboard is available on localhost:8080.
From there you can find links (bottom left) to downloads for the command line tool (called fly). This is the primary UX for deploying and troubleshooting pipelines.
With fly installed, we create a “target” that points to our environment. Do this with the following statement. Note that I’m using “rs” (my initials) as the alias, which gets used for each fly command.
fly -t rs login -c http://localhost:8080
Once I request a Concourse login (default username is “test” and password is “test”), I’m routed to the dashboard to get a token, which gets loaded automatically into the CLI.
At this point, we’ve got a functional ASP.NET Core app, a container registry, an object storage destination, a managed Kubernetes environment, and a running Concourse instance. In the next post, we’ll build the first part of our Azure-focused pipeline that reads source code, runs tests, and packages the artifacts.
Serverless things don’t always complete their work in milliseconds. With the introduction of AWS Step Functions and Azure Durable Functions, we have compute instances that exist for hours, days, or even months. With serverless workflow tools like Azure Logic Apps, it’s also easy to build long-running processes. So in this world of continuous delivery and almost-too-easy update processes, what happens when you update the underlying definition of things that have running instances? Do they use the version they started with? Do they pick up changes and run with those after waking up? Do they crash and cause the heat death of the universe? I was curious, so I tried it out.
Azure Durable Functions
Azure Durable Functions extends “regular” Azure Functions. They introduce a stateful processing layer by defining an “orchestrator” that calls Azure Functions, checkpoints progress, and manages intermediate state.
Let’s build one, and then update it to see what happens to the running instances.
First, I created a new Function App in the Azure Portal. A Function App holds individual functions. This one uses the “consumption plan” so I only pay for the time a function runs, and contains .NET-based functions. Also note that it provisions a storage account, which we’ll end up using for checkpointing.
Durable Functions are made up of a client function that create an orchestration, orchestration functions that coordinate work, and activity functions that actually do the work. From the Azure Portal, I could see a template for creating an HTTP client (or starter) function.
The function code generated by the template works as-is.
#r "Microsoft.Azure.WebJobs.Extensions.DurableTask"
#r "Newtonsoft.Json"
using System.Net;
public static async Task<HttpResponseMessage> Run(
HttpRequestMessage req,
DurableOrchestrationClient starter,
string functionName,
ILogger log)
{
// Function input comes from the request content.
dynamic eventData = await req.Content.ReadAsAsync<object>();
// Pass the function name as part of the route
string instanceId = await starter.StartNewAsync(functionName, eventData);
log.LogInformation($"Started orchestration with ID = '{instanceId}'.");
return starter.CreateCheckStatusResponse(req, instanceId);
}
Next I created the activity function. Like with the client function, the Azure Portal generates a working function from the template. It simply takes in a string, and returns a polite greeting.
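For reference, the template-generated activity is only a few lines; reproduced from memory, it looks roughly like this (the activity trigger binding lives in function.json):

#r "Microsoft.Azure.WebJobs.Extensions.DurableTask"

// Activity function: takes a name and returns a polite greeting
public static string Run(string name)
{
    return $"Hello {name}!";
}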
The final step was to create the orchestrator function. The template-generated code is below. Notice that our orchestrator calls the “hello” function three times with three different inputs, and aggregates the return values into a single output.
After saving this function, I went back to the starter/client function and clicked the “Get function URL” link to get the URL I need to invoke to instantiate this orchestrator. Then, I plugged that into Postman, and submitted a POST request.
Since the Durable Function is working asynchronously, I get back URIs to check the status, or terminate the orchestrator. I invoked the “get status” endpoint, and saw the aggregated results returned from the orchestrator function.
So it all worked. Terrific. Next I wanted to add a delay in between activity function calls to simulate a long-running process. What’s interesting with Durable Functions is that every time it gets results back from an async call (or timer), it reruns the entire orchestrator from scratch. Now, it checks the execution log to avoid calling the same operation again, but this made me wonder how it would respond if I added *new* activities in the mix, or deleted activities.
First, I added some instrumentation to the orchestrator function (and injected function input) so that I could see more about what was happening. In the code below, if we’re not replaying activities (so, first time it’s being called), it traces out a message.
public static async Task<List<string>> Run(DurableOrchestrationContext context, ILogger log)
{
    var outputs = new List<string>();

    outputs.Add(await context.CallActivityAsync<string>("Hello", "Tokyo"));
    if (!context.IsReplaying) log.LogInformation("Called function once.");

    outputs.Add(await context.CallActivityAsync<string>("Hello", "Seattle"));
    if (!context.IsReplaying) log.LogInformation("Called function twice.");

    outputs.Add(await context.CallActivityAsync<string>("Hello", "London"));
    if (!context.IsReplaying) log.LogInformation("Called function thrice.");

    // returns ["Hello Tokyo!", "Hello Seattle!", "Hello London!"]
    return outputs;
}
After saving this update, I triggered the client function again, and with the streaming “Logs” view open in the Portal. Here, I saw trace statements for each call to an activity function.
A durable function supports Timers that pause processing for up to seven days. I added the following code between the second and third function calls. This pauses the function for 30 seconds.
if (!context.IsReplaying) log.LogInformation("Starting delay.");
DateTime deadline = context.CurrentUtcDateTime.Add(TimeSpan.FromSeconds(30));
await context.CreateTimer(deadline, System.Threading.CancellationToken.None);
if (!context.IsReplaying) log.LogInformation("Delay finished.");
If you trigger the client function again, it will take 30-ish seconds to get results back, as expected.
Next I tested three scenarios to see how Durable Functions handled them:
Wait until the orchestrator hits the timer, and change the payload for an activity function call that executed before the timer started. What happens when the framework tries to re-run a step that’s changed? I changed the first function’s payload from “Tokyo” to “Mumbai” after the function instance had already passed the first call, and was paused at the timer. After the function resumed from the timer, the orchestrator failed with a message of: “Non-Deterministic workflow detected: TaskScheduledEvent: 0 TaskScheduled Hello.” Didn’t like that. Changing the call signature, or apparently even the payload, is a no-no if you don’t want to break running instances.
Wait until the orchestrator hits the timer, and update the function to introduce a new activity function call in code above the timer. Does the framework execute that new function call when it wakes up and re-runs, or ignore it? Indeed, it runs it. So after the timer wrapped up, the NEW, earlier function call got invoked, AND it ran the timer again before continuing. That part surprised me, and it only kinda worked. Instead of returning the expected value from the activity function, I got a “2” back. And sometimes when I tested this, I got the above “non-deterministic workflow” error. So, your mileage may vary.
Add an activity call after the timer, and see if it executes it after the delay is over. Does the orchestrator “see” the new activity call I added to the code after it woke back up? The first time I tried this, I again got the “non-deterministic workflow” error, but with a few more tests, I saw it actually executed the new function after waking back up, AND running the timer a second time.
What have we learned? The “version” a Durable Function starts with isn’t serialized and used for the entirety of the execution. It’s picking up things changing along the way. Be very aware of side effects! For a number of these tests, I also had to “try again” and would see different results. I feel like I was breaking Azure Functions!
What’s the right way to version these? Microsoft offers some advice, which ranges from “do nothing and let things fail” to “deploy an entirely new function.” But from these tests, I’d advise against changing function definitions outside of explicitly deploying new versions.
Azure Logic Apps
Let’s take a look at Logic Apps. This managed workflow service is designed for constructing processes that integrate a variety of sources and targets. It supports hundreds of connectors to things like Salesforce.com, Amazon Redshift, Slack, OneDrive, and more. A Logic App can run for 90 days in the multi-tenant environment, and up to a year in the dedicated environment. So, most users of Logic Apps are going to have instances in-flight when it comes time to deploy updates.
To test this out, I first created a couple of Azure Functions that Logic Apps could call. These JavaScript functions are super lame, and just return a greeting.
Next up, I created a Logic App. It’s easy.
After a few moments, I could jump in and start designing my workflow. As a “serverless” service, Logic Apps only run when invoked, and start with a trigger. I chose the HTTP trigger.
My Logic App takes in an HTTP request, has a 45 second “delay” (which could represent waiting for new input, or a long-running API call) before invoke our simple Azure Function.
I saved the Logic App, called the HTTP endpoint via Postman, and waited. After about 45 seconds, I saw that everything succeeded.
Next, I kicked off another instance, and quickly went in and added another Function call after the first one. What would Logic Apps do with that after the delay was over? It ignored the new function call. Then I kicked off another Logic Apps instance, and quickly deleted the second function call. Would the instance wake up and now only call one Function? Nope, it called them both.
So it appears that Logic Apps snapshot the workflow when it starts, and it executes that version, regardless of what changes in the underlying definition after the fact. That seems good. It results in a more consistent, predictable process. Logic Apps does have the concept of versioning, and you can promote previous versions to the active one as needed.
AWS Step Functions
AWS doesn’t have something exactly like Logic Apps, but AWS Step Functions is somewhat similar to Azure Durable Functions. With Step Functions, you can chain together a series of AWS services into a workflow. It basically builds a state machine that you craft in their JSON-based Amazon States Language. A given Step Function can be idle for up to a year, so again, you’ll probably have long-running instances going at all times!
I jumped into the AWS console and started with their “hello world” template.
This state machine has a couple basic states that execute immediately. Then I added a 20 second wait.
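In the Amazon States Language, that wait is just another state slotted between the existing ones; the state names below are illustrative:

"WaitTwentySeconds": {
  "Type": "Wait",
  "Seconds": 20,
  "Next": "HelloWorld"
}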
After deploying the Step Function, it was easy to see that it ran everything quickly and successfully.
Next, I kicked off a new instance, and added a new step to the state machine while the instance was waiting. The Step Function that was running ignored it.
When I kicked off another Step Function and removed the step after the wait step, it also ignored it. It seems pretty clear that AWS Step Functions snapshots the workflow at the start and proceeds with that snapshot, even if the underlying definition changes. I didn’t find much documentation around formally versioning Step Functions, but it seems to keep you fairly safe from side effects.
With all of these, it’s important to realize that you also have to consider versioning of downstream calls. I could have an unchanged Logic App, but the function or API it invokes had its plumbing entirely updated after the Logic App started running. There’s no way to snapshot the state of all the dependencies! That’s normal in a distributed system. But, something to remember.
Have you observed any different behavior with these stateful serverless products?
Microsoft Azure isn’t a platform. Like every other public cloud, it’s an environment and set of components you’ll use to construct your platform. It’s more like an operating system that you build on, versus some integrated product you just start using. And that’s exciting. Which platform will you create? Let’s look at what a software platform is, how I landed on 295,680 unique configurations — 10,036,224,000 configurations if you add a handful of popular 3rd party products — and the implications of all this.
What’s a software platform?
Let’s not overthink this. When I refer to a software platform, I’m thinking of the technologies that come together to help you deploy, run, and manage software. This applies whether we’re talking about stuff in your data center or in the public cloud. It applies to serverless systems or good ol’ virtual machines.
I made a map of the non-negotiable capabilities. One way or another, you’re stitching these things together. You need an app runtime (e.g. VMs, FaaS, containers), databases, app deployment tools, infrastructure/config management, identity systems, networking, monitoring, and more. Pretty much all your running systems depend on this combination of things.
How many platform combinations are there in Microsoft Azure?
I’m not picking on Microsoft here; you face the same decision points in every cloud. Each customer — realistically, each tenant in your account — chooses among the various services to assemble the platform they want. Let’s first talk about only using Microsoft’s first-party services. So, assume you aren’t using any other commercial or OSS products to build and run your systems. Unlikely, but hey, work with me here.
For math purposes, let’s calculate unique combinations by assuming your platform uses one thing from each category. The exception? App runtimes and databases. I think it’s more likely you’re using at least a pair of runtimes (e.g. Azure App Service AND Azure Kubernetes Service, or, Windows VMs AND Linux VMs), and a pair of databases. You may be using more, but I’m being conservative.
295,680 unique platforms you can build on Azure using only native services. Ok, that’s a lot. Now that said, I suspect it’s VERY unlikely you’ll only use native cloud services to build and run software. You might love using Terraform to deploy infrastructure. Or MongoDB Atlas for a managed MongoDB environment. Kong may offer your API gateway of choice. Maybe Prometheus is your standard for monitoring. And don’t forget about all the cool managed services like Twilio, Auth0, or Algolia. The innovation outside of any one cloud will always be greater than the innovation within that cloud. You want in on some of that action! So what happens if I add just a few leading non-Azure services to your platform?
Yowza. 10 billion platform combinations. You can argue with my math or approach, but hopefully you get my point.
And again, this really isn’t about any particular cloud. You can end up in the same place on AWS, Digital Ocean, or Alibaba. It’s simply the result of all the amazing choices we have today.
Who cares?
If you’re working at a small company, or simply an individual using cloud services, this doesn’t really matter. You’ll choose the components that help you deliver software best, and go with it.
Where this gets interesting is in the enterprise. You’ve got many distinct business units, lots of existing technology in use, and different approaches to using cloud computing. Done poorly, your cloud environment stands to be less secure, harder to maintain, and more chaotic than anything you have today. Done well, your cloud environment will accelerate time-to-market and lower your infrastructure costs so that you can spend more time building software your customers love.
My advice? Decide on your tenancy model up-front, and purposely limit your choices.
In the public cloud, you can separate user pools (“tenants”) in at least three different ways:
Everyone in one account. Some companies start this way, and use role-based access or other in-account mechanisms (e.g. resource groups) to silo individual groups or apps. On the positive side, this model is easier to audit and developers can easily share access to services. Challenges here include a bigger blast radius for mistakes, and greater likelihood of broad processes (e.g. provisioning or policy changes) that slow individuals down.
Separate-but-equal sub accounts. I’ve seen some large companies use child accounts or subscriptions for each individual tenant. Each tenant’s account is set up in the same way, with access to the same services. Every tenant owns their own resources, but they operate a standard “platform.” On the plus side, this makes it easier to troubleshoot problems across the org, and enforce a consistent security profile. It also makes engineer rotation easier, since each team has the same stack. On the downside, this model doesn’t account for unique needs of each team and may force suboptimal tech choices in the name of consistency.
Independent sub accounts. A few weeks ago, I watched my friend Brian Chambers explain that each of 30 software teams at Chick-fil-A has their own AWS account, with a recommended configuration that tenants can use, modify, or ignore. Here, every tenant can do their own thing. One of the benefits is that each group can cater their platform to their needs. If a team wants to go entirely serverless, awesome. Another may be all in on Kubernetes. The downside of this model is that you can’t centralize things like security patching, and you can end up with snowflake environments that increase your costs over time.
Finally, it’s wise to purposely limit choices. I wrote about this topic last month. Facing too many choices can paralyze you, and all that choice adds only incremental benefits. For mature categories like databases, pick two and stop there. If it’s an emerging space, offer more freedom until commoditization happens. Give your teams some freedom to build their platform on the public cloud, but put some guardrails in place.
Serverless computing. Let’s talk about it. I don’t think it’s crazy to say that it represents the first cloud-native software model. Done right, it is inherently elastic and pay-per-use, and strongly encourages the use of cloud managed services. And to be sure, it’s about much more than just Function-as-a-Service platforms like AWS Lambda.
So, what exactly is it, why does it matter, and what technologies and architecture patterns should you know? To answer that question, I spent a few months researching the topic, and put together a new Pluralsight course, Serverless Computing: The Big Picture.
The course is only an hour long, but I get into some depth on benefits, challenges, and patterns you should know.
The first module looks at the various serverless definitions offered by industry experts, why serverless is different from what came before it, how serverless compares to serverful systems, challenges you may face adopting it, and example use cases.
The second module digs into the serverless tech that matters. I look at public cloud function-as-a-service platforms, installable platforms, dev tools, and managed services.
The final module of the course looks at architecture patterns. We start by looking at best practices, then review a handful of patterns.
As always, I had fun putting this together. It’s my 19th Pluralsight course, and I don’t see stopping any time soon. If you watch it, I’d love your feedback. I hope it helps you get a handle on this exciting, but sometimes-confusing, topic!
You’ve got microservices. Great. They’re being continuous delivered. Neato. Ok … now what? The next hurdle you may face is data processing amongst this distributed mesh o’ things. Brokered messaging engines like Azure Service Bus or RabbitMQ are nice choices if you want pub/sub routing and smarts residing inside the broker. Lately, many folks have gotten excited by stateful stream processing scenarios and using distributed logs as a shared source of events. In those cases, you use something like Apache Kafka or Azure Event Hubs and rely on smart(er) clients to figure out what to read and what to process. What should you use to build these smart stream processing clients?
I’ve written about Spring Cloud Stream a handful of times, and last year showed how to integrate with the Kafka interface on Azure Event Hubs. Just today, Microsoft shipped a brand new “binder” for Spring Cloud Stream that works directly with Azure Event Hubs. Event processing engines aren’t useful if you aren’t actually publishing or subscribing to events, so I thought I’d try out this new binder and see how to light up Azure Event Hubs.
Setting Up Microsoft Azure
First, I created a new Azure Storage account. When reading from an Event Hubs partition, the client maintains a cursor. This cursor tells the client where it should start reading data from. You have the option to store this cursor server-side in an Azure Storage account so that when your app restarts, you can pick up where you left off.
There’s no need for me to create anything in the Storage account, as the Spring Cloud Stream binder can handle that for me.
Next, the actual Azure Event Hubs account! First I created the namespace. Here, I chose things like a name, region, pricing tier, and throughput units.
Like with the Storage account, I could stop here. My application will automatically create the actual Event Hub if it doesn’t exist. In reality, I’d probably want to create it first so that I could pre-define things like partition count and message retention period.
Creating the event publisher
The event publisher takes in a message via web request, and publishes that message for others to process. The app is a Spring Boot app, and I used the start.spring.io experience baked into Spring Tools (for Eclipse, Atom, and VS Code) to instantiate my project. Note that I chose “web” and “cloud stream” dependencies.
With the project created, I added the Event Hubs binder to my project. In the pom.xml file, I added a reference to the Maven package.
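The dependency looked like this (the coordinates are what I recall the preview binder being published under; check Maven Central for the current artifact and version):

<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>spring-cloud-azure-eventhubs-stream-binder</artifactId>
    <version>1.1.0.RELEASE</version> <!-- substitute the current binder version -->
</dependency>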
Now before going much farther, I needed a credentials file. Basically, it includes all the info needed for the binder to successfully chat with Azure Event Hubs. You use the az CLI tool to generate it. If you don’t have it handy, the easiest option is to use the Cloud Shell built into the Azure Portal.
From here, I ran az account list to show all my Azure subscriptions. I chose the one that holds my Azure Event Hub and copied the associated GUID. Then, I set that account as my default one for the CLI with this command:
az account set -s 11111111-1111-1111-1111-111111111111
With that done, I issued another command to generate the credential file.
az ad sp create-for-rbac --sdk-auth > my.azureauth
I opened up that file within the Cloud Shell, copied the contents, and pasted the JSON content into a new file in the resources directory of my Spring Boot app.
Next up, the code. Because we’re using Spring Cloud Stream, there’s no specific Event Hubs logic in my code itself. I only use Spring Cloud Stream concepts, which abstracts away any boilerplate configuration and setup. The code below shows a simple REST controller that takes in a message, and publishes that message to the output channel. Behind the scenes, when my app starts up, Boot discovers and inflates all the objects needed to securely talk to Azure Event Hubs.
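(This is a representative sketch rather than my exact class; the class name and route are illustrative.)

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.messaging.support.GenericMessage;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
@EnableBinding(Source.class)
@SpringBootApplication
public class EventHubSourceApplication {

    @Autowired
    private Source source;

    public static void main(String[] args) {
        SpringApplication.run(EventHubSourceApplication.class, args);
    }

    // take in a message via web request and publish it to the output channel
    @PostMapping("/publish")
    public String publish(@RequestBody String payload) {
        source.output().send(new GenericMessage<>(payload));
        return "published";
    }
}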
How simple is that? All that’s left is the application properties used by the app. Here, I set a few general Spring Cloud Stream properties, and a few related to the Event Hubs binder.
#point to credentials
spring.cloud.azure.credential-file-path=my.azureauth

#get these values from the Azure Portal
spring.cloud.azure.resource-group=demos
spring.cloud.azure.region=East US
spring.cloud.azure.eventhub.namespace=seroter-event-hub

#choose where to store checkpoints
spring.cloud.azure.eventhub.checkpoint-storage-account=serotereventhubs

#set the name of the Event Hub
spring.cloud.stream.bindings.output.destination=seroterhub

#be lazy and let the app create the Storage blobs and Event Hub
spring.cloud.azure.auto-create-resources=true
With that, I had a working publisher.
Creating the event subscriber
It’s no fun publishing messages if no one ever reads them. So, I built a subscriber. I walked through the same start.spring.io experience as above, this time ONLY choosing the Cloud Stream dependency. And then added the Event Hubs binder to the pom.xml file of the created project. I also copied the my.azureauth file (containing our credentials) from the publisher project to the subscriber project.
It’s criminally simple to pull messages from a broker using Spring Cloud Stream. Here’s the full extent of the code. Stream handles things like content type transformation, and so much more.
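A minimal sketch of that subscriber (the class name is illustrative):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

@EnableBinding(Sink.class)
@SpringBootApplication
public class EventHubSinkApplication {

    public static void main(String[] args) {
        SpringApplication.run(EventHubSinkApplication.class, args);
    }

    // invoked for each event read from the hub
    @StreamListener(Sink.INPUT)
    public void handleMessage(String message) {
        System.out.println("New message received: " + message);
    }
}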
The final step involved defining the application properties, including the Storage account for checkpointing, and whether to automatically create the Azure resources.
#point to credentials
spring.cloud.azure.credential-file-path=my.azureauth

#get these values from the Azure Portal
spring.cloud.azure.resource-group=demos
spring.cloud.azure.region=East US
spring.cloud.azure.eventhub.namespace=seroter-event-hub

#choose where to store checkpoints
spring.cloud.azure.eventhub.checkpoint-storage-account=serotereventhubs

#set the name of the Event Hub
spring.cloud.stream.bindings.input.destination=seroterhub
#set the consumer group
spring.cloud.stream.bindings.input.group=system3

#read from the earliest point in the log; default val is LATEST
spring.cloud.stream.eventhub.bindings.input.consumer.start-position=EARLIEST

#be lazy and let the app create the Storage blobs and Event Hub
spring.cloud.azure.auto-create-resources=true
And now we have a working subscriber.
Testing this thing
First, I started up the producer app. It started up successfully, and I can see in the startup log that it created the Event Hub automatically for me after connecting.
To be sure, I checked the Azure Portal and saw a new Event Hub with 4 partitions.
Sweet. I called the REST endpoint on my app three times to get a few messages into the Event Hub.
Now remember, since we’re dealing with a log versus a queuing system, my consumers don’t have to be online (or even registered anywhere) to get the data at their leisure. I can attach to the log at any time and start reading it. So that data is just hanging out in Event Hubs until its retention period expires.
I started up my Spring Boot subscriber app. After a couple moments, it connected to Azure Event Hubs, and read the three entries that it hadn’t ever seen before.
Back in the Azure Portal, I checked and saw a new blob container in my Storage account, with a folder for my consumer group, and checkpoints for each partition.
If I sent more messages into the REST endpoint, they immediately appeared in my subscriber app. What if I defined a new consumer group? Would it read all the messages from the beginning?
I stopped the subscriber app, changed the application property for “consumer group” to “system4” and restarted the app. After Spring Cloud Stream connected to each partition, it pumped out whatever it found, and responded immediately to any new entries.
Whether you’re building a change-feed listener off of Cosmos DB, sharing data between business partners, or doing data processing between microservices, you’ll probably be using a broker. If it’s an event bus like Azure Event Hubs, you now have an easy path with Spring Cloud Stream.