Richard Seroter's Architecture Musings

Category: Google Cloud

I don’t enjoy these 7 software development activities. Thanks to generative AI, I might not have to do them anymore.

The world is a better place now that no one pays me to write code. You’re welcome. But I still try to code on a regular basis, and there are a handful of activities I don’t particularly enjoy doing. Most of these relate to the fact that I’m not coding every day, and thus have to waste a lot of time looking up information I’ve forgotten. I’m not alone; the latest StackOverflow developer survey showed that most of us are spending 30 or more minutes a day searching online for answers.

Generative AI may solve those problems. Whether we’re talking AI-assisted development environments—think GitHub Copilot, Tabnine, Replit, or our upcoming Duet AI for Google Cloud—or chat-based solutions, we now have tools to save us from all the endless lookups for answers.

Google Bard has gotten good at coding-related activities, and I wanted to see if it could address some of my least-favorite developer activities. I won’t do anything fancy (e.g. multi-shot prompts, super sophisticated prompts) but will try and write smart prompts that give me back great results on the first try. Once I learn to write better prompts, the results will likely be even better!

#1 – Generate SQL commands

Virtually any app you build accesses a database. I’ve never been great at writing SQL commands, and end up pulling up a reference when I have to craft a JOIN or even a big INSERT statement. Stop judging me.

Here’s my prompt:

I’m a Go developer. Give me a SQL statement that inserts three records into a database named Users. There are columns for userid, username, age, and signupdate.

The result?

What if I want another SQL statement that joins the table with another and counts up the records in the related table? I was impressed that Bard could figure it out based on this subsequent prompt:

How about a join statement between that table and another table called LoginAttempts that has columns for userid, timestamp, and outcome where we want a count of loginattempts per user.

The result? A good SQL statement that seems correct to me. I love that it also gave me an example resultset to consider.

I definitely plan on using Bard for help me with SQL queries from now on.

#2 – Create entity definitions or DTOs

For anything but the simplest apps, I find myself creating classes to represent the objects used by my app. This isn’t hard work, but it’s tedious at times. I’d rather not plow through a series of getters and setters for private member variables, or setup one or more constructors. Let’s see how Bard does.

My prompt:

I’m a Java developer using Spring Boot. Give me a class that defines an object named Employees with fields for employeeid, name, title, location, and managerid.

The result is a complete object, with a “copy” button (so I can paste this right into my IDE), and even source attribution for where the model found the code.

What if I wanted a second constructor that only takes in one parameter? Because the chat is ongoing and supports back-and-forth engagement, I could simply ask “Give me the same class, but with a second constructor that only takes in the employeeid” and get:

This is a time-saver, and I can easily start with this and edit as needed.

#3 – Bootstrap my frontend pages

I like using Bootstrap for my frontend layout, but I don’t use it often enough to remember how to configure it the way I want. Bard to the rescue!

I asked Bard for an HTML page that has a basic Bootstrap style. This is where it’s useful that Bard is internet-connected. It knows the latest version of Bootstrap, whereas other generative chat tools don’t have access to the most recent info.

My prompt is:

Write an HTML page that uses the latest version of Bootstrap to center a heading that says “Feedback Form.” Then include a form with inputs for “subject” and “message” to go with a submit button.

I get back valid HTML and an explanation of what it generated.

It looked right to me, but I wanted to make sure it wasn’t just a hallucination. I took that code, pasted it into a new HTML file, and opened it up. Yup, looks right.

I may not use tools like Bard to generate an entirely complex app, but scaffolding the base of something like a web page is a big time-saver.

#4 – Create declarative definitions for things like Kubernetes deployment YAML

I have a mental block on remembering the exact structure of configuration files. Maybe I should see a doctor. I can read them fine, but I never remember how to write most of them myself. But in the meantime, I’m stoked that generative AI can make it easier to pump out Kubernetes deployment and service YAML, Dockerfiles, Terraform scripts, and most anything else.

Let’s say I want some Kubernetes YAML in my life. I provided this prompt:

Give me Kubernetes deployment and service YAML that deploys two replicas and exposes an HTTP endpoint

What I got back was a valid pile of YAML.

And I do like the results also explain a bit about what’s going. on here, including the actual command to apply these YAML files to a cluster.

Dockerfiles still intimidate me for some reason. I like the idea of describing what I need, and having Bard give me a starter Dockerfile to work with. Full disclosure, I tried getting Bard to generate a valid Dockerfile for a few of my sample GitHub repos, and I couldn’t get a good one.

But this prompt works well:

Show me a Dockerfile for a Go application that has the main Go file in a cmd folder, but also uses other folders named web and config.

The result is a Dockerfile, and it’s well explained in the accompanying text.

Bard also seems pretty good at giving me Terraform, Java property file configurations, and more. As always, verify the results!

#5 – Create sample data

I build a lot of sample apps, which means I need sample data. That might mean input into a REST endpoint, or it could be data for a starter database. Instead of struggling to produce a bunch of fake records, generative AI solutions can give me all the structured data I need.

In the SQL section above, I generated a SQL insert command that I could use for a database. But I can also generate some seed data for my app.

For instance, how about this prompt:

Generate a JavaScript variable holding a JSON array of five baseball players. Pull names from the baseball Hall of Fame. Each record should have an id, name, position, and age.

I get back valid JSON that I could drop in my Node.js app.

I can then ask Bard to convert that to XML, CSV, or any number of formats. I might want more records or fewer records. This also works if you ask for your favorite television characters, musicians from the 1990s, or types of potatoes. Generate data without taxing your brain.

#6 – Explain code I come across in GitHub, blog posts, or StackOverflow

Do you ever find some interesting code in a blog post or StackOverflow answer but can’t exactly figure out how it works? Happens to me all the time. Instead of going further down the rabbit hole to figure out what each line of code does, I can use generative AI to make some sense of it all.

For instance, let’s assume I’m terrible at generating random numbers and stumble upon this StackOverflow answer that’s succinct and without any description. I can ask Bard to describe this code in more depth:

What does this code do? Explain it to me as if I’m new to C#. public string RandomDigits(int length)
{
var random = new Random();
string s = string.Empty;
for (int i = 0; i < length; i++)
s = String.Concat(s, random.Next(10).ToString());
return s;
}

What I get back is a written description of each line. And then, helpfully, a restating of the code with comments added before each statement.

If you’re learning a new language, consider using generative AI to explain code snippets to you. I acknowledge that I’ve had mixed luck pointing Bard at a code file or repo and getting a perfect explanation. I sometimes get hallucinations (e.g. refer to files to methods that don’t exist), but I expect this functionality to get better quickly.

#7 – Convert code from one language (or version) to another

Do you ever come across code that shows a useful pattern, but isn’t in the language you’re coding in? Maybe I found that code above to generate a random number, but want the equivalent in Go. Not a big problem any more. A prompt following the above prompt:

Convert this C# code to Go code instead

I get back Go code, and a description of what’s different.

Consider the case where you find code for calling a particular HTTP endpoint, but it’s in Java and your app is written in Go. My colleague Guillaume just wrote up a great post about calling our new Google PaLM LLM API via a Java app. I can ask for all the code in the post to be converted to Go.

The prompt:

Can you give me the equivalent Go code for the Java code in this blog post?/ https://glaforge.dev/posts/2023/05/30/getting-started-with-the-palm-api-in-the-java-ecosystem/

That’s pretty cool. I wasn’t sure that was going to work.

With all of these examples, it’s still important to verify the results. Generative AI doesn’t absolve you of responsibility as a developer; but it can give you a tremendous head start and save you from wasting tons of time navigating from source to source for answers.

Any other development activities that you’d like generative AI to assist you with?

June 26, 2023
Want to move virtual machines to another cloud? Here are four ways, including a new one.

Moving isn’t fun. At least not for me. Even if you can move from one place to another, there are plenty of things that add friction. In the public cloud, you might want to switch from your first cloud to your next one, but it just feels like a lot of work. And while we cloud vendors like to talk about flashy serverless/container compute options, let’s be honest, most companies have their important workloads running in virtual machines. So how do you move those VMs from one place to another without a ton of effort? I’m going to look at four of the options, including one we just shipped at Google Cloud.

Option #1 – Move the workload, not the VM

In this case, you take what was on the original VM, and install it onto a fresh instance in the next cloud. The VM doesn’t move, the workload does. Maybe you do move the software manually, or re-point your build system to a VM instance in the new cloud.

Why do this? It’s a clean start and might give you the opportunity to do that OS upgrade (or swap) you’ve been putting off. Or you could use this time to split up the websites on a stuff server into multiple servers. This is also the one option that’s mostly guaranteed to work regardless of where you’re coming from, and where you’re going to.

The downside? It’s the most work of any of these options. You’ve got to install software, move state around, reconfigure things. Even if you do automated deployments, there’s likely new work here to bake golden images or deploy to a new cloud.

Option #2 – Export the VM images from one cloud and import into the next one

All the major clouds (and software vendors) support exporting and importing a VM image. These images come in all sorts of formats (e.g. VMDK, VHDX).

Why do this? It gives you a portable artifact that you can bring to another cloud and deploy. It’s a standard approach, and gives you a manageable asset to catalog, secure, backup, and use wherever you want. AWS offers guidance, so does Azure, as does Google Cloud. This usually carries no explicit cost, but brings with it costs for storage of the assets.

The downsides? This too is manual, although can be automated with APIs. It also moves the entire VM image without an opportunity to shrink or modernize any aspect of it. Additionally, it usually requires extra configuration of storage buckets and permissions to store the temporary artifacts.

Option #3 – Convert the VM to a container and move that artifact to the new cloud

Another way to move a VM to another cloud is to extract the VM-based application to a container image. The workload moves, but in a different format. All the major public clouds have something here. Azure Migrate helps with this, AWS provides an App2Container CLI tool, and Google Cloud offers Migrate to Containers as a CLI and UI-based experience.

So, what do you do with those existing .NET Framework apps on Windows? Most advice is "upgrade to #dotnet Core." But I hear that many folks want to freeze on .NET 4.8 for now.

I moved a VM-based .NET Framework app to GKE using our @googlecloud Migrate tooling. Here's my 🧵 … pic.twitter.com/vmZSurwYCd
— Richard Seroter (@rseroter) February 2, 2022

Why do this? This offers a means of “shrinking” the workload by reducing it to its own components, without bringing along the OS with it. This can bring higher workload density in the target cloud (if you throw a bunch of app containers onto consolidated hardware) and reduce cost. Also, this gives you flexibility on where you run the workload next. For instance, the container image you generate from the Google Cloud tool can run on a Kubernetes cluster or serverless Cloud Run environment.

Downsides? This doesn’t work for all workload types. Don’t shove SharePoint into a container, for example. And not all tools work with all the various clouds, so you might have to move the VM manually and then run the containerization tool. Also, doing this may give the impression you’re modernizing the app, but in reality, you’re only modernizing the underlying platform. That is valuable, but doesn’t remove the need for other modernization activities.

Option #4 – Use a managed service that moves the VM and turns down the old instance

Can migration be easier? Can you move VMs around with fewer steps and moving parts? There are definitely solutions for this from a variety of vendors. Among cloud providers, what Google Cloud has is unique. We just added a new experience, and figured we could walk through it together.

First, I built an Amazon EC2 instance and installed a web server onto it. I added a custom tag with the key “type” and value “web-server” so that I could easily find this VM later. I also added two total volumes in order to see if they successfully move alongside the VM itself.

After a few moments, I had my EC2 instance up and running.

Let’s fast forward for a period of time, and maybe it’s time to evolve and pick my next cloud. I chose Google Cloud, WHICH MUST SHOCK YOU. This workload needs a happier home.

The new Migrate to Virtual Machines experience in the Google Cloud console is pretty sweet. From here, I can add migration sources, target projects, create groups of VMs for migration, and monitor the progress.

First, I needed to create a source. We recently added AWS as a built-in option. We’ve supported VMware-based migrations for a while now.

I created the “AWS source” by giving it a name, choosing the source AWS region, the target Google Cloud region, and providing credentials to access my account. Also note that I added an (optional) tag to search for when retrieving instances, and an (optional) tag for the migrated VMs.

My connection was in a “pending” state for a couple of minutes, and after that, showed me a list of VMs that met the criteria (AWS region, tag). Pretty cool.

From here, I chose that VM and picked the option to “add migration.” This added this particular VM into a migration set. Now I could set the “target” details of the VM in Google Cloud Compute Engine that this AWS image loads into. That means the desired machine name, machine type, network, subnet, and such.

I started the migration. Note that I did not have to stop the VM on AWS for this migration to commence.

When it’s done replicating, I don’t yet have a running VM. My last major step is choosing to do a test-clone phase where I test my app before making it “live”, or, jump right to cut-over. In cut-over, the services takes a final data replica, stops the original VM, and makes a Compute Engine instance using the replicated data.

After a few more minutes, I saw a running Google Cloud Compute Engine VM, and a stopped EC2 instance.

I “finalized” the migration to clean up all the temporary data replicas and the like. After not being sure if this migration experience grabbed the secondary disks from my EC2 instance, I confirmed that yes, we brought them all over. Very nice!

Why do this? The Migrate to Virtual Machines experience offers a clean way to move one or multiple VMs from AWS, vSphere, or Azure (preview) to Google Cloud. There’s very little that you have to do yourself. And I like that it handles the shut down of the initial VM, and offers ways to pause and resume the migration.

The downsides? It’s specific to Google Cloud as a target. You’re not using this to move workloads out of Google Cloud. It’s also not yet available in every single Google Cloud region, but will be soon.

What did I miss? How do you prefer to move your VMs or VM-based workloads around?

May 17, 2023
Build event handlers and scale them across regions, all with serverless cloud services? Let’s try it.
Is a serverless architecture realistic for every system? Of course not. But it’s never been easier to build robust solutions out of a bunch of fully-managed cloud services. For instance, what if I want to take uploaded files, inspect them, and route events to app instances hosted in different regions around the world? Such a solution might require a lot of machinery to set up and manage—file store, file listeners, messaging engines, workflow system, hosting infrastructure, and CI/CD products. Yikes. How about we do that with serverless technology such as:
- Google Cloud Run to host app instances in different regions.
- Google Cloud Workflows which figures out which Cloud Run instance should process each event.
- Google Cloud Eventarc to listen to and route “file upload” events to a listener in Cloud Workflows.
- Google Cloud Storage to hold uploaded files.
- Google Cloud Deploy to do parallel deployments of Cloud Run apps to different locations. New feature alert: parallel deployments are brand new.
The final architecture (designed with the free and fun Architecture Diagramming Tool) looks like this:

Let’s build this together, piece by piece.

Step 1: Build Java app that processes CloudEvents

The heart of this system is the app that processes “loan” events. The events produced by Eventarc are in the industry-standard CloudEvents format. Do I want to parse and process those events in code manually? No, no I do not. Two things will help here. First, our excellent engineers have built client libraries for every major language that you can use to process CloudEvents for various Google Cloud services (e.g. Storage, Firestore, Pub/Sub). My colleague Mete took it a step further by creating VS Code templates for serverless event-handlers in Java, .NET, Python, and Node. We’ll use those.

To add these templates to your Visual Studio Code environment, you start with Cloud Code, our Google Cloud extension to popular IDEs. Once Cloud Code is installed, I can click the “Cloud Code” menu and then choose the “New Application” option.

Then I chose the “Custom Application” option and “Import Sample from Repo” and added a link to Mete’s repo.

Now I have the option to pick a “Cloud Storage event” code template for Cloud Functions (traditional function as a service) or Cloud Run (container-based serverless). I picked the Java template for Cloud Run.

The resulting project is a complete Java application. It references the client library mentioned above, which you can see as google-cloudevent-types in the pom.xml file. The code is fairly straightforward and the core operation accepts the inbound CloudEvent and creates a typed StorageObjectData object.
```
@PostMapping("/")
ResponseEntity<Void> handleCloudEvent(@RequestBody CloudEvent cloudEvent) throws InvalidProtocolBufferException {

      // CloudEvent information
      logger.info("Id: " + cloudEvent.getId());
      logger.info("Source: " + cloudEvent.getSource());
      logger.info("Type: " + cloudEvent.getType());

      String json = new String(cloudEvent.getData().toBytes());
      StorageObjectData.Builder builder = StorageObjectData.newBuilder();
      JsonFormat.parser().merge(json, builder);
      StorageObjectData data = builder.build();

      // Storage object data
      logger.info("Name: " + data.getName());
      logger.info("Bucket: " + data.getBucket());
      logger.info("Size: " + data.getSize());
      logger.info("Content type: " + data.getContentType());

      return ResponseEntity.ok().build();
 }
```
This generated project has directions and scripts to test locally, if you’re so inclined. I went ahead and deployed an instance of this app to Cloud Run using this simple command:
```
gcloud run deploy --source .
```
That gave me a running instance, and, a container image I could use in our next step.

Step 2: Create parallel deployment of Java app to multiple Cloud Run locations

In our fictitious scenario, we want an instance of this Java app in three different regions. Let’s imagine that the internal employees in each geography need to work with a local application.

I’d like to take advantage of a new feature of Cloud Deploy, parallel deployments. This makes it possible to deploy the same workload to a set of GKE clusters or Cloud Run environments. Powerful! To be sure, the MOST applicable way to use parallel deployments is a “high availability” scenario where you’d deploy identical instances across locations and put a global load balancer in front of it. Here, I’m using this feature as a way to put copies of an app closer to specific users.

First, I need to create “service” definitions for each Cloud Run environment in my deployment pipeline. I’m being reckless, so let’s just have “dev” and “prod.”

My “dev” service definition looks like this. The “image” name can be anything, as I’ll replace this placeholder in realtime when I deploy the pipeline.
```
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: event-app-dev
spec:
  template:
    spec:
      containers:
      - image: java-eventlistener
```
The “production” YAML service is identical except for a different service name.

Next, I need a Skaffold file that identifies the environments for my pipeline, and points to the respective YAML files that represent each environment.
```
apiVersion: skaffold/v4beta1
kind: Config
metadata: 
  name: deploy-run-webapp
profiles:
- name: dev
  manifests:
    rawYaml:
    - run-dev.yaml
- name: prod
  manifests:
    rawYaml:
    - run-prod.yaml
deploy:
  cloudrun: {}
```
The final artifact I need is a DeliveryPipeline definition. It calls out two stages (dev and prod), and for production that points to a multiTarget that refers to three Cloud Run targets.
```
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
 name: my-parallel-event-app
description: event application pipeline
serialPipeline:
 stages:
 - targetId: app-dev
   profiles: [dev]
 - targetId: app-prod-multi
   profiles: [prod]
---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: app-dev
description: Cloud Run development service
run:
 location: projects/seroter-project-base/locations/us-central1

---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: app-prod-multi
description: production
multiTarget:
 targetIds: [prod-east, prod-west, prod-northeast2]
---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: prod-east
description: production us-east1
run:
 location: projects/seroter-project-base/locations/us-east1
---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: prod-west
description: production us-west1
run:
 location: projects/seroter-project-base/locations/us-west1

---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: prod-northeast2
description: production northamerica-northeast2
run:
 location: projects/seroter-project-base/locations/northamerica-northeast2 
```
All set. It takes a single command to create the deployment pipeline.
```
gcloud deploy apply --file=clouddeploy.yaml --region=us-central1 --project=seroter-project-base
```
In the Google Cloud Console, I can see my deployed pipeline with two stages and multiple destinations for production.

Now it’s time to create a release for this deployment and see everything provisioned.

The command to create a release might be included in your CI build process (whether that’s Cloud Build, GitHub Actions, or something else), or you can run the command manually. I’ll do that for this example. I named the release, gave it the name of above pipeline, and swapped the placeholder image name in my service YAML files with a reference to the container image generated by the previously-deployed Cloud Run instance.
```
gcloud deploy releases create test-release-001 \
--project=seroter-project-base \
--region=us-central1 \
--delivery-pipeline=my-parallel-event-app \
--images=java-eventlistener=us-south1-docker.pkg.dev/seroter-project-base/cloud-run-source-deploy/java-cloud-run-storage-event
```
After a few moments, I see a deployment to “dev” rolling out.

When that completed, I “promoted” the release to production and saw a simultaneous deployment to three different cloud regions.

Sweet. Once this is done, I check and see four total Cloud Run instances (one for dev, three for prod) created. I like the simplicity here for shipping the same app instance to any cloud region. For GKE clusters, this also works with Anthos environments, meaning you could deploy to edge, on-prem or other clouds as part of a parallel deploy.

We’re done with this step. I have an event-receiving app deployed around North America.

Step 3: Set up Cloud Storage bucket

This part is simple. I use the Cloud Console to create a new object storage bucket named seroter-loan-applications. We’ll assume that an application drops files into this bucket.

Step 4: Write Cloud Workflow that routes events to correct Cloud Run instance

There are MANY ways one could choose to architect this solution. Maybe you upload files to specific bucket and route directly to the target Cloud Run instance using a trigger. Or you route all bucket uploads to a Cloud Function and decide there where you’ll send it next. Plus dozens of other options. I’m going to use a Cloud Workflow that receives an event, and figures out where to send it next.

A Cloud Workflow is described with a declarative definition written in YAML or JSON. It’s got a standard library of functions, supports control flow, and has adapters to lots of different cloud services. This Workflow needs to parse an incoming CloudEvent and route to one of our three (secured) Cloud Run endpoints. I do a very simple switch statement that looks at the file name of the uploaded file, and routes it accordingly. This is a terrible idea in real life, but go with me here.
```
main:
    params: [eventmsg]
    steps:
        - get-filename:
            assign:
                - filename: ${eventmsg.data.name}
        - choose_endpoint:
            switch:
                - condition: ${text.match_regex(filename, "northeast")}
                  next: forward_request_northeast
                - condition: ${text.match_regex(filename, "uswest")}
                  next: forward_request_uswest
                - condition: ${text.match_regex(filename, "useast")}
                  next: forward_request_useast
        - forward_request_northeast: 
            call: http.post
            args:
                url: https://event-app-prod-ofanvtevaa-pd.a.run.app
                auth:
                    type: OIDC
                headers:
                    Content-Type: "application/json"
                    ce-id: ${eventmsg.id} #"123451234512345"
                    ce-specversion: ${eventmsg.specversion} #"1.0"
                    ce-time: ${eventmsg.time} #"2020-01-02T12:34:56.789Z"
                    ce-type: ${eventmsg.type} #"google.cloud.storage.object.v1.finalized"
                    ce-source: ${eventmsg.source} #"//storage.googleapis.com/projects/_/buckets/MY-BUCKET-NAME"
                    ce-subject: ${eventmsg.subject} #"objects/MY_FILE.txt"
                body:
                    ${eventmsg.data}
            result: the_message
            next: returnval
        - forward_request_uswest: 
            call: http.post
            args:
                url: https://event-app-prod-ofanvtevaa-uw.a.run.app
                auth:
                    type: OIDC
                headers:
                    Content-Type: "application/json"
                    ce-id: ${eventmsg.id} #"123451234512345"
                    ce-specversion: ${eventmsg.specversion} #"1.0"
                    ce-time: ${eventmsg.time} #"2020-01-02T12:34:56.789Z"
                    ce-type: ${eventmsg.type} #"google.cloud.storage.object.v1.finalized"
                    ce-source: ${eventmsg.source} #"//storage.googleapis.com/projects/_/buckets/MY-BUCKET-NAME"
                    ce-subject: ${eventmsg.subject} #"objects/MY_FILE.txt"
                body:
                    ${eventmsg.data}
            result: the_message
            next: returnval
        - forward_request_useast: 
            call: http.post
            args:
                url: https://event-app-prod-ofanvtevaa-ue.a.run.app
                auth:
                    type: OIDC
                headers:
                    Content-Type: "application/json"
                    ce-id: ${eventmsg.id} #"123451234512345"
                    ce-specversion: ${eventmsg.specversion} #"1.0"
                    ce-time: ${eventmsg.time} #"2020-01-02T12:34:56.789Z"
                    ce-type: ${eventmsg.type} #"google.cloud.storage.object.v1.finalized"
                    ce-source: ${eventmsg.source} #"//storage.googleapis.com/projects/_/buckets/MY-BUCKET-NAME"
                    ce-subject: ${eventmsg.subject} #"objects/MY_FILE.txt"
                body:
                    ${eventmsg.data}
            result: the_message
            next: returnval
        - returnval:    
            return: ${the_message}    
```
This YAML results in a workflow that looks like this:

Step 5: Configure Eventarc trigger to kick off a Cloud Workflow

Our last step is to wire up the “file upload” event to this workflow. For that, we use Eventarc. Eventarc handles the machinery for listening to events and routing them. See here that I chose Cloud Storage as my event source (there are dozens and dozens), and then the event I want to listen to. Next I selected my source bucket, and chose a destination. This could be Cloud Run, Cloud Functions, GKE, or Workflows. I chose Workflows and then my specific Workflow that should kick off.

All good. Now I have everything wired up and can see this serverless solution in action.

Step 6: Test and enjoy

Testing this solution is straightforward. I dropped three “loan application” files into the bucket, each named with a different target region.

Sure enough, three Workflows kick off and complete successfully. Clicking into one of them shows the Workflow’s input and output.

Looking at the Cloud Run logs, I see that each instance received an event corresponding to its location.

Wrap Up

No part of this solution required me to stand up hardware, worry about operating systems, or configure networking. Except for storage costs for my bucket objects, there’s no cost to this solution when it’s not running. That’s amazing. As you look to build more event-driven systems, consider stitching together some fully managed services that let you focus on what matters most.
March 27, 2023
Want to learn Google Cloud? Here are seventeen fresh Pluralsight courses that look pretty great.
I’ve been putting together a learning list to help me go deep on machine learning in 2023. After doing a few tutorials reading some docs, and watching a few videos, I jumped onto Pluralsight to see what they had. Given that I’ve been teaching at Pluralsight for more than a decade, I feel silly not checking there first! I bookmarked a few good-looking machine learning courses to watch later, but what really caught my eye was that Google Cloud folks were busy adding and updating tons of courses in the catalog over the past couple of months. Here are a handful of high-quality, educational courses that jumped out at me.

Data and AI/ML
- Innovating with Data and Google Cloud. Ninety-minute course that talks about the value of data, types of cloud data services, and what machine learning is about.
- Google Cloud Big Data and Machine Learning Fundamentals. This course is two-and-a-half hours long, and has many hands-on labs for streaming data processing, working with BigQuery ML, using pre-built and custom ML models, and working with Vertex AI.
- Applying Machine Learning to your Data with Google Cloud. Spend ninety minutes on this excellent overview into machine learning, pre-trained ML APIs, using BigQuery with datasets and ML models, and more. Lots of hands-on exercises here.
- Achieving Advanced Insights with BigQuery. This course is a bit under two hours, and you get to use some advanced SQL, learn about schema design, understand how to optimize BigQuery for performance, and go hands on with BigQuery and Vertex AI.
- Managing Machine Learning Projects with Google Cloud. This four-and-a-half hour course covers ML practices, building and evaluating ML models, identifying real-world use cases, and managing ML projects. It’s got some hands-on lab exercises as well.
- Building Batch Data Pipelines on Google Cloud. Important topics covered in this two-hour-and-twenty minute course. Watch this to learn about EL/ELT/ETL and cloud services like Dataproc, BigQuery, Dataflow, Data Fusion, and more.
- Building Resilient Streaming Analytics Systems on Google Cloud. Related topic to the previous item, but this one goes into stream processing with coverage of services like Pub/Sub, Dataflow, and more. Good material here.
Core infrastructure
- Essential Google Cloud Infrastructure: Core Services. This course is about three-and-a-half hours and covers topics like IAM, storage, databases, and operations. Looks like a very solid overview of the base infrastructure.
- Reliable Google Cloud Infrastructure: Design and Process. This is a four+ hour course that helps you architect and design a system that runs on Google Cloud. It’s not about building out the specific components, but rather, about going from requirements to architecture. it covers topics like microservices, automation, reliability, security, maintenance, and more.
- Google Cloud Fundamentals for AWS Professionals. Its not about where you come from, but where you’re going. If you started with AWS, you’ll like this three hour course that maps your AWS experience to the equivalent (and better!) experience in Google Cloud. IT covers IAM, VMs, storage, containers, deployment tools, Ops, and more.
- Google Cloud Fundamentals for Azure Professionals: Core Infrastructure. Azure’s a great cloud to start in, and when want to up your game, here’s a course that transfers your Azure knowledge to Google Cloud concepts.
- Getting Started with Google Kubernetes Engine. GKE is the best way to use Kubernetes in the cloud today, and this three hour course offers a good intro into the key parts of GKE.
Security
- Managing Security in Google Cloud. This two+ hour course digs into the foundational aspects of security in Google Cloud—think shared responsibilities model, threat mitigation—along with IAM, and VPCs. Some deep stuff here that will be best for architects, security analysts, and engineers.
- Security Best Practices in Google Cloud. Coming in at a shade under two hours, this course has lots of hands-on options to learn about service accounts, confidential VMs, encryption, identity-aware proxy, secret manager, and even GKE security techniques.
Deployments, Day-2, and SRE
- Developing a Google SRE Culture. This is a two hour course for leaders. It talks about DevOps and SRE, SLOs, dealing with regulated workloads, and apply SRE to your own organization.
- Logging, Monitoring and Observability in Google Cloud. An almost five hour course on operations? BRING IT. Watch this one to learn about alerting policies, dashboards, uptime checks, custom metrics, analyzing logs, network security monitoring, managing incidents, and much more. Very comprehensive.
- Develop and Deploy Windows Applications on Google Cloud. This stood out to me because you probably don’t think of Google Cloud as the best home for Windows workloads. You might be surprised. Learn more about how to run Windows apps on Google compute.
I don’t know what your learning journey looks like in 2023, but if you want to go deep on Google Cloud, it might be worth a subscription to Pluralsight for access to this broad catalog.
January 17, 2023
How do you back up and restore your (stateful) Kubernetes workloads? Here’s a new cloud-native option.
The idea of “backup and restore” in a complex distributed system is bit weird. Is it really even possible? Can you snapshot all the components of an entire system at a single point in time, inclusive of all the side effects in downstream systems? I dunno. But you need to at least have a good recovery story for each of your major stateful components! While Kubernetes started out as a terrific orchestrator for stateless containers, it’s also matured as a runtime for stateful workloads. Lots of folks are now using Kubernetes to run databases, event processors, ML models. and even “legacy” apps that maintain local state. Until now, public cloud users have only had DIY or 3rd party options when it comes to backing up their Kubernetes clusters, but not any more. Google Cloud just shipped a new built-in Backup for Google Kubernetes Engine (GKE) feature, and I wanted to try it out.

What Backup for GKE does

Basically, it captures the resources—at the cluster or namespace level—and persistent volumes within a given cluster at a specific point in time. It does not back up cluster configurations themselves (e.g. node pool size, machine types, enabled cluster features). For that, you’d like likely have an infrastructure-as-code approach for stamping out clusters (using something like Terraform), and use Backup for GKE to restore the state of your running app. This diagram from the official docs shows the architecture:

A Kubernetes cluster backup comes from a “backup plan” that defines the scope of a given backup. With these, you choose a cluster to back up, which namespaces you want backed up, and a schedule (if any). To restore a backup into an existing cluster, you execute a pre-defined “restore plan.” All of this is part of a fully managed Google Cloud service, so you’re not stuck operating any of the backup machinery.

Setting up Backup for GKE on a new cluster

Backup for GKE works with existing clusters (see Appendix A below), but I wanted to try it out on a fresh cluster first.

I started with a GKE standard cluster. First, I made sure to choose a Kubernetes version that supported the Backup feature. Right now, that’s Kubernetes 1.24 or higher.

I also turned on two features at the cluster-level. The first was Workload Identity. This security feature enforces more granular, workload-specific permissions to access other Google Cloud services.

The second and final feature to enable is Backup for GKE. This injects the agent into the cluster and connects it to the control plane.

Deploying a stateful app to Kubernetes

Once my cluster was up and running, I wanted to deploy a simple web application to it. What’s the app? I created a poorly-written Go app that has a web form to collect support tickets. After you submit a ticket, I route it to Google Cloud Pub/Sub, write an entry into a directory, and then take the result of the cloud request and jam the identifier into a file on another directory. What does this app prove? Two things. First, it should flex Workload Identity by successfully publishing to Pub/Sub. And second, I wanted to observe how stateful backups worked, so I’m writing files to two directories, one that can be backed by a persistent volume, and one backed by a local (node) volume.

I built and containerized the app automatically by using a Cloud Buildpack within a Cloud Build manifest, and invoking a single command:
```
gcloud builds submit --config cloudbuild.yaml
```
I then logged into my just-created GKE cluster and created a new namespace to hold my application and specific permissions.
```
kubectl create ns demos
```
To light up Workload Identity, you create a local service account in a namespace and map it to an existing Google Cloud IAM account that has the permissions the application should have. I created a Kubernetes service account:
```
kubectl create serviceaccount webapp-sa --namespace demos
```
And then I annotated the service account with the mapping to an IAM account (demo-container-app-user) which triggers the impersonation at runtime:
```
kubectl annotate serviceaccount webapp-sa --namespace demos iam.gke.io/gcp-service-account=demo-container-app-user@seroter-project-base.iam.gserviceaccount.com
```
Sweet. Finally, there’s the Kubernetes deployment YAML that points to my app container, service account, and the two volumes used by my app. At the top is my definition of the persistent volume, and then the deployment itself.
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-output
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard-rwo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-pubsub-publisher-deployment
spec:
  selector:
    matchLabels:
      app: go-pubsub-publisher
  template:
    metadata:
      labels:
        app: go-pubsub-publisher
    spec:
      containers:
      - name: go-pubsub-publisher
        image: gcr.io/seroter-project-base/go-pubsub-publisher:34749b85-afbb-4b59-98cc-4d5d790eb325
        volumeMounts:
          - mountPath: /logs
            name: log-volume
          - mountPath: /acks
            name: pvc-output-volume
        resources:
          requests:
            memory: "64Mi"
            cpu: "300m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 8080
      serviceAccountName: webapp-sa
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      volumes:
        - name: log-volume
          emptyDir: {}
        - name: pvc-output-volume
          persistentVolumeClaim:
            claimName: pvc-output
```
I applied the above manifest (and a services definition) to my GKE cluster with the following command:
```
kubectl apply -f k8s/. -n demos
```
A moment afterwards, I saw a deployment and service. The deployment showed two associated volumes, including the auto-created persistent disk based on my declarative request.

Let’s triple check that. I got the name of the pod and got a shell into the running container. See below that both directories show up, and my app isn’t aware of which one is from a persistent volume and which is not.

I pulled up the web page for the app, and entered a few new “support tickets” into the system. The Pub/Sub UI lets me pull messages from a topic subscription, and we see my submitted tickets there.

The next thing to check is the container’s volumes. Sure enough, I saw the contents of each message written to the local directory (/logs) and the message IDs written to the persistent directory (/acks).

Running a backup and restore

Let’s back that thing up.

Backup plans are tied to a cluster. You can see here that my primary cluster (with our deployed app) and new secondary cluster (empty) have zero plans.

I clicked the “create a backup plan” button at the top of this page, and got asked for some initial plan details.

That all seemed straightforward. Then it got real. My next options included the ability to back up ALL the namespaces of the cluster, specific ones, or “protected” (more customized) configs. I just chose our “demos” namespace for backup. Also note that I could choose to back up persistent volume data and control encryption.

Next, I was asked to choose the frequency of backups. This is defined in the form of a CRON expression. I could back up every few minutes, once a month, or every year. If I leave this “schedule” empty, this becomes an on-demand backup plan.

After reviewing all my settings, I saved the backup plan. Then I manually kicked off a backup by providing the name and retention period for the backup.

To do anything with this backup, I need a “restore plan.” I clicked the button to create a new restore plan, and was asked to connect it to a backup plan, and a target cluster.

Next, I had the choice of restoring some, or all, namespaces. In real life, you might back up everything, and then selectively restore. I like that you’re asked about conflict handling, which determines what happens if the target cluster already has the specified namespace in there. There are also a handful of flexible options for restoring volume data, ranging from creating new volumes, to re-using existing, to not restoring anything.

After that, I was asked about cluster-scoped resources. It pre-loaded a few API groups and Object kinds to restore, and offered me the option to overwrite any existing resources.

Finally, I got asked for any substitution rules to swap backed up values for different ones. With that, I finished my restore plan and had everything I needed to test my backup.

I set up a restore, which basically just involved choosing a restore plan (which is connected to a backup, and target cluster). In just a few moments, I saw a “succeeded” message and it looked like it worked.

When I checked out the GKE “workloads” view, I saw both the original and “restored” deployment running.

I logged into the “secondary” GKE cluster and saw my custom namespace and workload. I also checked, and saw that my custom service account (and Workload Identity-ready annotation) came over in the restore action.

Next, I grabbed a shell into the container to check my stateful data. What did I find? The “local” volume from the original container (“logs”) was empty. Which makes sense. That wasn’t backed by a persistent disk. The “acks” directory, on the other hand, was backed up, and shows up intact as part of the restore.

To test out my “restored” app instance, I submitted a new ticket, saw it show up in Pub/Sub (it just worked, as Workload Identity was in place), and also saw the new log file, and updated “ids.txt” file.

Pretty cool! With Backup for GKE, you don’t deal with the installation, patching, or management of your backup infrastructure, and get a fairly sophisticated mechanism for resilience in your distributed system.

To learn more about this, check out the useful documentation, and these two videos: Introduction to Backup for GKE, and How to enable GKE Backup.

Appendix A: Setting up Backup for GKE on an an existing cluster

Backup for GKE doesn’t only work with new clusters. You can add it to most existing GKE clusters. And these clusters can act as either sources or targets!

First, let’s talk about GKE Autopilot clusters. These are basically hyper-automated GKE standard clusters that incorporate all of Google’s security and scalability best practices. An Autopilot cluster doesn’t yet expose “Backup for GKE” feature at creation time, but you apply if after the fact. You also need to ensure you’re on Kubernetes 1.24 or higher. Workload Identity is enabled by default, so there’s nothing you need to do there.

But let’s talk about an existing GKE standard cluster. If you provision one from scratch, the default security option is to use a service account for the node pool identity. What this means is that any workloads in the cluster will have the same permissions as that account.

If I provision a cluster (cluster #1) like so, the app from above does not work. Why? The “default compute service account” doesn’t have permission to write to a Pub/Sub topic. A second security option is to use a specific service account with the minimum set of permissions needed for the node’s workloads. If I provision cluster #2 and choose a service account with rights to publish to Pub/Sub, my app does work.

The third security option relates to the access scopes for the cluster. This is a legacy method for authorization. The default setting is “allow default access” which offers a limited set of OAuth-based permissions. If I build a GKE cluster (cluster #3) with a default service account and “allow full access to all cloud APIs” then my app above does work because it has wide-ranging access to all the cloud APIs.

For a GKE standard cluster configured in either of the three ways above, I cannot install Backup for GKE. Why? I have to first enable Workload Identity. Once I edited the three clusters’ settings to enable Workload Identity, my app behaved the same way (not work, work, work)! That surprised me. I expected it to stop using the cluster credentials and require a Workload Identity assignment. What went wrong? For an existing cluster, turning on Workload Identity alone doesn’t trigger the necessary changes for existing node pools. Any new node pools would have everything enabled, but you have to explicitly turn on the GKE Metadata Server for any existing node pools.

This GKE Metadata Server is automatically turned on for any new node pools when you enable Workload Identity, and if you choose to install Workload Identity on a new cluster, it’s also automatically enabled for the first node pool. I didn’t totally understand all this until I tried out a few scenarios!

Once you’re running a supported version of Kubernetes and have Workload Identity enabled on a cluster, you can enroll it in Backup for GKE.
November 30, 2022
Continuously deploy your apps AND data? Let’s try to use Liquibase for BigQuery changes.
Want to constantly deploy updates to your web app through the use of automation? Not everyone does it, but it’s a mostly solved problem with mature patterns and tools that make it possible. Automated deployments of databases, app services, and data warehouses? Also possible, but not something I personally see done as often. Let’s change that!

Last month, I was tweeting about Liquibase, and their CTO and co-founder pointed out to me that Google Cloud contributed a BigQuery extension. Given that Liquibase is a well-known tool for automating database changes, I figured it was time to dig in and see how it worked, especially for a fully managed data warehouse like BigQuery. Specifically, I wanted to prove out four things:
1. Use the Liquibase CLI locally to add columns to a BigQuery table. This is an easy way to get started!
2. Use the Liquibase Docker image to add columns to a BigQuery table. See how to deploy changes through a Docker container, which makes later automation easier.
3. Use the Liquibase Docker image within Cloud Build to automate deployment of a BigQuery table change. Bring in continuous integration (and general automation service) Google Cloud Build to invoke the Liquibase container to push BigQuery changes.
4. Use Cloud Build and Cloud Deploy to automate the build and deployment of the app to GKE along with a BigQuery table change. This feels like the ideal state, where Cloud Build does app packaging, and then hands off to Cloud Deploy to push BigQuery changes (using the Docker image) and the web app through dev/test/prod.
I learned a lot of new things by performing this exercise! I’ll share all my code and lessons learned about Docker, Kubernetes, init containers, and Liquibase throughout this post.

Scenario #1 – Use Liquibase CLI

The concepts behind Liquibase are fairly straightforward: define a connection string to your data source, and create a configuration file that represents the desired change to your database. A Liquibase-driven change isn’t oriented adding data itself to a database (although, it can), but for making structural changes like adding tables, creating views, and adding foreign key constraints. Liquibase also does things like change tracking, change locks, and assistance with rollbacks.

While it directly integrates with Java platforms like Spring Boot, you can also use it standalone via a CLI or Docker image.

I downloaded the CLI installer for my Mac, which added the bits to a local directory. And then I checked to see if I could access the liquibase CLI from the console.

Next, I downloaded the BigQuery JDBC driver which is what Liquibase uses to connect to my BigQuery. The downloaded package includes the JDBC driver along with a “lib” folder containing a bunch of dependencies.

I added *all* of those files—the GoogleBigQueryJDBC42.jar file and everything in the “lib” folder—to the “lib” folder included in the liquibase install directory.

Next, I grabbed the latest BigQuery extension for Liquibase and installed that single JAR file into the same “lib” folder in the local liquibase directory. That’s it for getting the CLI properly loaded.

What about BigQuery itself? Anything to do there? Not really. When experimenting, I got “dataset not found” from Liquibase when using a specific region like “us-west1” so I created a dataset the wider “US” region and everything worked fine.

I added a simple table to this dataset and started it off with two columns.

Now I was ready to trigger some BigQuery changes! I had a local folder (doesn’t need to be where the CLI was installed) with two files: liquibase.properties, and changelog.yaml. The properties file (details here) includes the database connection string, among other key attributes. I turned on verbose logging, which was very helpful in finding obscure issues with my setup! Also, I want to use environmental credentials (saved locally, or available within a cloud instance by default) versus entering creds in the file, so the OAuthType is set to “3”.
```
#point to where the file is containing the changelog to execute
changelogFile: changelog.yaml
#identify which driver to use for connectivity
driver: com.simba.googlebigquery.jdbc.Driver
#set the connection string for bigquery
url: jdbc:bigquery://https://googleapis.com/bigquery/v2:443;ProjectId=seroter-project-base;DefaultDataset=employee_dataset;OAuthType=3;
#log all the things
logLevel: 0
#if not using the "hub" features
liquibase.hub.mode=off
```
Next I created the actual change log. There are lots of things you can do here, and change files can be authored in JSON, XML, SQL, or YAML. I chose YAML, because I know how to have a good time. The BigQuery driver supports most of the Liquibase commands, and I chose the one to add a new column to my table.
```
databaseChangeLog:
  - changeSet:
      id: addColumn-example1
      author: rseroter
      changes:
        - addColumn:
            tableName: names_1
            columns:
            - column:
                name: location
                type: STRING
```
Once you get all the setup in place, the actual Liquibase stuff is fairly simple! To execute this change, I jumped into the CLI, navigated to the folder holding the properties file and change log, and issued a single command.
```
liquibase --changeLogFile=changelog.yaml update
```
Assuming you have all the authentication and authorization settings correct and files defined and formatted in the right way, the command should complete successfully. In BigQuery, I saw that my table had a new column.

Note that this command is idempotent. I can execute it again and again with no errors or side effects. After I executed the command, I saw two new tables added to my dataset. If I had set the “liquibaseSchemaName” property in the properties file, I could have put these tables into a different dataset of my choosing. What are they for? The DATABASECHANGELOGLOCK table is used to create a “lock” on the database change so that only one process at a time can make updates. The DATABASECHANGELOG table stores details of what was done, when. Be aware that each changeset itself is unique, so if I tried to run a new change (add a different column) with the same changeset id (above, set to “addColumn-example1”), I’d get an error.

That’s it for the CLI example. Not too bad!

Scenario #2 – Use Liquibase Docker image

The CLI is cool, but maybe you want an even more portable way to trigger a database change? Liquibase offers a Docker image that has the CLI and necessary bits loaded up for you.

To test this out, I fired up an instance of the Google Cloud Shell—this is an dev environment that you can access within our Console or standalone. From here, I created a local directory (lq) and added folders for “changelog” and “lib.” I uploaded all the BigQuery JDBC JAR files, as well as the Liquibase BigQuery driver JAR file.

I also uploaded the liquibase.properties file and changelog.yaml file to the “changelog” folder in my Cloud Shell. I opened the changelog.yaml file in the editor, and updated the changeset identifier and set a new column name.

All that’s left is to start the Docker container. Note that you might find it easier to create a new Docker image based on the base Liquibase image with all the extra JAR files embedded within it instead of schlepping the JARs all over the place. In my case here, I wanted to keep it all separate. To ensure that the Liquibase Docker container “sees” all my config files and JAR files, I needed to mount volumes when I started the container. The first volume mount maps from my local “changelog” directory to the “/liquibase/changelog” directory in the container. The second maps from the local “lib” directory to the right spot in the container. And by mounting all those JARs into the container’s “lib” directory—while also setting the “–include-system-classpath” flag to ensure it loads everything it finds there—the container has everything it needs. Here’s the whole Docker command:
```
docker run --rm -v /home/richard/lq/changelog:/liquibase/changelog -v /home/richard/lq/lib:/liquibase/lib liquibase/liquibase --include-system-classpath=true --changeLogFile=changelog/changelog.yaml --defaultsFile=/liquibase/changelog/liquibase.properties update
```
After 30 seconds or so, I saw the new column added to my BigQuery table.

To be honest, this doesn’t feel like it’s that much simpler than just using the CLI, but, by learning how to use the container mechanism, I could now embed this database change process into a container-native cloud build tool.

Scenario #3 – Automate using Cloud Build

Those first two scenarios are helpful for learning how to do declarative changes to your database. Now it’s time to do something more automated and sustainable. In this scenario, I tried using Google Cloud Build to automate the deployment of my database changes.

Cloud Build runs each “step” of the build process in a container. These steps can do all sorts of things, ranging from compiling your code, running tests, pushing to artifact storage, or deploy a workload. Since it can honestly run any container, we could also use the Liquibase container image as a “step” of the build. Let’s see how it works.

My first challenge related to getting all those JDBC and driver JAR files into Cloud Build! How could the Docker container “see” them? To start, I put all the JAR files and config files (updated with a new column named “title”) into Google Cloud Storage buckets. This gave me easy, anywhere access to the files.

Then, I decided to take advantage of Cloud Build’s built-in volume for sharing data between the independent build steps. This way, I could retrieve the files, store them, and then the Liquibase container could see them on the shared volume. In real life, you’d probably grab the config files from a Git repo, and the JAR files from a bucket. We’ll do that in the next scenario! Be aware that there’s also a project out there for mounting Cloud Storage buckets as volumes, but I didn’t feel like trying to do that. Here’s my complete Cloud Build manifest:
```
steps: 
- id: "Get Liquibase Jar files"
  name: 'gcr.io/cloud-builders/gsutil'
  dir: 'lib'
  args: ['cp', 'gs://liquibase-jars/*.jar', '/workspace/lib']
- id: "Get Liquibase config files"
  name: 'gcr.io/cloud-builders/gsutil'
  dir: 'changelog'
  args: ['cp', 'gs://liquibase-configs/*.*', '/workspace/changelog']
- id: "Update BQ"
  name: 'gcr.io/cloud-builders/docker'
  args: [ "run", "--network=cloudbuild", "--rm", "--volume", "/workspace/changelog:/liquibase/changelog", "--volume", "/workspace/lib:/liquibase/lib", "liquibase/liquibase", "--include-system-classpath=true", "--changeLogFile=changelog/changelog.yaml", "--defaultsFile=/liquibase/changelog/liquibase.properties", "update" ]
```
The first “step” uses a container that’s pre-loaded with the Cloud Storage CLI. I executed the “copy” command and put all the JAR files into the built-in “workspace” volume. The second step does something similar by grabbing all the “config” files and dropping them into another folder within the “workspace” volume.

Then the “big” step executed a virtually identical Docker “run” command as in scenario #2. I pointed to the “workspace” directories for the mounted volumes. Note the “–network” flag which is a magic command for using default credentials.

I jumped into the Google Cloud Console and created a new Cloud Build trigger. Since I’m not (yet) using a git repo for configs, but I have to pick SOMETHING when building a trigger, I chose a random repo of mine. I chose an “inline” Cloud Build definition and pasted in the YAML above.

That’s it. I saved the trigger, ensured the “Cloud Build” account had appropriate permissions to update BigQuery, and “ran” the Cloud Build job.

I saw the new column in my BigQuery table as a result and if I looked at the “change table” managed by Liquibase, I saw each of the three change we did so far.

Scenario #4 – Automate using Cloud Build and Cloud Deploy

So far so good. But it doesn’t feel “done” yet. What I really want is to take a web application that writes to BigQuery, and deploy that, along with BigQuery changes, in one automated process. And I want to use the “right” tools, so I should use Cloud Build to package the app, and Google Cloud Deploy to push the app to GKE.

I first built a new web app using Node.js. This very simple app asks you to enter the name of an employee, and it adds that employee to a BigQuery table. I’m seeking seed funding for this app now if you want to invest. The heart of this app’s functionality is in its router:
```
router.post('/', async function(req, res, next) {
    console.log('called post - creating row for ' + req.body.inputname)

    const row = [
        {empid: uuidv4(), fullname: req.body.inputname}
      ];

    // Insert data into a table
    await bigquery
    .dataset('employee_dataset')
    .table('names_1')
    .insert(row);
    console.log(`Inserted 1 rows`);


    res.render('index', { title: 'Employee Entry Form' });
  });
```
Before defining our Cloud Build process that packages the app, I wanted to create all the Cloud Deploy artifacts. These artifacts consist of a set of Kubernetes deployment files, a Skaffold configuration, and finally, a pipeline definition. The Kubernetes deployments get associated to a profile (dev/prod) in the Skaffold file, and the pipeline definition identifies the target GKE clusters.

Let’s look at the Kubernetes deployment file for the “dev” environment. To execute the Liquibase container before deploying my Node.js application, I decided to use Kubernetes init containers. These run (and finish) before the actual container you care about. But I had the same challenge as with Cloud Build. How do I pass the config files and JAR files to the Liquibase container? Fortunately, Kubernetes offers up Volumes as well. Basically, the below deployment file does the following things:
- Create an empty volume called “workspace.”
- Runs an init container that executes a script to create the “changelog” and “lib” folders in the workspace volume. For whatever reason, the Cloud Storage CLI wouldn’t do it automatically for me, so I added this distinct step.
- Runs an init container that git clones the latest config files from my GitHub project (no longer using Cloud Storage) and stashes them in the “changelog” directory in the workspace volume.
- Runs a third init container to retrieve the JAR files from Cloud Storage and stuff them into the “lib” directory in the workspace volume.
- Runs a final init container that mounts each directory to the right place in the container (using subpath references), and runs the “liquibase update” command.
- Runs the application container holding our web app.
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-ci-deployment-dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-data-app-dev
  template:
    metadata:
      labels:
        app: web-data-app-dev
    spec:
      volumes:
      - name: workspace
        emptyDir: {}
      initContainers:
        - name: create-folders
          image: alpine
          command:
          - /bin/sh
          - -c
          - |
            cd liquibase
            mkdir changelog
            mkdir lib
            ls
            echo "folders created"
          volumeMounts:
          - name: workspace
            mountPath: /liquibase
            readOnly: false      
        - name: preload-changelog
          image: bitnami/git
          command:
          - /bin/sh
          - -c
          - |
            git clone https://github.com/rseroter/web-data-app.git
            cp web-data-app/db_config/* liquibase/changelog
            cd liquibase/changelog
            ls
          volumeMounts:
          - name: workspace
            mountPath: /liquibase
            readOnly: false
        - name: preload-jars
          image: gcr.io/google.com/cloudsdktool/cloud-sdk
          command: ["gsutil"]
          args: ['cp', 'gs://liquibase-jars/*', '/liquibase/lib/']
          volumeMounts:
          - name: workspace
            mountPath: /liquibase
            readOnly: false
        - name: run-lq
          image: liquibase/liquibase
          command: ["liquibase"]
          args: ['update', '--include-system-classpath=true', '--changeLogFile=/changelog/changelog.yaml', '--defaultsFile=/liquibase/changelog/liquibase.properties']
          volumeMounts:
          - name: workspace
            mountPath: /liquibase/changelog
            subPath: changelog
            readOnly: false
          - name: workspace
            mountPath: /liquibase/lib
            subPath: lib
            readOnly: false
      containers:
      - name: web-data-app-dev
        image: web-data-app
        env:
        - name: PORT
          value: "3000"
        ports:
          - containerPort: 3000
        volumeMounts:
        - name: workspace
          mountPath: /liquibase
```
The only difference between the “dev” and “prod” deployments is that I named the running containers something different. Each deployment also has a corresponding “service.yaml” file that exposes the container with a public endpoint.

Ok, so we have configs. That’s the hard part, and took me the longest to figure out! The rest is straightforward.

I defined a skaffold.yaml file which Cloud Deploy uses to render right assets for each environment.
```
apiVersion: skaffold/v2beta16
kind: Config
metadata:
 name: web-data-app-config
profiles:
 - name: prod
   deploy:
     kubectl:
       manifests:
         - deployment-prod.yaml
         - service-prod.yaml
 - name: dev
   deploy:
     kubectl:
       manifests:
         - deployment-dev.yaml
         - service-dev.yaml
```
Skaffold is a cool tool for local development, but I won’t go into it here. The only other asset we need for Cloud Deploy is the actual pipeline definition! Here, I’m pointing to my two Google Kubernetes Engine clusters (with platform-wide access scopes) that represent dev and prod environments.
```
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
 name: data-app-pipeline
description: application pipeline for app and BQ changes
serialPipeline:
 stages:
 - targetId: devenv
   profiles:
   - dev
 - targetId: prodenv
   profiles:
   - prod
---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: devenv
description: development GKE cluster
gke:
 cluster: projects/seroter-project-base/locations/us-central1-c/clusters/cluster-seroter-gke-1110

---

apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
 name: prodenv
description: production GKE cluster
gke:
 cluster: projects/seroter-project-base/locations/us-central1-c/clusters/cluster-seroter-gke-1117
```
I then ran the single command to deploy that pipeline (which doesn’t yet care about the Skaffold and Kubernetes files):
```
gcloud deploy apply --file=clouddeploy.yaml --region=us-central1 --project=seroter-project-base
```
In the Cloud Console, I saw a visual representation of my jazzy new pipeline.

The last step is to create the Cloud Build definition which builds my Node.js app, stashes it into Google Cloud Artifact Registry, and then triggers a Cloud Deploy “release.” You can see that I point to the Skaffold file, which in turns knows where the latest Kubernetes deployment/service YAML files are at. Note that I use a substitution value here with –images where the “web-data-app” value in each Kubernetes deployment file gets swapped out with the newly generated image identifier.
```
steps:
  - name: 'gcr.io/k8s-skaffold/pack'
    id: Build Node app
    entrypoint: 'pack'
    args: ['build', '--builder=gcr.io/buildpacks/builder', '--publish', 'gcr.io/$PROJECT_ID/web-data-app:$COMMIT_SHA']
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    id: Create Cloud Deploy release
    args: 
        [
          "deploy", "releases", "create", "test-release-$SHORT_SHA",
          "--delivery-pipeline", "data-app-pipeline",
          "--region", "us-central1",
          "--images", "web-data-app=gcr.io/$PROJECT_ID/web-data-app:$COMMIT_SHA",
          "--skaffold-file", "deploy_config/skaffold.yaml"
        ]
    entrypoint: gcloud
```
To make all this magic work, I went into Google Cloud Build to set up my new trigger. It points at my GitHub repo and refers to the cloudbuild.yaml file there.

I ran my trigger manually (I could also set it to run on every check-in) to build my app and initiate a release in Cloud Deploy. The first part ran quickly and successfully.

The result? It worked! My “dev” GKE cluster got a new workload and service endpoint, and my BigQuery table got a new column.

When I went back into Cloud Deploy, I “promoted” this release to production and it ran the production-aligned files and popped a workload into the other GKE cluster. And it didn’t make any BigQuery changes, because we already did on the previous run. In reality, you would probably have different BigQuery tables or datasets for each environment!

Wrap up

Did you make it this far? You’re amazing. It might be time to shift from just shipping the easy stuff through automation to shipping ALL the stuff via automation. Software like Liquibase definition gets you further along in the journey, and it’s good to see Google Cloud make it easier.
November 17, 2022
Building a long-running, serverless, event-driven system with as little code as possible
Is code a liability or an asset? What it does should be an asset, of course. But there’s a cost to running and maintaining code. Ideally, we take advantage of (managed) services that minimize how much code we have to write to accomplish something.

What if I want to accept document from a partner or legacy business system, send out a request for internal review of that document, and then continue processing? In ye olden days, I’d build file watchers, maybe a database to hold state of in-progress reviews, a poller that notified reviewers, and a web service endpoint to handle responses and update state in the database. That’s potentially a lot of code. Can we get rid of most that?

Google Cloud Workflows recently added a “callback” functionality which makes it easier to create long-running processes with humans in the middle. Let’s build out an event-driven example with minimal code, featuring Cloud Storage, Eventarc, Cloud Workflows, and Cloud Run.

Step 1 – Configure Cloud Storage

Our system depends on new documents getting added to a storage location. That should initiate the processing. Google Cloud Storage is a good choice for an object store.

I created a new bucket named “loan-application-submissions’ in our us-east4 region. At the moment, the bucket is empty.

Step 2 – Create Cloud Run app

The only code in our system is the application that’s used to review the document and acknowledge it. The app accepts a querystring parameter that includes the “callback URL” that points to the specific Workflow instance waiting for the response.

I built a basic Go app with a simple HTML page, and a couple of server-side handlers. Let’s go through the heart of it. Note that the full code sample is on GitHub.
```
func main() {

	fmt.Println("Started up ...")

	e := echo.New()
	e.Use(middleware.Logger())
	e.Use(middleware.Recover())

	t := &Template{
		Templates: template.Must(template.ParseGlob("web/home.html")),
	}

	e.Renderer = t
	e.GET("/", func(c echo.Context) error {
		//load up object with querystring parameters
		wf := workflowdata{LoanId: c.QueryParam("loanid"), CallbackUrl: c.QueryParam("callbackurl")}

		//passing in the template name (not file name)
		return c.Render(http.StatusOK, "home", wf)
	})

	//respond to POST requests and send message to callback URL
	e.POST("/ack", func(c echo.Context) error {
		loanid := c.FormValue("loanid")
		fmt.Println(loanid)
		callbackurl := c.FormValue("callbackurl")

		fmt.Println("Sending workflow callback to " + callbackurl)

		wf := workflowdata{LoanId: loanid, CallbackUrl: callbackurl}

		// Fetch an OAuth2 access token from the metadata server
		oauthToken, errAuth := metadata.Get("instance/service-accounts/default/token")
		if errAuth != nil {
			fmt.Println(errAuth)
		}

		//load up oauth token
		data := OAuth2TokenInfo{}
		errJson := json.Unmarshal([]byte(oauthToken), &data)
		if errJson != nil {
			fmt.Println(errJson.Error())
		}
		fmt.Printf("OAuth2 token: %s", data.Token)

		//setup callback request
		workflowReq, errWorkflowReq := http.NewRequest("POST", callbackurl, strings.NewReader("{}"))
		if errWorkflowReq != nil {
			fmt.Println(errWorkflowReq.Error())
		}

		//add oauth header
		workflowReq.Header.Add("authorization", "Bearer "+data.Token)
		workflowReq.Header.Add("accept", "application/json")
		workflowReq.Header.Add("content-type", "application/json")

		//inboke callback url
		client := &http.Client{}
		workflowResp, workflowErr := client.Do(workflowReq)

		if workflowErr != nil {

			fmt.Printf("Error making callback request: %s\n", workflowErr)
		}
		fmt.Printf("Status code: %d", workflowResp.StatusCode)

		return c.Render(http.StatusOK, "home", wf)
	})

	//simple startup
	e.Logger.Fatal(e.Start(":8080"))
}
```
The “get” request shows the details that came in via the querystrings. The “post” request generates the required OAuth2 token, adds it to the header, and calls back into Google Cloud Workflows. I got stuck for a while because I was sending an ID token and the service expects an access token. There’s a difference! My colleague Guillaume Laforge, who doesn’t even write Go, put together the code I needed to generate the necessary OAuth2 token.

From a local terminal, I ran a single command to push this source code into our fully managed Cloud Run environment:
```
gcloud run deploy
```
After a few moments, the app deployed and I loaded it up the browser with some dummy querystring values.

Step 3 – Create Workflow with event-driven trigger

That was it for coding! The rest of our system is composed of managed services. Specifically, Cloud Workflows, and Eventarc which processes events in Google Cloud and triggers consumers.

I created a new Workflow called “workflow-loans” and chose the new “Eventarc” trigger. This means that the Workflow starts up as a result of an event happening elsewhere in Google Cloud.

A new panel popped up and asked me to name my trigger and pick a source. We offer nearly every Google Cloud service as a source for events. See here that I chose Cloud Storage. Once I chose the event provider, I’m offered a contextual set of events. I selected the “finalized” event which fires for any new object added to the bucket.

Then, I’m asked to choose my storage bucket, and we have a nice picker interface. No need to manually type it in. Once I chose my bucket, which resides in a different region from my Workflow, I’m told as much.

The final step is to add the Workflow definition itself. These can be in YAML or JSON. My Workflow accepts some arguments (properties of the Cloud Storage doc, including the file name), and runs through a series of steps. It extracts the loan number from file name, creates a callback endpoint, logs the URL, waits for a callback, and processes the response.

The full Workflow definition is below, and also in my GitHub repo.
```
main:
    params: [args]
    steps:
        - setup_variables:
            #define and assign variables for use in the workflow
            assign:
                - version: 100                  #can be numbers
                - filename: ${args.data.name}   #name of doc
        - log_receipt:
            #write a log to share that we started up
            call: sys.log          
            args:
                text: ${"Loan doc received"}
        - extract_loan_number:
            #pull out substring containing loan number
            assign:
                - loan_number : ${text.substring(filename, 5, 8)}
        - create_callback:
            #establish a callback endpoint
            call: events.create_callback_endpoint
            args:
                http_callback_method: "POST"
            result: callback_details
        - print_callback_details:
            #print out formatted URL
            call: sys.log
            args:
                severity: "INFO"
                # update with the URL of your Cloud Run service
                text: ${"Callback URL is https://[INSERT CLOUD RUN URL HERE]?loanid="+ loan_number +"&callbackurl=" + callback_details.url}
        - await_callback:
            #wait impatiently
            call: events.await_callback
            args:
                callback: ${callback_details}
                timeout: 3600
            result: callback_request
        - print_callback_request:
            #wlog the result
            call: sys.log
            args:
                severity: "INFO"
                text: ${"Received " + json.encode_to_string(callback_request.http_request)}
        - return_callback_result:
            return: ${callback_request.http_request}
```
I deployed the Workflow which also generated the Eventarc trigger itself.

Step 4 – Testing it all out

Let’s see if this serverless, event-driven system now works! To start, I dropped a new PDF named “loan600.pdf” into the designated Storage bucket.

Immediately, Eventarc triggered a Workflow instance because that PDF was uploaded to Cloud Storage. See that the Workflow instance in an “await_callback” stage.

On the same page, notice the logs for the Workflow instance, including the URL for my Cloud Run with all the right querystring parameters loaded.

I plugged that URL into my browser and got my app loaded with the right callback URL.

After clicking the “acknowledge loan submission” button which called back to my running Workflow instance, I switched back to Cloud Workflows and saw that my instance completed successfully.

Summary

There are many ways to solve the problem I called out here. I like this solution. By using Google Cloud Eventarc and Workflows, I eliminated a LOT of code. And since all these services, including Cloud Run, are fully managed serverless services, it only costs me money when it does something. When idle, it costs zero. If you follow along and try it for yourself, let me know how it goes!
October 10, 2022
Loading data directly into a warehouse via your messaging engine? Here’s how this handy new feature works in Google Cloud.
First off, I am NOT a data analytics person. My advice is sketchy enough when it comes to app development and distributed systems that I don’t need to overreach into additional areas. That said, we at Google Cloud quietly shipped a new data-related feature this week that sparked my interest, and I figured that we could explore it together.

To be sure, loading data into a data warehouse is a solved problem. Many of us have done this via ETL (extract-transform-load) tools and streaming pipelines for years. It’s all very mature technology, even when steering your data towards newfangled cloud data warehouses like the fully-managed Google Cloud’s BigQuery. Nowadays, app developers can also insert directly into these systems from their code. But what about your event-driven apps? It could be easier than it is today! This is why I liked this new subscription type for Google Cloud Pub/Sub—our messaging engine for routing data between systems—that is explicitly for BigQuery. That’s right, you can directly subscribe your data warehouse to your messaging system.

Let’s try it out, end to end.

First, I needed some data. BigQuery offers an impressive set of public data sets, including those with crime statistics, birth data summaries, GitHub activity, census data, and even baseball statistics. I’m not choosing any of those, because I wanted to learn more about how BigQuery works. So, I built a silly comma-separated file of “pet visits” to my imaginary pet store chain.
```
1,"store400","2022-07-26 06:22:10","Mittens","cat","camp",806
2,"store405","2022-07-26 06:29:15","Jessie","dog","bath",804
3,"store400","2022-07-26 07:01:34","Ellie","dog","nailtrim",880
4,"store407","2022-07-26 07:02:00","Rocket","cat","bath",802
5,"store412","2022-07-26 07:06:45","Frank","cat","bath",853
6,"store400","2022-07-26 08:08:08","Nala","cat","nailtrim",880
7,"store407","2022-07-26 08:15:04","Rocky","dog","camp",890
8,"store402","2022-07-26 08:39:16","Cynthia","bird","spa",857
9,"store400","2022-07-26 08:51:14","Watson","dog","haircut",831
10,"store412","2022-07-26 09:05:58","Manny","dog","camp",818
```
I saved this data as “pets.csv” and uploaded it into a private, regional Google Cloud Storage Bucket.

Excellent. Now I wanted this data loaded into a BigQuery table that I could run queries against. And eventually, load new data into when it flows through Pub/Sub.

I’m starting with no existing data sets or tables in BigQuery. You can see here that all I have is my “project.” And there’s no infrastructure to provision or manage here, so all we have to think about is our data. Amazing.

As an aside, we make it very straightforward to pull in data from all sorts of sources, even those outside of Google Cloud. So, this really can be a single solution for all your data analytics needs. Just sayin’. In this scenario, I wanted to add data to a BigQuery table, so I started by selecting my project and choosing to “create a dataset“, which is really just a container for data tables.

Next, I picked my data set and click the menu option to “create table.” Here’s where it gets fun. I can create an empty table, upload some data or point to object storage repos like Google Cloud Storage, Amazon S3, or Azure Blob Storage. I chose Cloud Storage. Then I located my Storage bucket and chose “CSV” as the file format. Other options include JSON, Avro, and Parquet. Then I gave my table a name (“visits_table”). So far so good.

The last part of this table creation process involves schema definition. BigQuery can autodetect the schema (data types and such), but I wanted to define it manually. The graphical interface offers a way to define column name, data type, and whether it’s a required data point or not.

After creating the table, I could see the schema and run queries against the data. For example, this is a query that returns the count of each animal type coming into my chain of pet stores for service.

You could imagine there might be some geospatial analysis, machine learning models, or other things we constantly do with this data set over time. That said, let’s hook it up to Pub/Sub so that we can push a real-time stream of “visits” from our event-driven architecture.

Before we forget, we need to change permissions to allow Pub/Sub to send data to BigQuery tables. From within Google Cloud IAM, I chose to “include Google-provided role grants” in the list of principals, located my built-in Pub/Sub service account, and added the “BigQuery Data Editor” and “BigQuery Metadata Viewer” roles.

When publishing from Pub/Sub to BigQuery you have a couple of choices for how to handle the data. One option is to dump the entire payload into a single “data” field, which doesn’t sound exciting. The other option is to use a Pub/Sub schema so that the data fields map directly to BigQuery table columns. That’s better. I navigated to the Pub/Sub “Schemas” dashboard and created a new schema.

If kids are following along at home, the full schema looks like this:
```
{
    "type": "record",
    "name": "Avro",
    "fields": [
      {
        "name": "apptid",
        "type": "int"
      },
      {
        "name": "storeid",
        "type": "string"
      },
      {
        "name": "visitstamp",
        "type": "string"
      },
      {
        "name": "petname",
        "type": "string"
      },
      {
        "name": "animaltype",
        "type": "string"
      },
      {
        "name": "servicetype",
        "type": "string"
      },
      {
        "name": "customerid",
        "type": "int"
      }
    ]
  }
```
We’re almost there. Now we just needed to create the actual Pub/Sub topic and subscription. I defined a new topic named “pets-topic”, and selected the box to “use a schema.” Then I chose the schema we created above.

Now for the subscription itself. As you see below, there’s a “delivery type” for “Write to BigQuery” which is super useful. Once I chose that, I was asked for the dataset and table, and I chose the option to “use topic schema” so that the message body would map to the individual columns in the table.

This is still a “regular” Pub/Sub subscription, so if I wanted to, I could set properties like message retention duration, expiration period, subscription filters, and retry policies.

Nothing else to it. And we did it all from the Cloud Console. To test this out, I went to my topic in the Cloud Console, and chose to send a message. Here, I sent a single message that conformed to the topic schema.

Almost immediately, my BigQuery table got updated and I saw the new data in my query results.

When I searched online, I saw various ways that people have stitched together their (cloud) messaging engines with their data warehouse. But from what I can tell, what we did here is the simplest, most-integrated way to pull that off. Try it out and tell me what you think!
July 27, 2022
This might be the cleanest way I’ve seen parallel processing done in a (serverless) workflow engine
I have a love/hate relationship with workflow technologies after 20+ years of working with them. On the plus side, these technologies often provide a good way to quickly model a process without having to code a bunch of boilerplate stuff. And a good workflow runtime handles the complex stuff like starting up a workflow instance, saving state when the workflow is long-running, and enabling things like retries for failed instances. The downside of workflow tech? You’re often wrangling clumsy design interfaces, stitching together multiple components to achieve something you could have solved with a single line of code, and stuck using unique domain-specific languages (DSLs) with zero (or limited) portability. I get why some folks build workflow solutions out of databases and background workers! But maybe there’s something better.

Last year, Google Cloud (where I work) shipped Cloud Workflows. It’s a fully managed design and runtime service where you only pay when a workflow is running. Basically, you’d use Cloud Workflows to run event-based or scheduled processes that coordinate web endpoints (serverless functions, REST APIs) or managed services (databases, AI/ML services). It’s similar to AWS Step Functions, and somewhat like Azure Logic Apps. We just added support for parallel processing, and it’s a very smart implementation. Let’s take a look at examples that you can try out for free.

First, here are five things that I like about Cloud Workflows:
1. Declarative format. Write Workflows in YAML or JSON and get a visual representation. I like this model versus fighting a graphical UI that buries settings or overwhelms the user with millions of options.
2. Event-driven triggers. Cloud Workflows integrates with Eventarc, the eventing subsystem that sends messages based on things happening within Google Cloud (e.g. object loaded into storage bucket, database backup completed).
3. Built-in connectors. A connector makes it easier to talk to Google Cloud services from a Workflow. Set a few properties on a connector without learning the full service API.
4. Production-ready serverless. Cloud Workflows is compliant with a number of certifications and standards. Like Step Functions, Cloud Workflows is a purely managed service that’s pay-as-you-go and scales automatically. Azure Logic Apps offers this approach as well (“consumption plan”), but only recommends it for dev/test. Rather, they encourage single tenancy for their service.
5. A mix of sophisticated and basic operations. Cloud Workflows does some powerful things like automatic type conversion, callbacks for long-running processes, handling errors and retries, and even invoking private endpoints. But it also does things that SHOULD be easy (but aren’t always easy in other workflow engines), like defining and using variables and writing out logs. You can do a lot before you HAVE to jump into code.
One of those sophisticated operations in a workflow is parallel processing. That is, executing more than one blocking call at the same time. This is a sneaky-hard problem to solve in a workflow engine, especially when it comes to shared data. Can both parallel branches see and access the same data? How to avoid collisions? With AWS Step Functions, they’ve gotten around this problem by passing data by value (not reference) into a branch and not allowing state transfers between branches. Cloud Workflows takes a different approach to data in parallel branches. With our implementation, you define “shared variables” that are available to any and all branches that run concurrently. We handle the atomic updates for you and data changes are immediately available to other branches. Pretty cool!

Example #1 – Basic Cloud Workflow

How about we start out with something straightforward. Here, I declared a few variables up front, added a log message, then a switch statement which called one of two steps before sending a result.
```
#basic starter workflow
- step1:
   #define and assign variables for use in the workflow
   assign:
    - var1: 100           #can be numbers
    - var2: "Richard"     #can be text
    - var3: "SAN"
    - var4:               #can be instantiated as null
    - var5: ~             #this is null as well
    - varlist: ["item1"]  #how about a list too?
- step2:
   call: sys.log          #call to standard library to write a log message
   args:
     data: ${var2}
- step3:
   switch:                #control flow example
     - condition: ${var3 == "SAN"}
       next: dostuff1
     - condition: true
       next: dostuff2
   next: final
- dostuff1:
   assign: 
     - var4: "full-time employee"
     - varlist: ${list.concat(varlist, "item2")}  #add item to list
   next: final
- dostuff2:
   assign:
     - var4: "part-time employee"
     - varlist: ${list.concat(varlist, "item3")}  #add item to list
   next: final
- final:
   return: ${varlist}     #the result of the workflow itself
```
Here’s the generated visual representation of this workflow.

After deploying the workflow, I chose the in-console option to execute the workflow. You can see that I have the option to provide input values. And I chose to log all the calls, which makes it easier to trace what’s going on.

When I execute the workflow, I get a live update to the logs, and the output itself. It’s a helpful interface.

Example #2 – Workflow with parallel processing and read-only variables

How about we try a workflow with concurrent branches? It uses the “parallel” control flow action, and I defined a pair of branches that each log a message. Notice that each one can access the “var1” variable. Read-only access to a global variable doesn’t require anything special.
```
# basic parallel steps that work, can read variables without marking as shared
- step1:
   assign:
    - var1: 100
    - var2: 200
- step2:
   call: sys.log
   args:
     data: ${var1}
- step3:
   parallel:              #indicate parallel processing ahead
     branches:
       - branch1:         #first branch which can access var1 declared above
           steps:
             - dostuff:
                call: sys.log
                args:
                  text: ${"log message " + string(var1)}
       - branch2:         #second branch which access the same variable
           steps:
             - dostuff2:
                call: sys.log
                args:
                  data: ${var1}        
- final:
   return: ${var1}
```
The visual representation of parallel branches looks like this.

Example #3 – Workflow with shared, writable variable

The above workflow throws an error if I try to assign a value to that global variable from within a branch. What if I want one or more branches to update a shared variable? It could be a counter, an array, or something else. In this scenario, each branch sets a value. These branches aren’t guaranteed to run in any particular order, so if you run this workflow a few times, you’ll see different final values for “var2.”
```
# writeable variable that must be assigned/declared first before indicated as "shared"
- step1:
   assign:
    - var1: 100
    - var2: 200
- step2:
   parallel:
     shared: [var2]           #variable needs to be declared earlier to be "shared" here
     branches:
       - branch1:
           steps:
             - changenumber:
                assign:       #assign a value to the shared variable
                 - var2: 201
             - dostuff:
                call: sys.log
                args:
                  text: ${"log 1st branch message " + string(var2)}
       - branch2:
           steps:
             - changenumber2:
                assign:       #assign a value to the shared variable
                 - var2: 202
             - dostuff2:
                call: sys.log
                args:
                  text: ${"log 2nd branch message " + string(var2)}      
- final:
   return: ${var2}
```
The workflow looks like this when visualized. You can see the multiple actions per branch.

Example #4 – Workflow with shared array that’s updated within each parallel branch

An array is a type of variable in a Cloud Workflow. What if you wanted to append an item to an array from within each branch of an array? That might be tricky with some engines, but it’s straightforward here.
```
# writeable array
- step1:
   assign:
    - var1: 100
    - var2: 200
    - names: []         #declare array
- step2:
   parallel:
     shared: [names]    #array variable is shared with parallel branches
     branches:
       - branch1:
           steps:
             - addname:
                assign:
                 - names: ${list.concat(names, "Richard")}  #add to the array
             - dostuff:
                call: sys.log
                args:
                  text: ${"log 1st branch message "}
       - branch2:
           steps:
             - addname2:
                assign:
                 - names: ${list.concat(names, "Kris")}     #add to the array
             - dostuff2:
                call: sys.log
                args:
                  text: ${"log 2nd branch message "}      
- final:
   return: ${names}
```
The representation is similar to the one above, but we can see when executing the workflow that the output array has a pair of names added to it.

This is a relatively simple way to append data to a shared object, even when running in a distributed, parallelized workflow.

Example #5 – Workflow with map updated with values from each parallel branch

There’s another way to do this. If you wanted to collect different data points from each branch, and then smash them into a composite object when all the branches complete, it’s not too hard. Here, I scatter and then gather.
```
# separate messages per branch, joined at the end
- step1:
   assign:
    - var1: ~
    - var2: ~
    - var3: {} #declare a map
- step2:
   parallel:
     shared: [var1, var2]  #still need shared variables in order to be writable in a branch
     branches:
       - branch1:
           steps:
             - getval1:
                assign:
                 - var1: "value1"
             - log1:
                call: sys.log
                args:
                  text: ${"log 1st branch message "}
       - branch2:
           steps:
             - getval2:
                assign:
                 - var2: "value2"
             - log2:
                call: sys.log
                args:
                  text: ${"log 2nd branch message "}      
- gathervalues:
   assign:                 #set key/value pairs on the map object
      - var3.val1: ${var1}
      - var3.val2: ${var2}
- final:
   return: ${var3}
```
This could be useful in a few cases, and you can see the visual representation here.

When I call this workflow, I get back a map with a pair of key/value properties.

Example #6 – Workflow with parallel loop that updates a map variable with each iteration

You can do more than just parallel branches in a Cloud Workflow. You can also do parallelized loops. That means that multiple iterations can execute concurrently. Neat!

For this scenario, let’s imagine that I want to pull employee data from three different systems, and return it all as one composite object. To stub this out, I built a Cloud Function that takes in “system ID” via the querystring and returns some fixed data based on which system ID comes in. It’s contrived, but does the job.
```
exports.systemLookup = (req, res) => {

  var systemid = req.query.id;
  var payload;

  switch(systemid) {
    case "e1":
      payload = {name: "Richard Seroter", location: "SAN"};
      break;
    case "d1":
      payload = {department: "PM", tenure: "2.2yrs"};
      break;
    case "r1":
      payload = {latestperfrating: "3"};
      break;
    default:
      payload = {type: "employee"}
      break;
  }
  res.status(200).send(payload);
};
```
After I deployed this function, I built this workflow which loops through a list of employee system names and calls this Function for each one. And then takes the result and adds it to the map variable.
```
# parallel loop that calls a function and builds up a composite object
- step1:
   assign:
    - systemlist: ["e1", "d1", "r1"]   #list of employee systems to retrieve data from
    - employeemap: {}                  #map that holds composite result
- step2:
   parallel:
     shared: [systemlist, employeemap] #still need shared variables in order to be writable in a branch
     for:            #loop
      value: systemid
      in: ${systemlist}
      steps:
         - getEmpDetails:
            call: http.get    #call function
            args:
               url: ${"https://[host]/function-lookupfromworkflow?id=" + systemid}
            result: payload
         - logmsg:
            call: sys.log
            args:
               text: ${"log loop message " + systemid}  
         - append:
            assign:           #assign the result to the map
               - employeemap[systemid]: ${payload.body}
- final:
   return: ${employeemap}
```
Such a workflow is visualized as such.

The result? I get a composite object back, and it happened super fast since the engine made Function calls in parallel!

Summary

Cloud Workflows are new in town, but they’re already a solid option. The DSL is powerful and yet elegantly solves tricky distributed systems problems like shared variables. I think you can run all the examples above within the confines of our free tier, which makes it simple to experiment further. Let me know what you think, and what else you’d like to see us add to Cloud Workflows.
June 28, 2022
Running serverless web, batch, and worker apps with Google Cloud Run and Cloud Spanner
If it seems to you that cloud providers offer distinct compute services for every specific type of workload, you’re not imagining things. Fifteen years ago when I was building an app, my hosting choices included a virtual machine or a physical server. Today? You’ll find services targeting web apps, batch apps, commercial apps, containerized apps, Windows apps, Spring apps, VMware-based apps, and more. It’s a lot. So, it catches my eye when I find a modern cloud service that support a few different types of workloads. Our serverless compute service Google Cloud Run might be the fastest and easiest way to get web apps running in the cloud, and we just added support for background jobs. I figured I’d try out Cloud Run for three distinct scenarios: web app (responds to HTTP requests, scales to zero), job (triggered, runs to completion), and worker (processes background work continuously).

Let’s make this scenario come alive. I want a web interface that takes in “orders” and shows existing orders (via Cloud Run web app). There’s a separate system that prepares orders for delivery and we poll that system occasionally (via Cloud Run job) to update the status of our orders. And when the order itself is delivered, the mobile app used by the delivery-person sends a message to a queue that a worker is constantly listening to (via Cloud Run app). The basic architecture is something like this:

Ok, how about we build it out!

Setting up our Cloud Spanner database

The underlying database for this system is Cloud Spanner. Why? Because it’s awesome and I want to start using it more. Now, I should probably have a services layer sitting in front of the database instead of doing direct read/write, but this is my demo and I’ll architect however I damn well please!

I started by creating a Spanner instance. We’ve recently made it possible to create smaller instances, which means you can get started at less cost, without sacrificing resilience. Regardless of the number of “processing units” I choose, I get 3 replicas and the same availability SLA. The best database in the cloud just got a lot more affordable.

Next, I add a database to this instance. After giving it a name, I choose the “Google Standard SQL” option, but I could have also chosen a PostgreSQL interface. When defining my schema, I like that we offer script templates for actions like “create table”, “create index”, and “create change stream.” Below, you see my table definition.

With that, I have a database. There’s nothing left to do, besides bask in the glory of having a regionally-deployed, highly available relational database instance at my disposal in about 60 seconds.

Creating the web app in Go and deploying to Cloud Run

With the database in place, I can build a web app with read/write capabilities.

This app is written in Go and uses the echo web framework. I defined a basic struct that matches the fields in the database.
```
package model

type Order struct {
	OrderId        int64
	ProductId      int64
	CustomerId     int64
	Quantity       int64
	Status         string
	OrderDate      string
	FulfillmentHub string
}
```
I’m using the Go driver for Spanner and the core of the logic consists of the operations to retrieve Spanner data and create a new record. I need to be smarter about reusing the connection, but I’ll refactor it later. Narrator: He probably won’t refactor it.
```
package web

import (
	"context"
	"log"
	"time"
	"cloud.google.com/go/spanner"
	"github.com/labstack/echo/v4"
	"google.golang.org/api/iterator"
	"seroter.com/serotershop/model"
)

func GetOrders() []*model.Order {

	//create empty slice
	var data []*model.Order

	//set up context and client
	ctx := context.Background()
	db := "projects/seroter-project-base/instances/seroter-spanner/databases/seroterdb"
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		log.Fatal(err)
	}

	defer client.Close()
    //get all the records in the table
	iter := client.Single().Read(ctx, "Orders", spanner.AllKeys(), []string{"OrderId", "ProductId", "CustomerId", "Quantity", "Status", "OrderDate", "FulfillmentHub"})

	defer iter.Stop()

	for {
		row, e := iter.Next()
		if e == iterator.Done {
			break
		}
		if e != nil {
			log.Println(e)
		}

		//create object for each row
		o := new(model.Order)

		//load row into struct that maps to same shape
		rerr := row.ToStruct(o)
		if rerr != nil {
			log.Println(rerr)
		}
		//append to collection
		data = append(data, o)

	}
	return data
}

func AddOrder(c echo.Context) {

	//retrieve values
	orderid := c.FormValue("orderid")
	productid := c.FormValue("productid")
	customerid := c.FormValue("customerid")
	quantity := c.FormValue("quantity")
	status := c.FormValue("status")
	hub := c.FormValue("hub")
	orderdate := time.Now().Format("2006-01-02")

	//set up context and client
	ctx := context.Background()
	db := "projects/seroter-project-base/instances/seroter-spanner/databases/seroterdb"
	client, err := spanner.NewClient(ctx, db)
	if err != nil {
		log.Fatal(err)
	}

	defer client.Close()

	//do database table write
	_, e := client.Apply(ctx, []*spanner.Mutation{
		spanner.Insert("Orders",
			[]string{"OrderId", "ProductId", "CustomerId", "Quantity", "Status", "FulfillmentHub", "OrderDate"},
			[]interface{}{orderid, productid, customerid, quantity, status, hub, orderdate})})

	if e != nil {
		log.Println(e)
	}
}
```
Time to deploy! I’m using Cloud Build to generate a container image without using a Dockerfile. A single command triggers the upload, build, and packaging of my app.
```
gcloud builds submit --pack image=gcr.io/seroter-project-base/seroter-run-web
```
After a moment, I have a container image ready to go. I jumped in the Cloud Run experience and chose to create a new service. After picking the container image I just created, I kept the default autoscaling (minimum of zero instances), concurrency, and CPU allocation settings.

The app started in seconds, and when I call up the URL, I see my application. And I went ahead and submitted a few orders, which then show up in the list.

Checking Cloud Spanner—just to ensure this wasn’t only data sitting client-side—shows that I have rows in my database table.

Ok, my front end web application is running (when requests come in) and successfully talking to my Cloud Spanner database.

Creating the batch processor in .NET and deploying to Cloud Run jobs

As mentioned in the scenario summary, let’s assume we have some shipping system that prepares the order for delivery. Every so often, we want to poll that system for changes, and update the order status in the Spanner database accordingly.

Until lately, you’d run these batch jobs in App Engine, Functions, a GKE pod, or some other compute service that you could trigger on a schedule. But we just previewed Cloud Run jobs which offers a natural choice moving forward. Here, I can run anything that can be containerized, and the workload runs until completion. You might trigger these via Cloud Scheduler, or kick them off manually.

Let’s write a .NET console application that does the work. I’m using the new minimal API that hides a bunch of boilerplate code. All I have is a Program.cs file, and a package dependency on Google.Cloud.Spanner.Data. Because I don’t like you THAT much, I didn’t actually create a stub for the shipping system, and decided to update the status of all the rows at once.
```
using Google.Cloud.Spanner.Data;

Console.WriteLine("Starting job ...");

//connection string
string conn = "Data Source=projects/seroter-project-base/instances/seroter-spanner/databases/seroterdb";

using (var connection = new SpannerConnection(conn)) {

    //command that updates all rows with the initial status
    SpannerCommand cmd = connection.CreateDmlCommand("UPDATE Orders SET Status = 'SHIPPED' WHERE Status = 'SUBMITTED'");

    //execute and hope for the best
    cmd.ExecuteNonQuery();
}

//job should end after this
Console.WriteLine("Update done. Job completed.");
```
Like before, I use a single Cloud Build command to compile and package my app into a container image: gcloud builds submit --pack image=gcr.io/seroter-project-base/seroter-run-job

Let’s go back into the Cloud Run interface, where we just turned on a UI for creating and managing jobs. I start by choosing my just-now-created container image and keeping the “number of tasks” to 1.

For reference, there are other fun “job” settings. I can allocate up to 32GB of memory and 8 vCPUs. I can set the timeout (up to an hour), choose how much parallelism I want, and even select the option to run the job right away.

After creating the job, I click the button that says “execute” and run my job. I see job status and application logs, updated live. My job succeeded!

Checking Cloud Spanner confirms that my all table rows were updated to a status of “SHIPPED”.

It’s great that I didn’t have to leave the Cloud Run API or interface to build this batch processor. Super convenient!

Creating the queue listener in Spring and deploying to Cloud Run

The final piece of our architecture requires a queue listener. When our delivery drivers drop off a package, their system sends a message to Google Cloud Pub/Sub, our pretty remarkable messaging system. To be sure, I could trigger Cloud Run (or Cloud Functions) automatically whenever a message hits Pub/Sub. That’s a built-in capability. I don’t need to use a processor that directly pulls from the queue.

But maybe I want to control the pull from the queue. I could do stateful processing over a series of messages, or pull batches instead of one-at-a-time. Here, I’m going to use Spring Cloud Stream which talks to any major messaging system and triggers a function whenever a message arrives.

Also note that Cloud Run doesn’t explicitly support this worker pattern, but you can make it work fairly easily. I’ll show you.

I went to start.spring.io and configured my app by choosing a Spring Web and GCP Support dependency. Why “web” if this is a background worker? Cloud Run still expects a workload that binds to a web port, so we’ll embed a web server that’s never used.

After generating the project and opening it, I deleted the “GCP support” dependency (I just wanted an auto-generated dependency management value) and added a couple of POM dependencies that my app needs. The first is the Google Cloud Pub/Sub “binder” for Spring Cloud Stream, and the second is the JDBC driver for Cloud Spanner.
```
<dependency>
	<groupId>org.springframework.cloud</groupId>
	<artifactId>spring-cloud-gcp-pubsub-stream-binder</artifactId>
	<version>1.2.8.RELEASE</version>
</dependency>
<dependency>
	<groupId>com.google.cloud</groupId>
	<artifactId>google-cloud-spanner-jdbc</artifactId>
</dependency>
```
I then created an object definition for “Order” with the necessary fields and getters/setters. Let’s review the primary class that does all the work. The way Spring Cloud Stream works is that reactive functions annotated as beans are invoked when a message comes in. The Spring machinery wires up the connection to the message broker and does most of the work. In this case, when I get an order message, I update the order status in Cloud Spanner to “DELIVERED.”
```
package com.seroter.runworker;


import java.util.function.Consumer;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import reactor.core.publisher.Flux;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.SQLException;

@SpringBootApplication
public class RunWorkerApplication {

	public static void main(String[] args) {
		SpringApplication.run(RunWorkerApplication.class, args);
	}

	//takes in a Flux (stream) of orders
	@Bean
	public Consumer<Flux<Order>> reactiveReadOrders() {

		//connection to my database
		String connectionUrl = "jdbc:cloudspanner:/projects/seroter-project-base/instances/seroter-spanner/databases/seroterdb";
		
		return value -> 
			value.subscribe(v -> { 
				try (Connection c = DriverManager.getConnection(connectionUrl); Statement statement = c.createStatement()) {
					String command = "UPDATE Orders SET Status = 'DELIVERED' WHERE OrderId = " + v.getOrderId().toString();
					statement.executeUpdate(command);
				} catch (SQLException e) {
					System.out.println(e.toString());
				}
			});
	}
}
```
My corresponding properties file has the few values Spring Cloud Stream needs to know about. Specifically, I’m specifying the Pub/Sub topic, indicating that I can take in batches of data, and setting the “group” which corresponds to the topic subscription. What’s cool is that if these topics and subscriptions don’t exist already, Spring Cloud Stream creates them for me.
```
server.port=8080
spring.cloud.stream.bindings.reactiveReadOrders-in-0.destination=ordertopic
spring.cloud.stream.bindings.reactiveReadOrders-in-0.consumer.batch-mode=true
spring.cloud.stream.bindings.reactiveReadOrders-in-0.content-type=application/json
spring.cloud.stream.bindings.reactiveReadOrders-in-0.group=orderGroup
```
For the final time, I run the Cloud Build command to build and package my Java app into a container image: gcloud builds submit --pack image=gcr.io/seroter-project-base/seroter-run-worker

With this container image ready to go, I slide back to the Cloud Run UI and create a new service instance. This time, after choosing my image, I choose “always allocated CPU” to ensure that the CPU stays on the whole time. And I picked a minimum instance of one so that I have a single always-on worker pulling from Pub/Sub. I also chose “internal only” traffic and require authentication to make this harder for someone to randomly invoke.

My service quickly starts up, and upon initialization, creates both the topic and queue for my app.

I go into the Pub/Sub UI where I can send a message directly into a topic. All I need to send in is a JSON payload that holds the order ID of the record to update.

The result? My database record is updated, and I see this by viewing my web application and noticing the second row has a new “status” value.

Wrap up

Instead of using two or three distinct cloud compute services to satisfy this architecture, I used one. Cloud Run defies your expectations of what serverless can be, especially now that you can run serverless jobs or even continuously-running apps. In all cases, I have no infrastructure to provision, scale, or manage.

You can use Cloud Run, Pub/Sub, and Cloud Build with our generous free tier, and Spanner has never been cheaper to try out. Give it a whirl, and tell me what you think of Cloud Run jobs.
June 9, 2022