I’m approaching twenty-five years of connecting systems together. Yikes. In the summer of 2000, I met a new product called BizTalk Server that included a visual design tool for building workflows. In the years following, that particular toolset got better (see image), and a host of other cloud-based point-and-click services emerged. Cloud integration platforms are solid now, but fairly stagnant. I haven’t noticed a ton of improvements over the past twelve months. That said, Google Cloud’s Application Integration service is improving (and catching up) month over month, and I wanted to try out the latest and greatest capabilities. I think you’ll see something you like.
Could you use code (and AI-generated code) to create all your app integrations instead of using visual modeling tools like this? Probably. But you’d see scope creep. You’d have to recreate system connectors (e.g. Salesforce, Stripe, databases, Google Sheets), data transformation logic, event triggers, and a fault-tolerant runtime for async runners. You might find yourself creating a fairly massive system to replace one you can use as-a-service. So what’s new with Google Cloud Application Integration?
Project setup improvements
Let’s first look at templates. These are pre-baked blueprints that you can use to start a new project. Google now offers a handful of built-in templates, and you can see custom ones shared with you by others.
I like that anyone can define a new template from an existing integration, as I show here.
Once I create a template, it shows up under “project templates” along with a visual preview of the integration, the option to edit, share or download as JSON, and any related templates.
The next new feature of Google Cloud Application Integration related to setup is the Gemini assistance. This is woven into a few different features—I’ll show another later—including the ability to create new integrations with natural language.
After clicking that button, I’m asked to provide a natural language description of the integration I want to create. There’s a subset of triggers and tasks recognized here. See here that I’m asking for a message to be read from Pub/Sub, approvals sent, and a serverless function called if the approval is provided.
I’m shown the resulting integration and can iterate on it in place as much as I want. Once I land on the desired integration, I accept the Gemini-created configuration and start working with the resulting workflow.
This feels like a very useful AI feature that helps folks learn the platform, and set up integrations.
New design and development features
Let’s look at new features for doing the core design and development of integrations.
First up, there’s a new experience for seeing and editing configuration variables. What are these? Think of config variables as settings for the integration itself that you can set at deploy time. It might be something like a connection string or desired log level.
Here’s another great use of AI assistance. The do-whatever-you-want JavaScript task in an integration can now be created with Gemini. Instead of writing the JavaScript yourself, use Gemini to craft it.
Here, I provided a prompt asking for updated JavaScript that also logs the ID of the employee record. I’m then shown a diff view that I can accept, or continue editing.
As you move data between applications or systems, you likely need to switch up structure and format. I’ve long been jealous of the nice experience in Azure Logic Apps, and now our mapping experience is finally catching up.
Is the mapping not as easy as one to one? No problem. There are now transformation operations for messing with arrays, performing JSON operations, manipulating strings, and much more.
I’m sure your integrations NEVER fail, but for everyone else, it’s useful to have advanced failure policies for rich error handling strategies. For a given task, I can set up one or more failure policies that tell the integration what to do when an issue occurs. Quit? Retry? Ignore it? I like the choices I have available.
There’s already a lot to like about the authoring experience, and these recent updates make it even better.
Fresh testing capabilities
Testing? Who wants to test anything? Not me, but that’s because I’m not a good software engineer.
We shipped a couple of interesting features for those who want to test their integrations.
First, it’s a small thing, but when you have an API Trigger kicking off your integration—which means that someone invokes it via web request—we now make it easy to see the associated OpenAPI spec. This makes it easier to understand a service, and even consume it from external testing tools.
Once I choose to “view OpenAPI spec”, I get a slide-out pane with the specification, along with options to copy or download the details.
But by far, the biggest addition to the Application Integration toolchain for testers is the ability to create and run test plans. Add one or more test cases to an integration, and apply some sophisticated configurations to a test.
When I choose that option, I’m first asked to name the test case and optionally provide a description. Then, I enter “test mode” and set up test configurations for the given components in the integration. For instance, here I’ve chosen the initial API trigger. I can see the properties of the trigger, and then set a test input value.
A “task” in the integration has more test case configuration options. When I choose the JavaScript task, I see that I can choose a mocking strategy. Do you play it straight with the data coming in, purposely trigger a skip or failure, or manipulate the output?
Then I add one or more “assertions” for the test case. I can check whether the step succeeded or failed, if a variable equals what I think it should, or if a variable meets a specific condition.
Once I have a set of test cases, the service makes it easy to list them, duplicate them, download them, and manage them. But I want to run them.
Even if you don’t use test cases, you can run a test. In that case, you click the “Test” button and provide an input value. If you’re using test cases, you stay in (or enter) “test case mode” and then the “Test” button runs your test cases.
New operational features
This final category looks at operational features for integrations.
This first feature shipped a few days ago. Now we’re offering more detailed execution logs that you can also download as JSON. A complaint with systems like this is that they’re a black box and you can’t tell what’s going on. The more transparency, the better. Lots of log details now!
Another new operational feature is the ability to replay an integration. Maybe something failed downstream and you want to retry the whole process. Or something transient happened and you need a fresh run. No problem. Now I can pick any completed (or failed) execution and run it again.
When I use this, I’m asked for a reason for the replay. And what I like is that after the replay occurs, there’s an annotation indicating that the given execution was the result of a replay.
Also be aware that you can now cancel an execution. This is handy for long-running instances that may no longer matter.
Summary
You don’t need to use tools like this, of course. You can connect your systems together with code or scripts. But I personally like managed experiences like this that handle the machinery of connecting to systems, transforming data, and dealing with running the dozens or thousands of hourly events between systems.
Software is never going to be the same. Why would we go back to laborious research efforts, wasting time writing boilerplate code, and accepting so many interruptions to our flow state? Hard pass. It might not happen for you tomorrow, next month, or next year, but AI will absolutely improve your developer workflow.
Your AI-powered workflow may make use of more than one LLM. Go for it. But we’ve done a good job of putting Gemini into nearly every stage of the new way of working. Let’s look at what you can do RIGHT NOW to build with Gemini.
Build knowledge, plans, and prototypes with Gemini
Are you still starting your learning efforts with a Google search? Amateurs 🙂 I mean, keep doing those so that we earn ad dollars. But you’ve got so many new ways to augment a basic search.
Gemini Deep Research is pretty amazing. Part of Gemini Advanced, it takes your query, searches the web on your behalf, and gives you a summary in minutes. Here I asked for help understanding the landscape of PostgreSQL providers, and it recapped results found in 240+ relevant websites from vendors, Reddit, analysts, and more.
Gemini Deep Research creating a report about the PostgreSQL landscape
You’ve probably heard of NotebookLM. Built with Gemini 2.0, it takes all sorts of digital content and helps you make sense of it. Including those hyper-realistic podcasts (“Audio Overviews”).
Planning your work or starting to flesh out a prototype? For free, Google AI Studio lets you interact with the latest Gemini models. Generate text, audio, or images from prompts. Produce complex codebases based on reference images or text prompts. Share your desktop and get live assistance on whatever task you’re doing. It’s pretty rad.
Google AI Studio’s Live API makes it possible to interact live with the model
Google Cloud customers can get knowledge from Gemini in a few ways. The chat for Gemini Cloud Assist gives me an ever-present agent that can help answer questions or help me explore options. Here, I asked for a summary of the options for running PostgreSQL in Google Cloud. It breaks the response down by fully-managed, self-managed, and options for migration.
Chat for Gemini Cloud Assist teaches me about PostgreSQL options
Gemini for Google Cloud blends AI-assistance into many different services. One way to use this is to understand existing SQL scripts, workflows, APIs, and more.
Gemini in BigQuery explains an existing query and helps me learn about it
Trying to plan out your next bit of work? Google AI Studio or Vertex AI Studio can assist here too. In either service, you can pass in your backlog of features and bugs, maybe an architecture diagram or two, and even some reference PDFs, and ask for help planning out the next sprint. Pretty good!
Vertex AI Studio “thinking” through a sprint plan based on multi-modal input
Build apps and agents with Gemini
We can use Google AI Studio or Vertex AI Studio to learn things and craft plans, but now let’s look at how you’d actually build apps with Gemini.
You can work with the raw Gemini API. There are SDK libraries for Python, Node, Go, Dart, Swift, and Android. If you’re working with Gemini 2.0 and beyond, there’s a new unified SDK that works with both the Developer API and Enterprise API (Vertex). It’s fairly easy to use. I wrote a Google Cloud Function that uses the unified Gemini API to generate dinner recipes for whatever ingredients you pass in.
package function

import (
    "context"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"

    "github.com/GoogleCloudPlatform/functions-framework-go/functions"
    "google.golang.org/genai"
)

func init() {
    functions.HTTP("GenerateRecipe", generateRecipe)
}

func generateRecipe(w http.ResponseWriter, r *http.Request) {
    ctx := context.Background()

    ingredients := r.URL.Query().Get("ingredients")
    if ingredients == "" {
        http.Error(w, "Please provide ingredients in the query string, like this: ?ingredients=pork, cheese, tortilla", http.StatusBadRequest)
        return
    }

    projectID := os.Getenv("PROJECT_ID")
    if projectID == "" {
        projectID = "default" // Provide a default, but encourage configuration
    }
    location := os.Getenv("LOCATION")
    if location == "" {
        location = "us-central1" // Provide a default, but encourage configuration
    }

    client, err := genai.NewClient(ctx, &genai.ClientConfig{
        Project:  projectID,
        Location: location,
        Backend:  genai.BackendVertexAI,
    })
    if err != nil {
        log.Printf("error creating client: %v", err)
        http.Error(w, "Failed to create Gemini client", http.StatusInternalServerError)
        return
    }

    prompt := fmt.Sprintf("Given these ingredients: %s, generate a recipe.", ingredients)

    result, err := client.Models.GenerateContent(ctx, "gemini-2.0-flash-exp", genai.Text(prompt), nil)
    if err != nil {
        log.Printf("error generating content: %v", err)
        http.Error(w, "Failed to generate recipe", http.StatusServiceUnavailable)
        return
    }
    if len(result.Candidates) == 0 {
        http.Error(w, "No recipes found", http.StatusNotFound) // Or another appropriate status
        return
    }

    recipe := result.Candidates[0].Content.Parts[0].Text // Extract the generated recipe text

    response, err := json.Marshal(map[string]string{"recipe": recipe})
    if err != nil {
        log.Printf("error marshalling response: %v", err)
        http.Error(w, "Failed to format response", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    w.Write(response)
}
What coding tools can I use with Gemini? GitHub Copilot now supports Gemini models. Folks who love Cursor can choose Gemini as their underlying model. Same goes for fans of Sourcegraph Cody. Gemini Code Assist from Google Cloud puts AI-assisted tools into Visual Studio Code and the JetBrains IDEs. Get the power of Gemini’s long context, personalization on your own codebase, and now the use of tools to pull data from Atlassian, GitHub, and more. Use Gemini Code Assist within your local IDE, or in hosted environments like Cloud Workstations or Cloud Shell Editor.
Gemini Code Assist brings AI assistance to your dev workspace, including the use of tools
Project IDX is another Google-provided dev experience for building with Gemini. Use it for free, and build AI apps, with AI tools. It’s pretty great for frontend or backend apps.
Project IDX lets you build AI apps with AI tools
Maybe you’re building apps and agents with Gemini through low-code or declarative tools? There’s the Vertex AI Agent Builder. This Google Cloud service makes it fairly simple to create search agents, conversational agents, recommendation agents, and more. No coding needed!
Conversational agents in the Vertex AI Agent Builder
Another option for building with Gemini is the declarative Cloud Workflows service. I built a workflow that calls Gemini through Vertex AI and summarizes any provided document.
# Summarize a doc with Gemini
main:
  params: [args]
  steps:
    - init:
        assign:
          - doc_url: ${args.doc_url}
          - project_id: ${args.project_id}
          - location: ${args.location}
          - model: ${args.model_name}
          - desired_tone: ${args.desired_tone}
          - instructions:
    - set_instructions:
        switch:
          - condition: ${desired_tone == ""}
            assign:
              - instructions: "Deliver a professional summary with simple language."
            next: call_gemini
          - condition: ${desired_tone == "terse"}
            assign:
              - instructions: "Deliver a short professional summary with the fewest words necessary."
            next: call_gemini
          - condition: ${desired_tone == "excited"}
            assign:
              - instructions: "Deliver a complete, enthusiastic summary of the document."
            next: call_gemini
    - call_gemini:
        call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent
        args:
          model: ${"projects/" + project_id + "/locations/" + location + "/publishers/google/models/" + model}
          region: ${location}
          body:
            contents:
              role: user
              parts:
                - text: "summarize this document"
                - fileData:
                    fileUri: ${doc_url}
                    mimeType: "application/pdf"
            systemInstruction:
              role: user
              parts:
                - text: ${instructions}
            generation_config: # optional
              temperature: 0.2
              maxOutputTokens: 2000
              topK: 10
              topP: 0.9
        result: gemini_response
    - returnStep:
        return: ${gemini_response.candidates[0].content.parts[0].text}
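If you want to kick off this workflow from your own code rather than the console, the Workflows Executions client library can start a run and pass in the arguments. Here’s a minimal Go sketch, assuming a hypothetical workflow named doc-summarizer and placeholder project, location, and document values:

package main

import (
    "context"
    "fmt"
    "log"

    executions "cloud.google.com/go/workflows/executions/apiv1"
    "cloud.google.com/go/workflows/executions/apiv1/executionspb"
)

func main() {
    ctx := context.Background()

    client, err := executions.NewClient(ctx)
    if err != nil {
        log.Fatalf("error creating executions client: %v", err)
    }
    defer client.Close()

    // The JSON keys mirror the args the workflow expects (doc_url, project_id, and so on).
    args := `{"doc_url":"gs://my-bucket/report.pdf","project_id":"my-project","location":"us-central1","model_name":"gemini-2.0-flash-exp","desired_tone":"terse"}`

    exec, err := client.CreateExecution(ctx, &executionspb.CreateExecutionRequest{
        // "doc-summarizer" is a hypothetical workflow name; project and location are placeholders.
        Parent: "projects/my-project/locations/us-central1/workflows/doc-summarizer",
        Execution: &executionspb.Execution{
            Argument: args,
        },
    })
    if err != nil {
        log.Fatalf("error starting execution: %v", err)
    }

    fmt.Println("started execution:", exec.Name)
}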
Similarly, its sophisticated big-brother, Application Integration, can also interact with Gemini through drag-and-drop integration workflows. These sorts of workflow tools help you bake Gemini predictions into all sorts of existing processes.
Google Cloud Application Integration calls Gemini models
After you build apps and agents, you need a place to host them! In Google Cloud, you could run in a virtual machine (GCE), a Kubernetes cluster (GKE), or a serverless runtime (Cloud Run). There’s also the powerful Firebase App Hosting for these AI apps.
There are also two other services to consider. For RAG apps, we now offer the Vertex AI RAG Engine. I like this because you get a fully managed experience for ingesting docs, storing in a vector database, and performing retrieval. Doing LangChain? LangChain on Vertex AI offers a handy managed environment for running agents and calling tools.
Build AI and data systems with Gemini
In addition to building straight-up agents or apps, you might build backend data or AI systems with Gemini.
If you’re doing streaming analytics or real-time ETL with Dataflow, you can build ML pipelines, generate embeddings, and even invoke Gemini endpoints for inference. Maybe you’re doing data analytics with frameworks like Apache Spark, Hadoop, or Apache Flink. Dataproc is a great service that you can use within Vertex AI, or to run all sorts of data workflows. I’m fairly sure you know what Colab is, as millions of folks per month use it for building notebooks. Colab and Colab Enterprise offer two great ways to build data solutions with Gemini.
I made a dataset in BigQuery from the public “release notes” dataset from Google Cloud. Then I created a reference to the Gemini 2.0 Flash model, and asked Gemini for a summary of all of a product’s release notes from the past month.
-- create the remote model
CREATE OR REPLACE MODEL
`[project].public_dataset.gemini_2_flash`
REMOTE WITH CONNECTION `projects/[project]/locations/us/connections/gemini-connection`
OPTIONS (ENDPOINT = 'gemini-2.0-flash-exp');
-- query an aggregation of responses to get a monthly product summary
SELECT *
FROM
ML.GENERATE_TEXT(
MODEL `[project].public_dataset.gemini_2_flash`,
(
SELECT CONCAT('Summarize this month of product announcements by rolling up the key info', monthly_summary) AS prompt
FROM (
SELECT STRING_AGG(description, '; ') AS monthly_summary
FROM `bigquery-public-data`.`google_cloud_release_notes`.`release_notes`
WHERE product_name = 'AlloyDB' AND DATE(published_at) BETWEEN '2024-12-01' AND '2024-12-31'
)
),
STRUCT(
.05 AS TEMPERATURE,
TRUE AS flatten_json_output)
)
How wild is that? Love it.
You can also build with Gemini in Looker. Build reports, visualizations, and use natural language to explore data. See here for more.
And of course, Vertex AI helps you build with Gemini. Build prompts, fine-tune models, manage experiments, make batch predictions, and lots more. If you’re working with AI models like Gemini, you should give Vertex AI a look.
Build a better day-2 experience with Gemini
It’s not just about building software with Gemini. The AI-driven product workflow extends to post-release activities.
Have to set up least-privilege permissions for service accounts? Build the right permission profile with Gemini.
The “Help me choose roles” feature uses Gemini to figure out the right permissions
Something goes wrong. You need to get back to good. You can build faster resolution plans with Gemini. Google Cloud Logging supports log summarization with Gemini.
Google Cloud Logging supports log summarization with Gemini
Ideally, you know when something goes wrong before your customers notice. Synthetic monitors are one way to solve that. We made it easy to build synthetic monitors with Gemini using natural language.
“Help me code” option for creating synthetic monitors in Cloud Monitoring
Gemini can also help you build billing reports. I like this experience where I can use natural language to get answers about my spend in Cloud Billing.
Gemini in Cloud Billing makes it easier to understand your spend
Build supporting digital assets with Gemini
The developer workflow isn’t just about code artifacts. Sometimes you create supporting assets for design docs, production runbooks, team presentations, and more.
Use the Gemini app (or our other AI surfaces) to generate images. I do this all the time now!
Image for use in a presentation is generated by Gemini
Building slides? Writing docs? Creating spreadsheets? Gemini for Workspace gives you some help here. I use this on occasion to refine text, generate slides or images, and update tables.
Gemini in Google Docs helps me write documents
Maybe you’re getting bored with static image representations and want some more videos in your life? Veo 2 is frankly remarkable and might be a new tool for your presentation toolbox. Consider a case where you’re building a mobile app that helps people share cars. Maybe produce a quick video to embed in the design pitch.
Veo 2 generating videos for use in a developer’s design pitch
AI disrupts the traditional product development workflow. Good! Gemini is part of each stage of the new workflow, and it’s only going to get better. Consider introducing one or many of these experiences to your own way of working in 2025.
You don’t have to use generative AI. It’s possible to avoid it and continue doing whatever you’ve been doing, the way you’ve been doing it. I don’t believe that sentence will be true in twelve months. Not because you’ll have to use it—although in some cases it may be unavoidable—but because you’ll want to use it. I thought about how my work will change next year.
#1. I’ll start most efforts by asking “can AI help with this?”
Do I need to understand a new market or product area? Analyze a pile of data? Schedule a complex series of meetings? Quickly generate a sample app for a customer demo? Review a blog post a teammate wrote? In most cases, AI can give me an assist. I want to change my mental model to first figure out if there’s a smarter (AI-assisted) way to do something.
That said, it’s about “can AI help me” versus “can AI do all my work.” I don’t want to end up in this situation.
Whether planning a strategy or a vacation, there’s a lot of time spent researching. That’s ok, as you often uncover intriguing new tangents while exploring the internet.
AI can still improve the process. A lot. I find myself using the Gemini app, Google AI Studio, and NotebookLM to understand complex ideas. Gemini Deep Research is almost unbelievable. Give it a prompt, it scours the web for dozens or hundreds of sources, and then compiles a report.
What an amazing way to start or validate research efforts. Have an existing pile of content—might be annual reports, whitepapers, design docs, or academic material—that you need to make sense of? NotebookLM is pretty amazing, and should change how all of us ask questions of research material.
And then with coding assistance tools, I’m also getting more and more comfortable staying in my IDE to get help on things I don’t yet know. Here, my Gemini Code Assist extension is helping me learn how to fix my poorly-secured Java code.
Finally, I’m quite intrigued by how the new Gemini 2.0 Multimodal Live API will help me in the moment. By sharing my screen with the model, I can get realtime help into whatever I’m struggling with. Wow.
My day job is to lead a sizable team at Google Cloud and help everyone do their best work. I still like to code, though!
It’s already happening, but next year I expect to code more than in years past. Why? Because AI is making it easier and more fun. Whether using an IDE assistant, or a completely different type of IDE like Cursor, it’s never been simpler to build legit software. We all can go from idea to reality so quickly now.
Stop endlessly debating ideas, and just test them out quickly! Using low-code platforms or AI-assisted coding tools, you can get working prototypes in no time.
#5. I will ask better questions.
I’ve slowly learned that the best leaders simply ask better questions. AI can help us in a few ways here. First, there are “thinking” models that show you a chain of thought that might inspire your own questions.
LLMs are awesome at giving answers, but they’re also pretty great at crafting questions. Look at this. I uploaded a set of (fake) product bugs and asked the Gemini model to help me come up with clarifying questions to ask the engineers. Good list!
And how about this. Google Cloud BigQuery has an excellent feature called Data Insights which generates a bunch of candidate questions for a given dataset (here, the Google Cloud Release Notes). What a great way to get some smart, starter questions to consider!
#6. I want to identify where the manual struggle is actually the point.
I don’t want AI to do everything for me. There are cases where the human struggle is where the enjoyment comes from. Learning how to do something. Fumbling with techniques. Building up knowledge or strength. I don’t want a shortcut. I want deep learning.
I’m going to keep doing my daily reading list by hand. No automation allowed, as it forces me to really get a deeper grasp on what’s going on in our industry. I’m not using AI to write newsletters, as I want to keep working on the writing craft myself.
This mass integration of AI into services and experiences is great. It also forces us to stop and decide where we intentionally want to avoid it!
#7. I should create certain types of content much faster.
There’s no excuse to labor over document templates or images in presentations anymore. No more scouring the web for the perfect picture.
I use Gemini in Google Slides all the time now. This is the way I add visuals to presentations and it saves me hours of time.
But videos too? I’m only starting to consider how to use remarkable technology like Veo 2. I’m using it now, and it’s blowing my mind. It’ll likely impact what I produce next year.
That’s what most of this is all about. I don’t want to do less work; I want to do better work. Even with all this AI and automation, I expect I’ll be working the same number of hours next year. But I’ll be happier with how I’m spending those hours: learning, talking to humans, investing in others. Less time writing boilerplate code, breaking flow state to get answers, or even executing mindlessly repetitive tasks in the browser.
The ability to use your own codebase to customize the suggestions from an AI coding assistant is a big deal. This feature—available in products like Gemini Code Assist, GitHub Copilot, and Tabnine—gives developers coding standards, data objects, error messages, and method signatures that they recognize from previous projects. Data shows that the acceptance rate for AI coding assistants goes way up when devs get back trusted results that look familiar. But I don’t just want up-to-date and familiar code that *I* wrote. How can I make sure my AI coding assistant gives me the freshest and best code possible? I used code customization in Gemini Code Assist to reference Google Cloud’s official code sample repos and now I get AI suggestions that feature the latest Cloud service updates and best practices for my preferred programming languages. Let me show you how I did it.
Last month, I showed how to use local codebase awareness in Gemini Code Assist (along with its 128,000 input token window) to “train” the model on the fly using code samples or docs that an LLM hasn’t been trained on yet. It’s a cool pattern, but also requires upfront understanding of what problem you want to solve, and work to stash examples into your code repo. Can I skip both steps?
Yes, Gemini Code Assist Enterprise is now available and I can point to existing code repos in GitHub or GitLab. When I reference a code repo, Google Cloud automatically crawls it, chunks it up, and stores it (encrypted) in a vector database within a dedicated project in my Google Cloud environment. Then, the Gemini Code Assist plugin uses that data as part of a RAG pattern when I ask for coding suggestions. By pointing at Google Cloud’s code sample repos—any best practice repo would apply here—I supercharge my recommendations with data the base LLM doesn’t have (or prioritize).
Step #0 – Prerequisites and considerations
Code customization is an “enterprise” feature of Gemini Code Assist, so it requires a subscription to that tier of service. There’s a promotional $19-per-month price until March of 2025, so tell your boss to get moving.
Also, this is currently available in US, European, and Asian regions; you may need to request feature access via a form (depending on when you read this); and today it works with GitHub.com and GitLab.com repos, although on-premises indexing is forthcoming. Good? Good. Let’s keep going.
Step #1 – Create the source repo
One wrinkle here is that you need to own the repos you ask Gemini Code Assist to index. You can’t just point at any random repo to index. Deal breaker? Nope.
I can just fork an existing repo into my own account! For example, here’s the Go samples repo from Google Cloud, and the Java one. Each one is stuffed with hundreds of coding examples for interacting with most of Google Cloud’s services. These repos are updated multiple times per week to ensure they include support for all the latest Cloud service features.
I went ahead and forked each repo in GitHub. You can do it via the CLI or in the web console.
I didn’t overthink it and kept the repository name the same.
Gemini Code Assist can index up to 950 repos (and more if really needed), so you could liberally refer to best-practice repos that will help your developers write better code.
Any time I want to refresh my fork to grab the latest code sample updates, I can do so.
Step #2 – Add a reference to the source repo
Now I needed to reference these repos for later code customization. Google Cloud Developer Connect is a service that maintains connections to source code sitting outside Google Cloud.
I started by choosing GitHub.com as my source code environment.
Then I named my Developer Connect connection.
Then I installed a GitHub app into my GitHub account. This app is what enables the loading of source data into the customization service. From here, I chose the specific repos that I wanted available to Developer Connect.
When finished, I had one of my own repos, and two best practice repos all added to Developer Connect.
That’s it! Now to point these linked repos to Gemini Code Assist.
Step #3 – Add a Gemini Code Assist customization index
I had just two CLI commands to execute.
First, I created a code customization index. You’ve got one index per Cloud project (although you can request more) and you create it with one command.
Next, I created a repository group for the index. You use these to control access to repos, and could have different ones for different dev audiences. Here’s where you actually point to a given repo that has the Developer Connect app installed.
I ran this command a few times to ensure that each of my three repos was added to the repository group (and index).
Indexing can take up to 24 hours, so here’s where you wait. After a day, I saw that all my target repos were successfully indexed.
Whenever I sync the fork with the latest updates to code samples, Gemini Code Assist will index the updated code automatically. And my IDE with Gemini Code Assist will have the freshest suggestions from our samples repo!
Step #4 – Use updated coding suggestions
Let’s prove that this worked.
I looked for a recent commit to the Go samples repos that the base Gemini Code Assist LLM wouldn’t know about yet. Here’s one that has new topic-creation parameters for our Managed Kafka service. I gave the prompt below to Gemini Code Assist. First, I used a project and account that was NOT tied to the code customization index.
//function to create a topic in Google Cloud Managed Kafka and include parameters for setting replicationfactor and partitioncount
The coding suggestion was good, but incomplete as it was missing the extra configs the service can now accept.
When I went to my Code Assist environment that did have code customization turned on, you see that the same prompt gave me a result that mirrored the latest Go sample code.
I tried a handful of Java and Go prompts, and I regularly (admittedly, not always) got back exactly what I wanted. Good prompt engineering might have helped me reach 100%, but I still appreciated the big increase in quality results. It was amazing to have hundreds of up-to-date Google-tested code samples to enrich my AI-provided suggestions!
AI coding assistants that offer code customization from your own repos are a difference maker. But don’t stop at your own code. Index other great code repos that represent the coding standards and fresh content your developers need!
Trust. Without trust, AI coding assistants won’t become a default tool in a developer’s toolbox. Trust is the #1 concern of devs today, and it’s something I’ve struggled with in regards to getting the most relevant answers from an LLM. Specifically, am I getting back the latest information? Probably not, given that LLMs have a training cutoff date. Your AI coding assistant probably doesn’t (yet) know about Python 3.13, the most recent features of your favorite cloud service, or the newest architectural idea shared at a conference last week. What can you do about that?
To me, this challenge comes up in at least three circumstances. There are entirely new concepts or tools that the LLM training wouldn’t know about. Think something like pipe syntax as an alternative to SQL syntax. I wouldn’t expect a model trained last year to know about that. How about updated features to existing libraries or frameworks? I want suggestions that reflect the full feature set of the current technology and I don’t want to accidentally do something the hard (old) way. An example? Consider the new “enum type” structured output I can get from LangChain4J. I’d want to use that now! And finally, I think about improved or replaced framework libraries. If I’m upgrading from Java 8 to Java 23, or Deno 1 to Deno 2, I want to ensure I’m not using deprecated features. My AI tools probably don’t know about any of these.
I see four options for trusting the freshness of responses from your AI assistant. The final technique was brand new to me, and I think it’s excellent.
Fine-tune your model
Use retrieval augmented generation (RAG)
Ground the results with trusted sources
“Train” on the fly with input context
Let’s briefly look at the first three, and see some detailed examples of the fourth.
Fine-tune your model
Whether using commercial or open models, they all represent a point-in-time based on their training period. You could choose to repeatedly train your preferred model with fresh info about the programming languages, frameworks, services, and patterns you care about.
The upside? You can get a model with knowledge about whatever you need to trust it. The downside? It’s a lot of work—you’d need to craft a healthy number of examples and must regularly tune the model. That could be expensive, and the result wouldn’t naturally plug into most AI coding assistance tools. You’d have to jump out of your preferred coding tool to ask questions of a model elsewhere.
Use RAG
Instead of tuning and serving a custom model, you could choose to augment the input with pre-processed content. You’ll get back better, more contextual results when taking into account data that reflects the ideal state.
The upside? You’ll find this pattern increasingly supported in commercial AI assistants. This keeps you in your flow without having to jump out to another interface. GitHub Copilot offers this, and now our Gemini Code Assist provides code customization based on repos in GitHub or GitLab. With Code Assist, we handle the creation and management of the code index of your repos, and you don’t have to manually chunk and store your code. The downside? This only works well if you’ve got the most up-to-date data in an indexed source repo. If you’ve got old code or patterns in there, that won’t help your freshness problem. And while these solutions are good for extra code context, they may not support a wider range of possible context sources (e.g. text files).
Ground the results
This approach gives you more confidence that the results are accurate. For example, Google Cloud’s Vertex AI offers “ground with Google Search” so that responses are matched to real, live Google Search results.
If I ask a question about upgrading an old bit of Deno code, you can see that the results are now annotated with reference points. This gives me confidence to some extent, but doesn’t necessarily guarantee that I’m getting the freshest answers. Also, this is outside of my preferred tool, so it again takes me out of a flow state.
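If you wanted grounding from code instead of the console, the same unified Go SDK shown earlier in this post can request it. Here’s a hedged sketch, assuming the SDK’s Google Search grounding tool (genai.GoogleSearch) and placeholder project details:

package main

import (
    "context"
    "fmt"
    "log"

    "google.golang.org/genai"
)

func main() {
    ctx := context.Background()

    // Same unified SDK as the recipe function earlier; project and location are placeholders.
    client, err := genai.NewClient(ctx, &genai.ClientConfig{
        Project:  "my-project",
        Location: "us-central1",
        Backend:  genai.BackendVertexAI,
    })
    if err != nil {
        log.Fatalf("error creating client: %v", err)
    }

    // Ask for the response to be grounded with Google Search results.
    config := &genai.GenerateContentConfig{
        Tools: []*genai.Tool{
            {GoogleSearch: &genai.GoogleSearch{}},
        },
    }

    result, err := client.Models.GenerateContent(ctx, "gemini-2.0-flash-exp",
        genai.Text("What changed for web servers between Deno 1 and Deno 2?"), config)
    if err != nil {
        log.Fatalf("error generating content: %v", err)
    }

    fmt.Println(result.Candidates[0].Content.Parts[0].Text)
}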
Train on the fly
Here’s the approach I just learned about from my boss’s boss, Keith Ballinger. I complained about freshness of results from AI assistance tools, and he said “why don’t you just train it on the fly?” Specifically, pass the latest and greatest reference data into a request within the AI assistance tool. Mind … blown.
How might it handle entirely new concepts or tools? Let’s use that pipe syntax example. In my code, I want to use this fresh syntax instead of classic SQL. But there’s no way my Gemini Code Assist environment knows about that (yet). Sure enough, I just get back a regular SQL statement.
But now, Gemini Code Assist supports local codebase awareness, up to 128,000 input tokens! I grabbed the docs for pipe query syntax, saved as a PDF, and then asked Google AI Studio to produce a Markdown file of the docs. Note that Gemini Code Assist isn’t (yet) multi-modal, so I need Markdown instead of passing in a PDF or image. I then put a copy of that Markdown file in a “training” folder within my app project. I used the new @ mention feature in our Gemini Code Assist chat to specifically reference the syntax file when asking my question again.
Wow! So by giving Gemini Code Assist a reference file of pipe syntax, it was able to give me an accurate, contextual, and fresh answer.
What about updated features to existing libraries or frameworks? I mentioned the new feature of LangChain4J for the Gemini model. There’s no way I’d expect my coding assistant to know about a feature added a few days ago. Once again, I grabbed some resources. This time, I snagged the Markdown doc for Google Vertex AI Gemini from the LangChain4J repo, and converted a blog post from Guillaume to Markdown using Google AI Studio.
My prompt to the Gemini Code Assist model was “Update the service function with a call to Gemini 1.5 Flash using LangChain4J. It takes in a question about a sport, and the response is mapped to an enum with values for baseball, football, cricket, or other.” As expected, the first response was a good attempt, but it wasn’t fully accurate. And it used a manual way to map the response to an enum.
What if I pass in both of those training files with my prompt? I get back exactly the syntax I wanted for my Cloud Run Function!
So great. This approach requires me to know what tech I’m interested in up front, but still, what an improvement!
Final example. How about improved or replaced framework libraries? Let’s say I’ve got a very old Deno app that I created when I first got excited about this excellent JavaScript runtime.
// from https://deno.com/blog/v1.35#denoserve-is-now-stable
async function handleHttp(conn: Deno.Conn) {
  // `await` is needed here to wait for the server to handle the request
  await (async () => {
    for await (const r of Deno.serveHttp(conn)) {
      r.respondWith(new Response("Hello World from Richard"));
    }
  })();
}

for await (const conn of Deno.listen({ port: 8000 })) {
  handleHttp(conn);
}
This code uses some libraries and practices that are now out of date. When I modernize this app, I want to trust that I’m doing it the best way. Nothing to fear! I grabbed the Deno 1.x to 2.x migration guide, a blog post about the new approach to web servers, and the launch blog for Deno 2. The result? Impressive, including a good description of why it generated the code this way.
I could imagine putting the latest reference apps into a repo and using Gemini Code Assist’s code customization feature to pull that automatically into my app. But this demonstrated technique gives me more trust in the output of the tool when freshness is paramount. What do you think?
School is back in session, and I just met with a handful of teachers at a recent back-to-school night. They’re all figuring out how to account for generative AI tools that students have access to. I say, let’s give teachers the same tools to use. Specifically, what if a teacher wants a quick preliminary grade on book reports submitted by their students? To solve this, I used Gemini 1.5 Flash in Google Cloud Vertex AI in three different ways—one-off in the prompt editor, through code, and via declarative workflow.
Grade Homework in Vertex AI Studio
Maybe you just have one or two papers to grade. Something like Vertex AI Studio is a good choice. Even if you’re not a Google Cloud customer, you can use it for free through this link.
You don’t have any book reports handy to test this with? Me neither. In Vertex AI Studio, I prompted with something like “Write a 300 word book report for Pride and Prejudice from the perspective of an 8th grade student with a good vocabulary and strong writing skills.”
With that handy, let’s “grade” the paper. Here I’m using system instructions to direct the model. I used Gemini to generate the initial version and then I tweaked it a bit. Here are the instructions:
Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student.
This tells the model what to look for, the expected quality (“honors English”), and the desired output.
I added the system instructions to the marked section within Vertex AI Studio.
Then I simply pasted the book report into the “prompt” section and submitted the prompt.
I thought this was a pretty good book report, but Gemini 1.5 Flash had a high bar after obeying my system instructions.
This book report shows good understanding of the book and a strong grasp of the themes. However, there are some issues:
Structure: The introduction is a little long and could be more concise. The conclusion is also repetitive. The report could benefit from more clear paragraph breaks to organize the information more effectively.
Grammar: No significant grammar issues.
Content: The student demonstrates good understanding of the novel's major themes and characters. There's a strong focus on plot summary, which is okay, but the report could benefit from more critical analysis. The student needs to go beyond just describing the events to analyze how they impact the themes, characters, or messages of the book. Overall: This report shows good effort, but it needs to be refined.
Grade: B-
If I’m a teacher, this is a fascinating way to get a preliminary assessment of a writing assignment!
Grade Homework Through Code
The above solution works fine for one-off experiences, but how might you scale this AI-assisted grader? Another option is code.
To try this scenario out, I used Cloud Firestore as my document database holding the book reports. I created a collection named “Papers” in the default database and added three documents. Each one holds a different book report.
I then used the Firestore API and Vertex AI API to write some simple Go code that iterates through each Firestore document, calls Vertex AI using the provided system instructions, and then logs out the grade for each report. Note that I could have used a meta framework like LangChain, LlamaIndex, or Firebase Genkit, but I didn’t see the need.
package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "cloud.google.com/go/firestore"
    "cloud.google.com/go/vertexai/genai"
    "google.golang.org/api/iterator"
)

func main() {
    // get configuration from environment variables
    projectID := os.Getenv("PROJECT_ID")
    collectionName := os.Getenv("COLLECTION_NAME") // "Papers"
    location := os.Getenv("LOCATION")              // "us-central1"
    modelName := os.Getenv("MODEL_NAME")           // "gemini-1.5-flash-001"

    ctx := context.Background()

    // initialize Vertex AI client
    vclient, err := genai.NewClient(ctx, projectID, location)
    if err != nil {
        log.Fatalf("error creating vertex client: %v\n", err)
    }
    gemini := vclient.GenerativeModel(modelName)

    // add system instructions
    gemini.SystemInstruction = &genai.Content{
        Parts: []genai.Part{genai.Text(`Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student.`)},
    }

    // Initialize Firestore client
    client, err := firestore.NewClient(ctx, projectID)
    if err != nil {
        log.Fatalf("Failed to create client: %v", err)
    }
    defer client.Close()

    // Get documents from the collection
    iter := client.Collection(collectionName).Documents(ctx)
    for {
        doc, err := iter.Next()
        if err != nil {
            if err == iterator.Done {
                break
            }
            log.Fatalf("error iterating through documents: %v\n", err)
        }

        // create the prompt
        prompt := genai.Text(doc.Data()["Contents"].(string))

        // call the model and get back the result
        resp, err := gemini.GenerateContent(ctx, prompt)
        if err != nil {
            log.Fatalf("error generating content: %v\n", err)
        }

        // print out the top candidate part in the response
        log.Println(resp.Candidates[0].Content.Parts[0])
    }

    fmt.Println("Successfully iterated through documents!")
}
The code isn’t great, but the results were. I’m also getting more verbose responses from the model, which is cool. This is a much more scalable way to quickly grade all the homework.
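In real life, I’d probably write the grade back onto each Firestore document rather than just logging it. Here’s a small sketch of that extra step, assuming a hypothetical “Grade” field, written as a helper you could drop into the same program:

package main

import (
    "context"

    "cloud.google.com/go/firestore"
)

// saveGrade writes the model's assessment back onto the graded Firestore
// document. "Grade" is a hypothetical field name; doc is the snapshot already
// in hand inside the loop above.
func saveGrade(ctx context.Context, doc *firestore.DocumentSnapshot, grade string) error {
    _, err := doc.Ref.Update(ctx, []firestore.Update{
        {Path: "Grade", Value: grade},
    })
    return err
}

Calling saveGrade right after the GenerateContent call (passing the text of the top candidate) would persist the result next to the original report.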
Grade Homework in Cloud Workflows
I like the code solution, but maybe I want to run this preliminary grading on a scheduled basis? Every Tuesday night? I could do that with my above code, but how about using a no-code workflow engine? Our Google Cloud Workflows product recently got a Vertex AI connector. Can I make it work with the same system instructions as the above two examples? Yes, yes I can.
I might be the first person to stitch all this together, but it works great. I first retrieved the documents from Firestore, looped through them, and called Vertex AI with the provided system instructions. Here’s the workflow’s YAML definition:
main:
  params: [args]
  steps:
    - init:
        assign:
          - collection: ${args.collection_name}
          - project_id: ${args.project_id}
          - location: ${args.location}
          - model: ${args.model_name}
    - list_documents:
        call: googleapis.firestore.v1.projects.databases.documents.list
        args:
          collectionId: ${collection}
          parent: ${"projects/" + project_id + "/databases/(default)/documents"}
        result: documents_list
    - process_documents:
        for:
          value: document
          in: ${documents_list.documents}
          steps:
            - ask_llm:
                call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent
                args:
                  model: ${"projects/" + project_id + "/locations/" + location + "/publishers/google/models/" + model}
                  region: ${location}
                  body:
                    contents:
                      role: "USER"
                      parts:
                        text: ${document.fields.Contents.stringValue}
                    systemInstruction:
                      role: "USER"
                      parts:
                        text: "Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student."
                    generation_config:
                      temperature: 0.5
                      max_output_tokens: 2048
                      top_p: 0.8
                      top_k: 40
                result: llm_response
            - log_file_name:
                call: sys.log
                args:
                  text: ${llm_response}
No code! I executed the workflow, passing in all the runtime arguments.
In just a moment, I saw my workflow running, and “grades” being logged to the console. In real life, I’d probably update the Firestore document with this information. I’d also use Cloud Scheduler to run this on a regular basis.
While I made this post about rescuing educators from the toil of grading papers, you can apply these patterns to all sorts of scenarios. Use prompt editors like Vertex AI Studio for experimentation and finding the right prompt phrasing. Then jump into code to interact with models in a repeatable, programmatic way. And consider low-code tools when model interactions are scheduled, or part of long running processes.
I’ll admit it, I’m a PaaS guy. Platform-as-a-Service is an ideal abstraction for those who don’t get joy from fiddling with infrastructure. From Google App Engine, to Heroku, to Cloud Foundry, I’ve appreciated attempts to deliver runtimes that make it easier to ship and run code. Classic PaaS-type services were great at what they did. The problem with all of them—this includes all the first-generation serverless products like AWS Lambda—was that they were limited. Some of the necessary compromises were well-meaning and even healthy: build 12-factor apps, create loose coupling, write less code and orchestrate managed services instead. But in the end, all these platforms, while successful in various ways, were too constrained to take on a majority of apps for a majority of people. Times have changed.
Google Cloud Run started as a serverless product, but it’s more of an application platform at this point. It’s reminiscent of a PaaS, but much better. While not perfect for everything—don’t bring Windows apps, always-on background components, or giant middleware—it’s becoming my starting point for nearly every web app I build. There are ten reasons why Cloud Run isn’t limited by PaaS-t constraints, is suitable for devs at every skill level, and can run almost any web app.
Note that Cloud Run also has “jobs” for run-to-completion batch work. I’m focusing solely on Cloud Run web services here.
I like “functions.” Write short code blocks that respond to events, and perform an isolated piece of work. There are many great use cases for this.
The new Cloud Run functions experience makes it easy to bang out a function in minutes. It’s baked into the CLI and UI. Once I decide to create a function…
I only need to pick a service name, region, language runtime, and whether access to this function is authenticated or not.
Then, I see a browser-based editor where I can write, test, and deploy my function. Simple, and something most of us equate with “serverless.”
But there’s more. Cloud Run does apps too. That means instead of a few standalone functions to serve a rich REST endpoint, you’re deploying one Spring Boot app with all the requisite listeners. Instead of serving out a static site, you could return a full web app with server-side capabilities. You’ve got nearly endless possibilities when you can serve any container that accepts HTTP, HTTP/2, WebSockets, or gRPC traffic.
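To make that concrete, the whole contract for a Cloud Run service is “listen for HTTP on the port given in the PORT environment variable.” A minimal Go app that satisfies it looks something like this:

package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    // Cloud Run tells the container which port to listen on via the PORT environment variable.
    port := os.Getenv("PORT")
    if port == "" {
        port = "8080" // sensible default for local runs
    }

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "Hello from Cloud Run")
    })

    log.Printf("listening on port %s", port)
    log.Fatal(http.ListenAndServe(":"+port, nil))
}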
Use either abstraction, but stay above the infrastructure and ship quickly.
This is where the power of containers shows up, and why many previous attempts at PaaS didn’t break through. It’s ok if a platform only supports new architectures and new apps. But then you’re accepting that you’ll need an additional stack for EVERYTHING ELSE.
Cloud Run is a great choice because you don’t HAVE to start fresh to use it. Deploy from source in an existing GitHub repo or from cloned code on your machine. Maybe you’ve got an existing Next.js app sitting around that you want to deploy to Cloud Run. Run a headless CMS. Does your old app require local volume mounts for NFS file shares? Easy to do. Heck, I took a silly app I built 4 1/2 years ago, deployed it from the Docker Hub, and it just worked.
Of course, Cloud Run shines when you’re building new apps. Especially when you want fast experimentation with new paradigms. With its new GPU support, Cloud Run lets you do things like serve LLMs via tools like Ollama. Or deploy generative AI apps based on LangChain or Firebase Genkit. Build powerful web apps in Go, Java, Python, .NET, and more. Cloud Run’s clean developer experience and simple workflow makes it ideal for whatever you’re building next.
#3. Use by itself AND as part of a full cloud solution.
There aren’t many tech products that everyone seems to like. But folks seem to really like Cloud Run, and it regularly wins over the Hacker News crowd! Some classic PaaS solutions were lifestyle choices; you had to be all in. Use the platform and its whole way of working. Powerful, but limiting.
You can choose to use Cloud Run all by itself. It’s got a generous free tier, doesn’t require complicated HTTP gateways or routers to configure, and won’t force you to use a bunch of other Google Cloud services. Call out to databases hosted elsewhere, respond to webhooks from SaaS platforms, or just serve up static sites. Use Cloud Run, and Cloud Run alone, and be happy.
And of course, you can use it along with other great cloud services. Tack on a Firestore database for a flexible storage option. Add a Memorystore caching layer. Take advantage of our global load balancer. Call models hosted in Vertex AI. If you’re using Cloud Run as part of an event-driven architecture, you might also use built-in connections to Eventarc to trigger Cloud Run services when interesting things happen in your account—think file uploaded to object storage, user role deleted, database backup completes.
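As an example of the event-driven case, an Eventarc trigger delivers a CloudEvent to your Cloud Run service over HTTP. Here’s a hedged sketch of a handler for a Cloud Storage upload event, assuming the default binary CloudEvents binding (metadata in ce-* headers, object details as JSON in the body):

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "os"
)

// storageObject holds the two fields we care about from the event payload.
type storageObject struct {
    Bucket string `json:"bucket"`
    Name   string `json:"name"`
}

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // With the binary CloudEvents binding, event metadata arrives in ce-* headers.
        log.Printf("received event type=%s subject=%s", r.Header.Get("ce-type"), r.Header.Get("ce-subject"))

        var obj storageObject
        if err := json.NewDecoder(r.Body).Decode(&obj); err != nil {
            http.Error(w, "bad event payload", http.StatusBadRequest)
            return
        }
        log.Printf("object %s uploaded to bucket %s", obj.Name, obj.Bucket)
        fmt.Fprintln(w, "ok")
    })

    port := os.Getenv("PORT")
    if port == "" {
        port = "8080"
    }
    log.Fatal(http.ListenAndServe(":"+port, nil))
}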
Use it by itself or “with the cloud”, but either way, there’s value.
#4. Choose simple AND sophisticated configurations.
One reason PaaS-like services are so beloved is because they often provide a simple onramp without requiring tons of configuration. “cf push” to get an app to Cloud Foundry. Easy! Getting an app to Cloud Run is simple too. If you have a container, it’s a single command:
rseroter$ gcloud run deploy go-app --image=gcr.io/seroter-project-base/go-restapi
If all you have is source code, it’s also a single command:
rseroter$ gcloud run deploy node-app --source .
In both cases, the CLI asks me to pick a region and whether I want requests authenticated, and that’s it. Seconds later, my app is running.
This works because Cloud Run sets a series of smart, reasonable default settings.
But sometimes you do want more control over service configuration, and Cloud Run opens up dozens of possible settings. What kind of sophisticated settings do you have control over?
CPU allocation. Do you want CPU to be always on, or quit when idle?
Ingress controls. Do you want VPC-only access or public access?
Multi-container services. Add a sidecar.
Container port. The default is 8080, but set to whatever you want.
Memory. The default value is 512 MiB per instance, but you can go up to 32 GiB.
CPU. It defaults to 1, but you can go less than 1, or up to 8.
Healthchecks. Define startup or liveness checks that ping specific endpoints on a schedule.
Variables and secrets. Define environment variables that get injected at runtime. Same with secrets that get mounted at runtime.
Persistent storage volumes. There’s ephemeral scratch storage in every Cloud Run instance, but you can also mount volumes from Cloud Storage buckets or NFS shares.
Request timeout. The default value is 5 minutes, but you can go up to 60 minutes.
Max concurrency. A given service instance can handle more than one request. The default value is 80, but you can go up to 1000!
and much more!
You can do something simple, you can do something sophisticated, or a bit of both.
One of the challenges with early PaaS services was that they were just sitting on the public internet. That’s no good as you get to serious, internal-facing systems.
First off, Cloud Run services are public by default. You control the authentication level (anonymous access, or authenticated user) and need to explicitly set that. But the service itself is publicly reachable. What’s great is that this doesn’t require you to set up any weird gateways or load balancers to make it work. As soon as you deploy a service, you get a reachable address.
Awesome! Very easy. But what if you want to lock things down? This isn’t difficult either.
Cloud Run lets me specify that I’ll only accept traffic from my VPC networks. I can also choose to securely send messages to IPs within a VPC. This comes into play as well if you’re routing requests to a private on-premises network peered with a cloud VPC. We even just added support for adding Cloud Run services to a service mesh for more networking flexibility. All of this gives you a lot of control to create truly private services.
I don’t necessarily believe that cloud is more expensive than on-premises—regardless of some well-publicized stories—but keeping idle cloud services running isn’t helping your cost posture.
Google Cloud Run truly scales to zero. If nothing is happening, nothing is running (or costing you anything). However, when you need to scale, Cloud Run scales quickly. Like, a-thousand-instances-in-seconds quickly. This is great for bursty workloads that don’t have a consistent usage pattern.
But you probably also want an affordable way to keep a consistent pool of compute online to handle a steady stream of requests. No problem. Set the minimum instance count to 1 (or 2, or 10) and keep instances warm. And set concurrency high for apps that can handle it.
If you don’t have CPU always allocated, but keep a minimum instance online, we actually charge you significantly less for that “warm” instance. And you can apply committed use discounts when you know you’ll have a service running for a while.
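As a hedged example, here’s how you might keep a couple of warm instances and raise concurrency on an existing service. The service name and numbers are placeholders; --cpu-throttling keeps CPU allocated only while requests are processed, which pairs with that cheaper “warm but idle” pricing:
rseroter$ gcloud run services update node-app \
  --region=us-west1 \
  --min-instances=2 \
  --concurrency=500 \
  --cpu-throttling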
Run bursty workloads or steadily-used workloads all in a single platform.
#7. Do one-off deploys AND set up continuous delivery pipelines.
I mentioned above that it’s easy to use a single command or single screen to get an app to Cloud Run. Go from source code or container to running app in seconds. And you don’t have to set up any other routing middleware or cloud networking to get a routable service.
Sometimes you just want to do a one-off deploy without all the ceremony. Run the CLI, use the Console UI, and get on with life. Amazing.
But if that were your only option, you’d feel constrained. So you can use something like GitHub Actions to deploy to Cloud Run. Most major CI/CD products support it.
Another great option is Google Cloud Deploy. This managed service takes container artifacts and deploys them to Google Kubernetes Engine or Google Cloud Run. It offers some sophisticated controls for canary deploys, parallel deploys, post-deploy hooks, and more.
Cloud Deploy has built-in support for Cloud Run. A basic pipeline (defined in YAML, but also configured via point-and-click in the UI if you want) might show three stages for dev, test, and prod.
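If you’re curious what driving that pipeline from the CLI could look like, here’s a sketch. The pipeline and release names are placeholders, and I’m assuming the stages are defined in a local clouddeploy.yaml file:
rseroter$ gcloud deploy apply --file=clouddeploy.yaml --region=us-west1
rseroter$ gcloud deploy releases create rel-001 \
  --delivery-pipeline=web-app-pipeline \
  --region=us-west1 \
  --images=app=gcr.io/seroter-project-base/go-restapi
rseroter$ gcloud deploy releases promote \
  --release=rel-001 \
  --delivery-pipeline=web-app-pipeline \
  --region=us-west1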
When the pipeline completes, we see three separate Cloud Run instances deployed, representing each stage of the pipeline.
You want something more sophisticated? Ok. Cloud Deploy supports Cloud Run canary deployments. You’d use this if you want a subset of traffic to go to the new instance before deciding to cut over fully.
This is taking advantage of Cloud Run’s built-in traffic management feature. When I check the deployed service, I see that after advancing my pipeline to 75% of production traffic for the new app version, the traffic settings are properly set in Cloud Run.
Serving traffic in multiple regions? Cloud Deploy makes it possible to ship a release to dozens of places simultaneously. Here’s a multi-target pipeline. The production stage deploys to multiple Cloud Run regions in the US.
When I checked Cloud Run, I saw instances in all the target regions. Very cool!
If you want a simple deploy, do that with the CLI or UI. Nothing stops you. However, if you’re aiming for a more robust deployment strategy, Cloud Run readily handles it through services like Cloud Deploy.
#8. Own aspects of security AND offload responsibility.
One reason you choose managed compute platforms is to outsource operational tasks. It doesn’t mean you’re not capable of patching infrastructure, scaling compute nodes, or securing workloads. It means you don’t want to, and there are better uses of your time.
With Cloud Run, you can drive aspects of your security posture, and also let Cloud Run handle key aspects on your behalf.
What are you responsible for? You choose an authentication approach, including public or private services. This includes control of how you want to authenticate developers who use Cloud Run. You can authenticate end users, internal or external ones, using a handful of supported methods.
It’s also up to you to decide which service account the Cloud Run service instance should impersonate. This controls what a given instance has access to. If you want to ensure that only containers with verified provenance get deployed, you can also choose to turn on Binary Authorization.
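As a small sketch of that, here’s a deploy that runs as a dedicated, least-privilege service account and refuses unauthenticated requests. The service account name here is something I made up for illustration:
rseroter$ gcloud run deploy node-app \
  --source . \
  --region=us-west1 \
  --service-account=minimal-run-sa@seroter-project-base.iam.gserviceaccount.com \
  --no-allow-unauthenticated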
So what are you offloading to Cloud Run and Google Cloud?
You can outsource protection from DDoS and other threats by turning on Cloud Armor. The underlying infrastructure beneath Cloud Run is completely managed, so you don’t need to worry about upgrading or patching any of that. What’s also awesome is that if you deploy Cloud Run services from source, you can sign up for automatic base image updates. This means we’ll patch the OS and runtime of your containers. Importantly, it’s still up to you to patch your app dependencies. But this is still very valuable!
#9. Treat as post-build target AND as upfront platform choice.
You might just want a compute host for your finished app. You don’t want to have to pick that host up front, and just want a way to run your app. Fair enough! There aren’t “Cloud Run apps”; they’re just containers. That said, there are general tips that make an app more suitable for Cloud Run than not. But the key is, for modern apps, you can often choose to treat Cloud Run as a post-build decision.
Or, you can design with Cloud Run in mind. Maybe you want to trigger Cloud Run based on a specific Eventarc event. Or you want to capitalize on Cloud Run concurrency so you code accordingly. You could choose to build based on a specific integration provided by Cloud Run (e.g. Memorystore, Firestore, or Firebase Hosting).
There are times that you build with the target platform in mind. In other cases, you want a general purpose host. Cloud Run is suitable for either situation, which makes it feel unique to me.
#10. Rely on built-in SLOs, logs, metrics AND use your own observability tools.
If you want it to be, Cloud Run can feel like an all-in-one solution. Do everything from one place. That’s how classic PaaS was, and there was value in having a tightly-integrated experience. From within Cloud Run, you have built-in access to logs, metrics, and even setting up SLOs.
The metrics experience is powered by Cloud Monitoring. I can customize event types, the dashboards, time window, and more. This even includes the ability to set uptime checks which periodically ping your service and let you know if everything is ok.
The embedded logging experience is powered by Cloud Logging and gives you a view into all your system and custom logs.
We’ve even added an SLO capability where you can define SLIs based on availability, latency, or custom metrics. Then you set up service level objectives for service performance.
Kubernetes, virtual machines, and bare metal boxes all play a key role for many workloads. But you also may want to start with the highest abstraction possible so that you can focus on apps, not infrastructure. IMHO, Google Cloud Run is the best around and satisfies the needs of most any modern web app. Give it a try!
What advice do you get if you’re lugging around a lot of financial debt? Many folks will tell you to start purging expenses. Stop eating out at restaurants, go down to one family car, cancel streaming subscriptions, and sell unnecessary luxuries. For some reason, I don’t see the same aggressive advice when it comes to technical debt. I hear soft language around “optimization” or “management” versus assertive stances that take a meat cleaver to your architectural excesses.
What is architectural debt? I’m thinking about bloated software portfolios where you’re carrying eight products in every category. Brittle automation that only partially works and still requires manual workarounds and black magic. Unique customizations to packaged software that’s now keeping you from being able to upgrade to modern versions. Also half-finished “ivory tower” designs where the complex distributed system isn’t fully in place, and may never be. You might have too much coupling, too little coupling, unsupported frameworks, and all sorts of things that make deployments slow, maintenance expensive, and wholesale improvements impossible.
This stuff matters. The latest StackOverflow developer survey shows that the most common frustration is the “amount of technical debt.” It’s wasting up to eight hours a week for each developer! Numbers two and three are around stack complexity. Your code and architectural tech debt is slowing down your release velocity, creating attrition among your best employees, and limiting how much you can invest in new tech areas. It’s well past time to simplify by purging architecture components that have built up (and calcified) over time. Let’s write bigger checks to pay down this debt faster.
Explore these four areas, all focused on simplification. There are obviously tradeoffs and costs with each suggestion, but you’re not going to make meaningful progress by being timid. Note that there are other dimensions to fixing tech debt besides simplification, but it’s the one I see discussed least often. I’ll use Google Cloud to offer some examples of how you might tackle each one, given we’re the best cloud for those making a firm shift away from legacy tech debt.
1. Stop moving so much data around.
If you zoom out on your architecture, how many components do you have that get data from point A to point B? I’d bet that you have lots of ETL pipelines to consolidate data into a warehouse or data lake, messaging and event processing solutions to shunt data around, and even API calls that suck data from one system into another. That’s a lot of machinery you have to create, update, and manage every day.
Can you get rid of some of this? Can you access more of the data where it rests, versus copying it all over the place? Or use software that acts on data in different ways without forcing you to migrate it for further processing? I think so.
Let’s see some examples.
Perform analytical queries against data sitting in different places? Google Cloud supports that with BigQuery Omni. We run BigQuery in AWS and Azure so that you can access data at rest, and not be forced to consolidate it in a single data lake. Here, I have an Excel file sitting in an Azure blob storage account. I could copy that data over to Google Cloud, but that’s more components for me to create and manage.
Rather, I can set up a pointer to Azure from within BigQuery, and treat it like any other table. The data is processed in Azure, and only summary info travels across the wire.
You might say “that’s cool, but I have related data in another cloud, so I’d have to move it anyway to do joins and such.” You’d think so. But we also offer cross-cloud joins with BigQuery Omni. Check this out. I’ve got that employee data in Azure, but timesheet data in Google Cloud.
With a single SQL statement, I’m joining data across clouds. No data movement required. Less debt.
Enrich data in analytical queries from outside databases? You might have ETL jobs in place to bring reference data into your data warehouse to supplement what’s already there. That may be unnecessary.
With BigQuery’s Federated Queries, I can reach live into PostgreSQL, MySQL, Cloud Spanner, and even SAP Datasphere sources. Access data where it rests. Here, I’m using the EXTERNAL_QUERY function to retrieve data from a Cloud SQL database instance.
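A query like the following shows the general shape, though the connection name and column list here are made up for illustration:
rseroter$ bq query --use_legacy_sql=false '
SELECT *
FROM EXTERNAL_QUERY(
  "projects/seroter-project-base/locations/us/connections/my-cloudsql-connection",
  "SELECT id, full_name, office_location FROM employees;")'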
I could use that syntax to perform joins, and do all sorts of things without ever moving data around.
Perform complex SQL analytics against log data? Does your architecture have data copying jobs for operational data? Maybe to get it into a system where you can perform SQL queries against logs? There’s a better way: Cloud Logging’s Log Analytics lets you run SQL against your logs in place, without exporting them to another system first.
You can’t avoid moving data around. It’s often required. But I’m fairly sure that through smart product selection and some redesign of the architecture, you could eliminate a lot of unnecessary traffic.
2. Compress the stack by removing duplicative components.
Break out the chainsaw. Do you have multiple products for each software category? Or too many fine-grained categories full of best-of-breed technology? It’s time to trim.
My former colleague Josh McKenty used to say something along the lines of “if it’s emerging, buy a few; if it’s mature, no more than two.”
You don’t need a dozen project management software products. Or more than two relational database platforms. In many cases, you can use multi-purpose services and embrace “good enough.”
There should be a fifteen-day cooling-off period before you buy a specialized vector database. Just use PostgreSQL. Or any number of existing databases that now support vector capabilities. Maybe you can even skip RAG-based solutions (and infrastructure) altogether for certain use cases and just use Gemini with its long context.
You could use Spanner Graph instead of a dedicated graph database, or Artifact Registry as a single place for OS and application packages.
I’m keen on the new continuous queries for BigQuery where you can do stream analytics and processing as data comes into the warehouse. Enrich data, call AI models, and more. Instead of a separate service or component, it’s just part of the BigQuery engine. Turn off some stuff?
I suspect that this one is among the hardest for folks to act upon. We often hold onto technology because it’s familiar, or even because of misplaced loyalty. But be bold. Simplify your stack by getting rid of technology that’s no longer differentiated. Make a goal of having 30% fewer software products or platforms in your architecture in 2025.
3. Replace hyper-customized software and automation with managed services and vanilla infrastructure.
Hear me out. You’re not that unique. There are a handful of things that your company does which are the “secret sauce” for your success, and the rest is the same as everyone else.
More often than not, you should be fitting your team to the software, not your software to the team. I’ve personally configured and extended packaged software to a point that it was unrecognizable. For what? Because we thought our customer service intake process was SO MUCH different than anyone else’s? It wasn’t. So much tech debt happens because we want to shape technology to our existing requirements, or we want to avoid “lock-in” by committing to a vendor’s way of doing things. I think both are misguided.
I read a lot of annual reports from public companies. I’ve never seen “we slayed at Kubernetes this year” called out. Nobody cares. A cleverly scripted, hyper-customized setup that looks like the CNCF landscape diagram is more boat anchor than accelerator. Consider switching to a fully automated managed cluster like GKE Autopilot. Pay per pod, and get automatic upgrades, secure-by-default configurations, and a host of GKE Enterprise features to create sameness across clusters.
Or thank-and-retire that customized or legacy workflow engine (code framework, or software product) that only four people actually understand. Use a nicely API-enabled managed product with useful control-flow actions, or a full-fledged cloud-hosted integration engine.
You probably don’t need a customized database, caching solution, or even CI/CD stack. These are all super mature solution spaces, where whatever is provided out of the box is likely suitable for what you really need.
4. Tone it down on the microservices and distributed systems.
Look, I get excited about technology and want to use all the latest things. But it’s often overkill, especially in the early (or late) stages of a product.
You simply don’t need a couple dozen serverless functions to serve a static web app. Simmer down. Or a big complex JavaScript framework when your site has a pair of pages. So much technical debt comes from over-engineering systems to use the latest patterns and technology, when the classic ones will do.
Smash most of your serverless functions back into an “app” hosted in Cloud Run. Fewer moving parts, and all the agility you want. Use vanilla JavaScript where you can. Use small, geo-located databases until you MUST do cross-region or global replication. Don’t build “developer platforms” and IDPs until you actually need them.
I’m not going all DHH on you, but most folks would be better off defaulting to more monolithic systems running on a server or two. We’ve all over-distributed too many services and created unnecessarily complex architectures that are now brittle or impossible to understand. If you need the scale and resilience of distributed systems RIGHT NOW then go build one. But most of us have gotten burned from premature optimization because we assumed that our system had to handle 100x user growth overnight.
Wrap Up
Every company has tech debt, whether the business is 100 years old or started last week. Google has it, big banks have it, the governments have it, and YC companies have it. And “managing it” is probably a responsible thing to do. But sometimes, when you need to make a step-function improvement in how you work, incremental changes aren’t good enough. Simplify by removing the cruft, and take big cuts out of your architecture to do it!
Yes, people are doing things besides generative AI. You’ve still got other problems to solve, systems to connect, and data to analyze. Apache Kafka remains a very popular product for event and data processing, and I was thinking about how someone might use it in the cloud right now. I think there are three major options, and one of them (built-in managed service) is now offered by Google Cloud. So we’ll take that for a spin.
Option 1: Run it yourself on (managed) infrastructure
Many companies choose to run Apache Kafka themselves on bare metal, virtual machines, or Kubernetes clusters. It’s easy to find stories about companies like Netflix, Pinterest, and Cloudflare running their own Apache Kafka instances. Same goes for big (and small) enterprises that choose to set up and operate dedicated Apache Kafka environments.
Why do this? It’s the usual reasons why people decide to manage their own infrastructure! Kafka has a lot of configurability, and experienced folks may like the flexibility and cost profile of running Apache Kafka themselves. Pick your infrastructure, tune every setting, and upgrade on your timetable. On the downside, self-managed Apache Kafka can result in a higher total cost of ownership, requires specialized skills in-house, and could distract you from other high-priority work.
If you want to go that route, I see a few choices.
Download the components and install them. Grab the latest release and throw it onto a set of appropriate virtual machine instances or bare metal machines. You might use Terraform or something similar to template out the necessary activities.
Use a pre-packaged virtual machine image. Providers like Bitnami (part of VMware, which is now part of Broadcom) offer a catalog of packaged and supported images that contain popular software packages, including Apache Kafka. These can be deployed directly from your cloud provider as well, as I show here with Google Cloud.
There’s no shame in going this route! It’s actually very useful to know how to run software like Apache Kafka yourself, even if you decide to switch to a managed service later.
Option 2: Use a built-in managed service
You might want Apache Kafka, but not want to run Apache Kafka. I’m with you. Many folks, including those at big web companies and classic enterprises, depend on managed services instead of running the software themselves.
Why do this? You’d sign up for this option when you want the API, but not the ops. It may be more elastic and cost-effective than self-managed hosting. Or, it might cost more from a licensing perspective, but provide more flexibility on total cost of ownership. On the downside, you might not have full access to every raw configuration option, and may pay for features or vendor-dictated architecture choices you wouldn’t have made yourself.
First, I needed to enable the API within Google Cloud. This gave me the ability to use the service. Note that this is NOT FREE while in preview, so recognize that you’ll incur charges.
Next, I wanted a dedicated service account for accessing the Kafka service from client applications. The service supports OAuth and SASL_PLAIN with service account keys. The latter is appropriate for testing, so I chose that.
I created a new service account named seroter-bq-kafka and gave it the roles/managedkafka.client role. I also created a JSON private key and saved it to my local machine.
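If you prefer the CLI, the setup might look something like this. The key file name is mine; the rest mirrors what I did in the console:
rseroter$ gcloud iam service-accounts create seroter-bq-kafka
rseroter$ gcloud projects add-iam-policy-binding seroter-project-base \
  --member="serviceAccount:seroter-bq-kafka@seroter-project-base.iam.gserviceaccount.com" \
  --role="roles/managedkafka.client"
rseroter$ gcloud iam service-accounts keys create kafka-client-key.json \
  --iam-account=seroter-bq-kafka@seroter-project-base.iam.gserviceaccount.com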
That’s it. Now I was ready to get going with the cluster.
Provision the cluster and topic
I went into the Apache Kafka for BigQuery dashboard in the Google Cloud console—I could have also used the CLI, which has the full set of control plane commands—to spin up a new cluster. I got very few choices, and that’s not a bad thing. You provide the CPU and RAM capacity for the cluster, and Google Cloud picks the right shape for the brokers and creates a highly available architecture. You’ll also see that I chose the VPC for the cluster, but that’s about it. Pretty nice!
In about twenty minutes, my cluster was ready. Using the console or CLI, I could see the details of my cluster.
Topics are a core part of Apache Kafka and represent the resource you publish and subscribe to. I could create a topic via the UI or CLI. I created a topic called “topic1”.
Build the producer and consumer apps
I wanted two client apps: one to publish new messages to Apache Kafka, and another to consume messages. I chose Node.js (JavaScript) for both apps. There are a handful of libraries for interacting with Apache Kafka, and I chose the mature kafkajs.
Let’s start with the consuming app. I need (a) the cluster’s bootstrap server URL and (b) the encoded client credentials. We access the cluster through the bootstrap URL, which you can find via the CLI or on the cluster details page (see above). The client credentials for SASL_PLAIN authentication consist of the base64-encoded service account key JSON file.
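Producing that encoded value is a one-liner. I’m assuming the key file name from earlier, and note that base64 flags vary slightly between macOS and Linux:
rseroter$ base64 < kafka-client-key.json | tr -d '\n'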
My index.js file defines a Kafka object with the client ID (which identifies our consumer), the bootstrap server URL, and SASL credentials. Then I define a consumer with a consumer group ID and subscribe to the “topic1” we created earlier. I process and log each message before appending to an array variable. There’s an HTTP GET endpoint that returns the array. See the whole index.js below, and the GitHub repo here.
const express = require('express');
const { Kafka, logLevel } = require('kafkajs');

const app = express();
const port = 8080;

const kafka = new Kafka({
  clientId: 'seroter-consumer',
  brokers: ['bootstrap.seroter-kafka.us-west1.managedkafka.seroter-project-base.cloud.goog:9092'],
  ssl: {
    rejectUnauthorized: false
  },
  logLevel: logLevel.DEBUG,
  sasl: {
    mechanism: 'plain', // scram-sha-256 or scram-sha-512
    username: 'seroter-bq-kafka@seroter-project-base.iam.gserviceaccount.com',
    password: 'tybgIC ... pp4Fg=='
  },
});

const consumer = kafka.consumer({ groupId: 'message-retrieval-group' });

//create variable that holds an array of "messages" that are strings
let messages = [];

async function run() {
  await consumer.connect();

  //provide topic name when subscribing
  await consumer.subscribe({ topic: 'topic1', fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`################# Received message: ${message.value.toString()} from topic: ${topic}`);
      //add message to local array
      messages.push(message.value.toString());
    },
  });
}

app.get('/consume', (req, res) => {
  //return the array of messages consumed thus far
  res.send(messages);
});

run().catch(console.error);

app.listen(port, () => {
  console.log(`App listening at http://localhost:${port}`);
});
Now we switch gears and go through the producer app that publishes to Apache Kafka.
This app starts off almost identically to the consumer app. There’s a Kafka object with a client ID (different for the producer) and the same pointer to the bootstrap server URL and credentials. I’ve got an HTTP GET endpoint that reads the querystring parameters and publishes the key and value content to the topic. The code is below, and the GitHub repo is here.
const express = require('express');
const { Kafka, logLevel } = require('kafkajs');

const app = express();
const port = 8080; // Each app runs in its own Cloud Run service, so both can listen on 8080

const kafka = new Kafka({
  clientId: 'seroter-publisher',
  brokers: ['bootstrap.seroter-kafka.us-west1.managedkafka.seroter-project-base.cloud.goog:9092'],
  ssl: {
    rejectUnauthorized: false
  },
  logLevel: logLevel.DEBUG,
  sasl: {
    mechanism: 'plain', // scram-sha-256 or scram-sha-512
    username: 'seroter-bq-kafka@seroter-project-base.iam.gserviceaccount.com',
    password: 'tybgIC ... pp4Fg=='
  },
});

const producer = kafka.producer();

app.get('/publish', async (req, res) => {
  try {
    await producer.connect();

    const _key = req.query.key; // Extract key from querystring
    console.log('key is ' + _key);

    const _value = req.query.value; // Extract value from querystring
    console.log('value is ' + _value);

    const message = {
      key: _key, // Optional key for partitioning
      value: _value
    };

    await producer.send({
      topic: 'topic1', // Replace with your topic name
      messages: [message]
    });

    res.status(200).json({ message: 'Message sent successfully' });
  } catch (error) {
    console.error('Error sending message:', error);
    res.status(500).json({ error: 'Failed to send message' });
  }
});

app.listen(port, () => {
  console.log(`Producer listening at http://localhost:${port}`);
});
Next up, containerizing both apps so that I could deploy to a runtime.
I used Google Cloud Artifact Registry as my container store, and created a Docker image from source code using Cloud Native buildpacks. It took one command for each app:
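The exact invocation depends on your folder and repository names, but for each app it was something along these lines:
rseroter$ gcloud builds submit ./consumer-app \
  --pack image=us-west1-docker.pkg.dev/seroter-project-base/kafka-apps/kafka-consumer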
Now we had everything needed to deploy and test our client apps.
Deploy apps to Cloud Run and test it out
I chose Google Cloud Run because I like nice things. It’s still one of the best two or three ways to host apps in the cloud. We also make it much easier now to connect to a VPC, which is what I need here. Rather than creating some tunnel out to my cluster, I’d rather access it more securely over the VPC.
Here’s how I configured the consuming app. I first picked my container image and a target location.
Then I chose to use always-on CPU for the consumer, as I had connection issues when I had a purely ephemeral container.
The last setting was the VPC egress that made it possible for this instance to talk to the Apache Kafka cluster.
About three seconds later, I had a running Cloud Run instance ready to consume.
I ran through a similar deployment process for the publisher app, except I kept the true “scale to zero” setting turned on since it doesn’t matter if the publisher app comes and goes.
With all apps deployed, I fired up the browser and issued a pair of requests to the “publish” endpoint.
I checked the consumer app’s logs and saw that messages were successfully retrieved.
Sending a request to the GET endpoint on the consumer app returns the pair of messages I sent from the publisher app.
Sweet! We proved that we could send messages to the Apache Kafka cluster, and retrieve them. I get all the benefits of Apache Kafka, integrated into Google Cloud, with none of the operational toil.
Option 3: Use a managed provider on your cloud(s) of choice
The final way you might choose to run Apache Kafka in the cloud is to use a SaaS product designed to work on different infrastructures.
The team at Confluent does much of the work on open source Apache Kafka and offers a managed product via Confluent Cloud. It’s performant, feature-rich, and runs in AWS, Azure, and Google Cloud. Another option is Redpanda, which offers a managed cloud service that it operates on its infrastructure in AWS or Google Cloud.
Why do this? Choosing a “best of breed” type of managed service is going to give you excellent feature coverage and operational benefits. These platforms are typically operated by experts and finely tuned for performance and scale. Are there any downsides? These platforms aren’t free, and they don’t always have all the native integrations into their target cloud (logging, data services, identity, etc.) that a built-in service does. And you won’t have all the configurability or infrastructure choice that you’d have running it yourself.
Wrap up
It’s a great time to run Apache Kafka in the cloud. You can go full DIY or take advantage of managed services. As always, there are tradeoffs with each. You might even use a mix of products and approaches for different stages (dev/test/prod) and departments within your company. Are there any options I missed? Let me know!
I can’t remember who mentioned this idea to me. It might have been a customer, colleague, internet rando, or voice in my head. But the idea was whether you could use source control for the prompts, and leverage an LLM to dynamically generate all the app code each time you run a build. That seems bonkers for all sorts of reasons, but I wanted to see if it was technically feasible.
Should you do this for real apps? No, definitely not yet. The non-deterministic nature of LLMs means you’d likely experience hard-to-find bugs, unexpected changes on each build, and get yelled at by regulators when you couldn’t prove reproducibility in your codebase. When would you use something like this? I’m personally going to use this to generate stub apps to test an API or database, build demo apps for workshops or customer demos, or to create a component for a broader architecture I’m trying out.
tl;dr I built an AI-based generator that takes a JSON file of prompts like this and creates all the code. I call this generator from a CI pipeline which means that I can check in (only) the prompts to GitHub, and end up with a running app in the cloud.
{
  "folder": "generated-web",
  "prompts": [
    {
      "fileName": "employee.json",
      "prompt": "Generate a JSON structure for an object with fields for id, full name, state date, and office location. Populate it with sample data. Only return the JSON content and nothing else."
    },
    {
      "fileName": "index.js",
      "prompt": "Create a node.js program. It instantiates an employee object that looks like the employee.json structure. Start up a web server on port 8080 and expose a route at /employee return the employee object defined earlier."
    },
    {
      "fileName": "package.json",
      "prompt": "Create a valid package.json for this node.js application. Do not include any comments in the JSON."
    },
    {
      "fileName": "Dockerfile",
      "prompt": "Create a Dockerfile for this node.js application that uses a minimal base image and exposes the app on port 8080."
    }
  ]
}
In this post, I’ll walk through the steps of what a software delivery workflow such as this might look like, and how I set up each stage. To be sure, you’d probably make different design choices, write better code, and pick different technologies. That’s cool; this was mostly an excuse for me to build something fun.
Before explaining this workflow, let me first show you the generator itself and how it works.
Building an AI code generator
There are many ways to build this. An AI framework makes it easier, and I chose Spring AI because I wanted to learn how to use it. Even though this is a Java app, it generates code in any programming language.
I began at Josh Long’s second favorite place on the Internet, start.spring.io. Here I started my app using Java 21, Maven, and the Vertex AI Gemini starter, which pulls in Spring AI.
My application properties point at my Google Cloud project and I chose to use the impressive new Gemini 1.5 Flash model for my LLM.
My main class implements the CommandLineRunner interface and expects a single parameter, which is a pointer to a JSON file containing the prompts. I also have a couple of classes that define the structure of the prompt data. But the main generator class is where I want to spend some time.
Basically, for each prompt provided to the app, I look for any local files to provide as multimodal context into the request (so that the LLM can factor in any existing code as context when it processes the prompt), call the LLM, extract the resulting code from the Markdown wrapper, and write the file to disk.
Here are those steps in code. First I look for local files:
//load code from any existing files in the folder
private Optional<List<Media>> getLocalCode() {
    String directoryPath = appFolder;
    File directory = new File(directoryPath);

    if (!directory.exists()) {
        System.out.println("Directory does not exist: " + directoryPath);
        return Optional.empty();
    }

    try {
        return Optional.of(Arrays.stream(directory.listFiles())
            .filter(File::isFile)
            .map(file -> {
                try {
                    byte[] codeContent = Files.readAllLines(file.toPath())
                        .stream()
                        .collect(Collectors.joining("\n"))
                        .getBytes();
                    return new Media(MimeTypeUtils.TEXT_PLAIN, codeContent);
                } catch (IOException e) {
                    System.out.println("Error reading file: " + file.getName());
                    return null;
                }
            })
            .filter(Objects::nonNull)
            .collect(Collectors.toList()));
    } catch (Exception e) {
        System.out.println("Error getting local code");
        return Optional.empty();
    }
}
I call the LLM using Spring AI, choosing one of two methods depending on whether there’s any local code or not. There won’t be any code when the first prompt is executed!
//call the LLM and pass in existing code
private String callLlmWithLocalCode(String prompt, List<Media> localCode) {
    System.out.println("calling LLM with local code");

    var userMessage = new UserMessage(prompt, localCode);
    var response = chatClient.call(new Prompt(List.of(userMessage)));

    return extractCodeContent(response.toString());
}

//call the LLM when there's no local code
private String callLlmWithoutLocalCode(String prompt) {
    System.out.println("calling LLM withOUT local code");

    var response = chatClient.call(prompt);

    return extractCodeContent(response.toString());
}
You see there that I’m extracting the code itself from the response string with this operation:
//method that extracts code from the LLM response
public static String extractCodeContent(String markdown) {
    System.out.println("Markdown: " + markdown);

    String regex = "`(\\w+)?\\n([\\s\\S]*?)```";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(markdown);

    if (matcher.find()) {
        String codeContent = matcher.group(2); // Extract group 2 (code content)
        return codeContent;
    } else {
        //System.out.println("No code fence found.");
        return markdown;
    }
}
And finally, I write the resulting code to disk:
//write the final code to the target file path
private void writeCodeToFile(String filePath, String codeContent) {
    try {
        File file = new File(filePath);
        if (!file.exists()) {
            file.createNewFile();
        }

        FileWriter writer = new FileWriter(file);
        writer.write(codeContent);
        writer.close();

        System.out.println("Content written to file: " + filePath);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
There’s some more ancillary stuff that you can check out in the complete GitHub repo with this app in it. I was happy to be using Gemini Code Assist while building this. This AI assistant helped me understand some Java concepts, complete some functions, and fix some of my subpar coding choices.
That’s it. Once I had this component, I built a JAR file and could now use it locally or in a continuous integration pipeline to produce my code. I uploaded the JAR file to Google Cloud Storage so that I could use it later in my CI pipelines. Now, onto the day-to-day workflow that would use this generator!
Workflow step: Set up repo and pipeline
Like with most software projects, I’d start with the supporting machinery. In this case, I needed a source repo to hold the prompt JSON files. Done.
And I’d also consider setting up the path to production (or test environment, or whatever) to build the app as it takes shape. I’m using Google Cloud Build for a fully-managed CI service. It’s a good service with a free tier. Cloud Build uses declarative manifests for pipelines, and this pipeline starts off the same for any type of app.
steps:
  # Print the contents of the current directory
  - name: 'bash'
    id: 'Show source files'
    script: |
      #!/usr/bin/env bash
      ls -l

  # Copy the JAR file from Cloud Storage
  - name: 'gcr.io/cloud-builders/gsutil'
    id: 'Copy AI generator from Cloud Storage'
    args: ['cp', 'gs://seroter-llm-demo-tools/demo-0.0.1-SNAPSHOT.jar', 'demo-0.0.1-SNAPSHOT.jar']

  # Print the contents of the current directory
  - name: 'bash'
    id: 'Show source files and builder tool'
    script: |
      #!/usr/bin/env bash
      ls -l
Not much to it so far. I just print out the source contents seen in the pipeline, download the AI code generator from the above-mentioned Cloud Storage bucket, and prove that it’s on the scratch disk in Cloud Build.
Ok, my dev environment was ready.
Workflow step: Write prompts
In this workflow, I don’t write code, I write prompts that generate code. I might use something like Google AI Studio or even Vertex AI to experiment with prompts and iterate until I like the response I get.
Within AI Studio, I chose Gemini 1.5 Flash because I like nice things. Here, I’d work through the various prompts I would need to generate a working app. This means I still need to understand programming languages, frameworks, Dockerfiles, etc. But I’m asking the LLM to do all the coding.
Once I’m happy with all my prompts, I add them to the JSON file. Note that each prompt entry has a corresponding file name that I want the generator to use when writing to disk.
At this point, I was done “coding” the Node.js app. You could imagine having a dozen or so templates of common app types and just grabbing one and customizing it quickly for what you need!
Workflow step: Test locally
To test this, I put the generator in a local folder with a prompt JSON file and ran this command from the shell:
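The command mirrors what the Cloud Build pipeline runs later, with the JAR name and prompt file I used:
rseroter$ java -jar demo-0.0.1-SNAPSHOT.jar --prompt-file=app-prompts-web.json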
After just a few seconds, I had four files on disk.
This is just a regular Node.js app. After npm install and npm start commands, I ran the app and successfully pinged the exposed API endpoint.
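For reference, the local test was nothing fancier than this, run from the generated folder, with curl hitting the route defined in my prompt:
rseroter$ cd generated-web
rseroter$ npm install
rseroter$ npm start
# then, from another terminal
rseroter$ curl http://localhost:8080/employee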
Can we do something more sophisticated? I haven’t tried a ton of scenarios, but I wanted to see if I could get a database interaction generated successfully.
I went into the Google Cloud console and spun up a (free tier) instance of Cloud Firestore, our NoSQL database. I then created a “collection” called “Employees” and added a single document to start it off.
Then I built a new prompts file with directions to retrieve records from Firestore. I messed around with variations that encouraged the use of certain libraries and versions. Here’s a version that worked for me.
{
  "folder": "generated-web-firestore",
  "prompts": [
    {
      "fileName": "employee.json",
      "prompt": "Generate a JSON structure for an object with fields for id, full name, state date, and office location. Populate it with sample data. Only return the JSON content and nothing else."
    },
    {
      "fileName": "index.js",
      "prompt": "Create a node.js program. Start up a web server on port 8080 and expose a route at /employee. Initializes a firestore database using objects from the @google-cloud/firestore package, referencing Google Cloud project 'seroter-project-base' and leveraging Application Default credentials. Return all the documents from the Employees collection."
    },
    {
      "fileName": "package.json",
      "prompt": "Create a valid package.json for this node.js application using version 7.7.0 for @google-cloud/firestore dependency. Do not include any comments in the JSON."
    },
    {
      "fileName": "Dockerfile",
      "prompt": "Create a Dockerfile for this node.js application that uses a minimal base image and exposes the app on port 8080."
    }
  ]
}
After running the prompts through the generator app again, I got four new files, this time with code to interact with Firestore!
Another npm install and npm start command set started the app and served up the document sitting in Firestore. Very nice.
Finally, how about a Python app? I want a background job that actually populates the Firestore database with some initial records. I experimented with some prompts, and these gave me a Python app that I could use with Cloud Run Jobs.
{
  "folder": "generated-job-firestore",
  "prompts": [
    {
      "fileName": "main.py",
      "prompt": "Create a Python app with a main function that initializes a firestore database object with project seroter-project-base and Application Default credentials. Add two documents to the Employees collection. Generate random id, fullname, startdate, and location data for each document. Have the start script try to call that main function and if there's an exception, prints the error."
    },
    {
      "fileName": "requirements.txt",
      "prompt": "Create a requirements.txt file for the packages used by this app"
    },
    {
      "fileName": "Procfile",
      "prompt": "Create a Procfile for python3 that starts up main.py"
    },
    {
      "fileName": "Dockerfile",
      "prompt": "Create a Dockerfile for this Python batch application that uses a minimal base image and doesn't expose any ports"
    }
  ]
}
Running this prompt set through the AI generator gave me the valid files I wanted. All my prompt files are here.
At this stage, I was happy with the local tests and ready to automate the path from source control to cloud runtime.
Workflow step: Generate app in pipeline
Above, I had started the Cloud Build manifest with the step of yanking down the AI generator JAR file from Cloud Storage.
The next step is different for each app we’re building. I could use substitution variables in Cloud Build and have a single manifest for all of them, but for demonstration purposes, I wanted one manifest per prompt set.
I added this step to what I already had above. It executes the same command in Cloud Build that I had run locally to test the generator. First I run apt-get in the “ubuntu” builder image to install the Java runtime I need, then invoke my JAR, passing in the name of the prompt file.
...
  # Run the JAR file
  - name: 'ubuntu'
    id: 'Run AI generator to create code from prompts'
    script: |
      #!/usr/bin/env bash
      apt-get update && apt-get install -y openjdk-21-jdk
      java -jar demo-0.0.1-SNAPSHOT.jar --prompt-file=app-prompts-web.json

  # Print the contents of the generated directory
  - name: 'bash'
    id: 'Show generated files'
    script: |
      #!/usr/bin/env bash
      ls ./generated-web -l
I updated the Cloud Build pipeline that’s connected to my GitHub repo with this revised YAML manifest.
Running the pipeline at this point showed that the generator worked correctly and added the expected files to the scratch volume in the pipeline. Awesome.
At this point, I had an app generated from prompts found in GitHub.
Workflow step: Upload artifact
Next up? Getting this code into a deployable artifact. There are plenty of options, but I want to use a container-based runtime, and need a container image. Cloud Build makes that easy.
I added another section to my existing Cloud Build manifest to containerize with Docker and upload to Artifact Registry.
  # Containerize the code and upload to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    id: 'Containerize generated code'
    args: ['build', '-t', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web:latest', './generated-web']

  - name: 'gcr.io/cloud-builders/docker'
    id: 'Push container to Artifact Registry'
    args: ['push', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web']
It used the Dockerfile our AI generator created, and after this step ran, I saw a new container image.
Workflow step: Deploy and run app
The final step, running the workload! I could use our continuous deployment service Cloud Deploy but I took a shortcut and deployed directly from Cloud Build. This step in the Cloud Build manifest does the job.
  # Deploy container image to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    id: 'Deploy container to Cloud Run'
    entrypoint: gcloud
    args: ['run', 'deploy', 'generated-web', '--image', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web', '--region', 'us-west1', '--allow-unauthenticated']
After saving this update to Cloud Build and running it again, I saw all the steps complete successfully.
Most importantly, I had an active service in Cloud Run that served up a default record from the API endpoint.
I went ahead and ran a Cloud Build pipeline for the “Firestore” version of the web app, and then the background job that deploys to Cloud Run Jobs. I ended up with two Cloud Run services (web apps), and one Cloud Run Job.
I executed the job, and saw two new Firestore records in the collection!
To prove that, I executed the Firestore version of the web app. Sure enough, the records returned include the two new records.
Wrap up
What we saw here was a fairly straightforward way to generate complete applications from nothing more than a series of prompts fed to the Gemini model. Nothing prevents you from using a different LLM, or using other source control, continuous integration, and hosting services. Just do some find-and-replace!
Again, I would NOT use this for “real” workloads, but this sort of pattern could be a powerful way to quickly create supporting apps and components for testing or learning purposes.