Category: AI/ML

  • The Gemini CLI might change how I work. Here are four prompts that prove it.

    Yesterday morning, we took the wraps off one of the most interesting Google releases of 2025. The Gemini CLI is here, giving you nearly unlimited access to Gemini directly within the terminal. This is a new space, and there are other great solutions already out there, so why is this different? Yes, it’s good at multi-step reasoning, code generation, and creative tasks. Build apps, fix code, parse images, build slides, analyze content, or whatever. But what’s truly unique is that it’s fully open source, free to use, usable anywhere, and super extensible. Use Gemini 2.5 Pro’s massive context window (1M tokens), multimodality, and strong reasoning ability to do some amazing stuff.

    Requirements? Have Node installed, and a Google account. That’s it. You get lots of free queries against our best models, and you can get more as a Google Cloud customer if you need it. Let’s have a quick look around, and then I’ll show you four prompts that demonstrate what it can really do.
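    Getting started is a one-liner if you use npm. The package name below matches the official repo at the time of writing; check the repo if it has changed:

    ```shell
    # Install the Gemini CLI globally (needs a recent Node.js), then launch it.
    # On first run, you'll be prompted to sign in with your Google account.
    npm install -g @google/gemini-cli
    gemini
    ```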

    The slash command shows me what’s available here. I can see and resume previous chats, configure the editor environment, leverage memory via context files like GEMINI.md, change the theme, and use tools. Choosing that option shows us the available tools such as reading files and folders, finding files and folders, performing Google searches, running Shell commands, and more.

    The Gemini CLI has many extensibility points, including use of MCP servers. I added the Cloud Run MCP server but you can add anything here.
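    As a sketch of what that registration can look like: the CLI reads MCP servers from a settings file, and an entry along these lines does the trick. The file path, server name, and npx invocation are assumptions based on the Cloud Run MCP server’s repo, so verify against the current docs:

    ```shell
    # Hypothetical example: register the Cloud Run MCP server with the Gemini CLI.
    # Note this overwrites ~/.gemini/settings.json, so back up any existing file first.
    mkdir -p ~/.gemini
    cat > ~/.gemini/settings.json <<'EOF'
    {
      "mcpServers": {
        "cloud-run": {
          "command": "npx",
          "args": ["-y", "https://github.com/GoogleCloudPlatform/cloud-run-mcp"]
        }
      }
    }
    EOF
    ```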

    I’m only scratching the surface here, so don’t forget to check out the official repo, docs, and blog post announcement. But now, let’s walk through four prompts that you can repeat to experience the power of the Gemini CLI, and why each is a big deal.

    Prompt #1 – Do some research.

    Software engineering is more than coding. You spend time researching, planning, and thinking. I want to build a new app, but I’m not sure which frontend framework I should use. And I don’t want stale answers from an LLM that was trained a year ago.

    I’ve got a new research report on JavaScript frameworks, and also want to factor in web results. My prompt:

    What JavaScript framework should I use to build my frontend app? I want something simple, standards-friendly, and popular. Use @report.pdf for some context, but also do a web search. Summarize the results in a way that will help me decide.

    The Gemini CLI figured out which tools to use, successfully incorporated the file into the prompt, and started on its work: searching the web and preparing results.

    The results were solid. I got tradeoffs and analysis of three viable options. The summary was helpful, and I could have continued going back and forth with clarifying questions. For architects, team leaders, and engineers, having a research partner in the terminal is powerful.

    Why was this a big deal? This prompt showed the use of live Google Search, local (binary) file processing, and in-context learning for devs. These tools are changing how I do quick research.

    Prompt #2 – Build an app.

    These tools will absolutely change how folks build, fix, change, and modernize software. Let’s build something new.

    I fed in this prompt, based on my new understanding of relevant JavaScript frameworks.

    Let’s build a calendar app for my family to plan a vacation together. It should let us vote on weeks that work best, and then nominate activities for each day. Use Vue.js for the JavaScript framework.

    Now to be sure, we didn’t build this to be excellent at one-shot results. Instead, it’s purposely built for an interactive back-and-forth with the software developer. You can start it with --yolo mode to have it automatically proceed without asking permission to do things, and even with -b to run it headless, assuming no interactivity. But I want to stay in control here, so I’m not in YOLO mode.
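    For reference, the interactivity modes look roughly like this. Flag names are taken from the CLI’s help output at the time of writing; run `gemini --help` to confirm on your version:

    ```shell
    gemini                           # interactive session; asks before running tools
    gemini --yolo                    # interactive, but auto-approves tool calls
    gemini -p "summarize this repo"  # one-shot prompt, suitable for scripts
    ```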

    I quickly got back a plan, and was asked if I wanted to proceed.

    Gemini CLI also asks me about running Shell commands. I can allow it once, allow it always, or cancel. I like these options. It’s fun watching Gemini make decisions and narrate what it’s working on. Once it’s done building directories, writing code, and evaluating its results, the CLI even starts up a server so that I can test the application. The first draft was functional, but not attractive, so I asked for a cleanup.

    The next result was solid, and I could have continued iterating on new features along with look and feel.

    Why was this a big deal? This prompt showed iterative code development, important security (request permission) features, and more. We’ll also frequently offer to pop you into the IDE for further coding. This will change how I understand or bootstrap most of the code I work with.

    Prompt #3 – Do a quick deploy to the cloud.

    I’m terrible at remembering the syntax and flags for various CLI tools. The right git command or Google Cloud CLI request? Just hopeless. The Gemini CLI is my solution. I can ask for what I want, and the Gemini CLI figures out the right type of request to make.

    We added MCP as a first-class citizen, so I added the Cloud Run MCP server, as mentioned above. I also made this work without it, as the Gemini CLI figured out the right way to directly call the Google Cloud CLI (gcloud) to deploy my app. But MCP servers provide more structure and ensure a consistent implementation. Here’s the prompt I used to get this app deployed. Vibe deployment, FTW.

    Ship this code to Cloud Run in us-west1 using my seroter-project-base project. Don’t create a Dockerfile or container, but just deploy the source files.
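    If you’d rather run the equivalent gcloud command yourself, a source-based deploy looks something like this. The service name is made up; the region and project match the prompt above:

    ```shell
    # Deploy straight from source; Cloud Run builds the container for you,
    # so no Dockerfile is needed.
    gcloud run deploy family-calendar \
      --source . \
      --region us-west1 \
      --project seroter-project-base
    ```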

    The Gemini CLI immediately recognizes that a known MCP tool can help, and shows me the tool it chose.

    It got going, and shipped my code successfully to Cloud Run using the MCP server. But the app didn’t start correctly. The Gemini CLI noticed that by reading the service logs, and diagnosed the issue. We didn’t provide a reference for which port to listen on. No problem.

    It came up with a fix, made the code changes, and redeployed.

    Why was this a big deal? We saw the extensibility of MCP servers, and the ability to “forget” some details of exactly how other tools and CLIs work. Plus we observed that the Gemini CLI did some smart reasoning and resolved issues on its own. This is going to change how I deploy, and how much time I spend (waste?) deploying.

    Prompt #4 – Do responsible CI/CD to the cloud.

    The third prompt was cool and showed how you can quickly deploy to a cloud target, even without knowing the exact syntax to make it happen. I got it working with Kubernetes too. But can the Gemini CLI help me do proper CI/CD, even if I don’t know exactly how to do it? In this case I do know how to set up Google Cloud Build and Cloud Deploy, but let’s pretend I don’t. Here’s the prompt.

    Create a Cloud Build file that would build a container out of this app code and store it in Artifact Registry. Then create the necessary Cloud Deploy files that define a dev and production environment in Cloud Run. Create the Cloud Deploy pipeline, and then reference it in the Cloud Build file so that the deploy happens when a build succeeds. And then go ahead and trigger the Cloud Build. Pay very careful attention to how to create the correct files and syntax needed for targeting Cloud Run from Cloud Deploy.

    The Gemini CLI started by asking me for some info from my Google Cloud account (project name, target region) and then created YAML files for Cloud Build and Cloud Deploy. It also put together a CLI command to instantiate a Docker repo in Artifact Registry. Now, I know that the setup for Cloud Deploy working with Cloud Run has some specific syntax and formatting. Even with my above command, I can see that I didn’t get syntactically correct YAML in the skaffold file.
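    For context, the plumbing the Gemini CLI assembled corresponds roughly to these gcloud commands. Repo name, region, and file names here are illustrative:

    ```shell
    # Create a Docker repo in Artifact Registry to hold built images.
    gcloud artifacts repositories create my-repo \
      --repository-format=docker --location=us-west1

    # Register the delivery pipeline and targets described in clouddeploy.yaml.
    gcloud deploy apply --file=clouddeploy.yaml --region=us-west1

    # Run the build described in cloudbuild.yaml (which then kicks off the release).
    gcloud builds submit --config=cloudbuild.yaml .
    ```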

    I rejected the Gemini CLI’s request to do a deployment, since I knew it would fail. Then I gave it the docs URL for setting up Cloud Run with Cloud Deploy and asked it to make a correction.

    That Skaffold file doesn’t look correct. Take a look at the docs (https://cloud.google.com/deploy/docs/deploy-app-run), and follow its guidance for setting up the service YAML files, and referencing the right Skaffold version at the top. Show me the result before pushing a change to the Cloud Deploy pipeline.

    Fortunately, the Gemini CLI can do a web fetch and process the latest product documentation. I did a couple of turns and got what I wanted. Then I asked it to go ahead and update the pipeline and trigger Cloud Build.

    It failed at first because I didn’t have a Dockerfile, but after realizing that, it automatically created one and started the build again.

    It took a few iterations of failed builds for the Gemini CLI to land on the right syntax. But it kept dutifully trying, making changes, and redeploying until it got it right. Just like I would have if I were doing it myself!

    After a few rounds of that back and forth, I had all the right files, syntax, container artifacts, and pipelines going.

    Some of my experiments went faster than others, but that’s the nature of these tools, and I still did this faster overall than I would have manually.

    Why was this a big deal? This showcased some sophisticated file creation, iterative improvements, and Gemini CLI’s direct usage of the Google Cloud CLI to package, deploy, and observe running systems in a production-like way. It’ll change how confident I am doing more complex operations.

    Background agents, orchestrated agents, conversational AI. All of these will play a part in how we design, build, deploy, and operate software. What does that mean to your team, your systems, and your expectations? We’re about to find out.

  • From code to cloud: Check out six new integrations that make it easier to host your apps and models on Cloud Run

    Where you decide to run your web app is often a late-binding choice. Once you’ve finished coding something you like and done some localhost testing, you seek out a reasonable place that gives you a public IP address. Developers have no shortage of runtime host options, including hyperscalers, rented VMs from cheap regional providers, or targeted services from the likes of Firebase, Cloudflare, Vercel, Netlify, Fly.io, and a dozen others. I’m an unapologetic fanboy of Google Cloud Run—host scale-to-zero apps, functions, and jobs that offer huge resource configurations, concurrent calls, GPUs, and durable volumes with a generous free tier and straightforward pricing—and we just took the wraps off a handful of new ways to take a pile of code and turn it into a cloud endpoint.

    Vibe-code a web app in Google AI Studio and one-click deploy to Cloud Run

    Google AI Studio is really remarkable. Build text prompts against our leading models, generate media with Gemini models, and even build apps. All at no cost. We just turned on the ability to do simple text-to-app scenarios, and added a button that deploys your app to Cloud Run.

    First, I went to the “Build” pane and added a text prompt for my new app. I wanted a motivational quote printed on top of an image of an AI generated dog.

    In one shot, I got the complete app including the correct backend AI calls to Gemini models for creating the motivational quote and generating a dog pic. So cool.

    Time to ship it. There’s a rocket ship icon on the top right. Assuming you’ve connected Google AI Studio to a Google Cloud account, you can pick a project and one-click deploy.

    It takes just a few seconds, and you get back the URL and a deep link to the app in Google Cloud.

    Clicking that link shows that this is a standard Cloud Run instance, with the Gemini key helpfully added as an environment variable (versus hard coded!).

    And of course, viewing the associated link takes me to my app that gives me simple motivation and happy dogs.

    That’s such a simple development loop!

    Create a .NET app in tools like Cursor and deploy it using the Cloud Run MCP server

    Let’s say you’re using one of the MANY agentic development tools that make it simpler to code with AI assistance. Lots of you like Cursor. It supports MCP as a way to reach into other systems via tools.

    We just shipped a Cloud Run MCP server, so you can make tools like Cursor aware of Cloud Run and support straightforward deployments.

    I started in Cursor and asked it to build a simple REST API and picked Gemini 2.5 Pro as my preferred model. Cursor does most (all?) of the coding work for you if you want it to.

    It went through a few iterations to land on the right code. I tested it locally to ensure the app would run.

    Cursor has native support for MCP. I added a .cursor directory to my project and dropped an mcp.json file in there. Cursor picked up the MCP entry, validated it, and showed me the available tools.
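    The file I dropped in was along these lines. The exact server invocation is an assumption based on the Cloud Run MCP server’s repo, so adjust it to match the current docs:

    ```shell
    # Sketch: register the Cloud Run MCP server for this Cursor project.
    mkdir -p .cursor
    cat > .cursor/mcp.json <<'EOF'
    {
      "mcpServers": {
        "cloud-run": {
          "command": "npx",
          "args": ["-y", "https://github.com/GoogleCloudPlatform/cloud-run-mcp"]
        }
      }
    }
    EOF
    ```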

    I asked Cursor to deploy my C# app. It explored the local folder and files to ensure it had what it needed.

    Cursor realized it had a tool that could help, and proposed the “deploy_local_folder” tool from the Cloud Run MCP server.

    After providing some requested values (location, etc), Cursor successfully deployed my .NET app.

    That was easy. And this Cloud Run MCP server will work with any of your tools that understand MCP.

    Push an open model from Google AI Studio directly to Cloud Run

    Want to deploy a model to Cloud Run? It’s the only serverless platform I know of that offers GPUs. You can use tools like Ollama to deploy any open model to Cloud Run, and I like that we made it even easier for Gemma fans. To see this integration, pick one of the Gemma 3 editions in Google AI Studio.

    Once you’ve done that, you’ll see a new icon that triggers a deployment directly to Cloud Run. Within minutes, you have an elastic endpoint providing inference.

    It’s not hard to deploy open models to Cloud Run. This option makes it that much easier.
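    For the Ollama route mentioned above, a GPU-backed deploy can be sketched like this. Treat the flags as assumptions to verify against current gcloud docs; GPU support was in beta at the time, and minimum CPU/memory requirements apply:

    ```shell
    # Sketch: run Ollama on Cloud Run with an attached NVIDIA L4 GPU.
    # Once it's up, you can pull a Gemma model into the running service.
    gcloud beta run deploy ollama-gemma \
      --image ollama/ollama \
      --port 11434 \
      --gpu 1 --gpu-type nvidia-l4 \
      --cpu 8 --memory 32Gi \
      --no-cpu-throttling \
      --region us-central1
    ```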

    Deploy a Python agent built with the Agent Development Kit to Cloud Run with one command

    The Agent Development Kit is an open source framework and toolset that devs use to build robust AI agents. The Python version reached 1.0 yesterday, and we launched a new Java version too. Here, I started with a Python agent I built.

    Built into ADK are a few deployment options. It’s just code, so you can run it anywhere. But we’ve added shortcuts to services like Google Cloud’s Vertex AI Agent Engine and Cloud Run. Just one command puts my agent onto Cloud Run!
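    That one command, roughly. The agent path and project values are placeholders; see the ADK docs for the current flags:

    ```shell
    # Containerize and deploy the local agent directory to Cloud Run.
    adk deploy cloud_run \
      --project my-project \
      --region us-central1 \
      ./my_agent
    ```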

    We don’t yet have this CLI deployment option for the Java ADK, but it’s also simple to use the Google Cloud CLI to deploy a Java app or agent to Cloud Run with one command.

    Services like Cloud Run are ideal for your agents and AI apps. These built-in integrations for ADK help you get these agents online quickly.

    Use a Gradio instance in Cloud Run to experiment with prompts after one click from Vertex AI Studio

    How do you collaborate or share prompts with teammates? Maybe you’re using something like Google Cloud Vertex AI to iterate on a prompt yourself. Here, I wrote system instructions and a prompt for helping me prioritize my work items.

    Now, I can click “deploy an app” and get a Gradio instance for experimenting further with my app.

    This has public access by default, so I’ve got to give the ok.

    After a few moments, I have a running Cloud Run app! I’m shown this directly from Vertex AI and have a link to open the app.

    That link brings me to this Gradio instance that I can share with teammates.

    Scalable and accessible, Cloud Run is ideal for spontaneous exploration of things like AI prompts. I like this integration!

    Ship your backend Java code to Cloud Run directly from Firebase Studio

    Our final example looks at Firebase Studio. Have you tried this yet? It’s a free-to-use, full-stack dev environment in the cloud for nearly any type of app. And it supports text-to-app scenarios if you don’t want to do much coding yourself. There are dozens of templates, including one for Java.

    I spun up a Java dev environment to build a web service.

    This IDE will look familiar. Bring in your favorite extensions, and we’ve also pre-loaded this with Gemini assistance, local testing tools, and more. See here that I used Gemini to add a new REST endpoint to my Java API.

    Here on the left is an option to deploy to Cloud Run!

    After authenticating to my cloud account and picking my cloud project, I could deploy. After a few moments, I had another running app in Cloud Run, and had a route to make continuous updates.

    Wow. That’s a lot of ways to go from code to cloud. Cloud Run is terrific for frontend or backend components, functions or apps, open source or commercial products. Try one of these integrations and tell me what you think!

  • What does a modern, AI-assisted developer workflow built around Google Gemini look like? Let’s explore.

    Software is never going to be the same. Why would we go back to laborious research efforts, wasting time writing boilerplate code, and accepting so many interruptions to our flow state? Hard pass. It might not happen for you tomorrow, next month, or next year, but AI will absolutely improve your developer workflow.

    Your AI-powered workflow may make use of more than one LLM. Go for it. But we’ve done a good job of putting Gemini into nearly every stage of the new way of working. Let’s look at what you can do RIGHT NOW to build with Gemini.

    Build knowledge, plans, and prototypes with Gemini

    Are you still starting your learning efforts with a Google search? Amateurs 🙂 I mean, keep doing those so that we earn ad dollars. But you’ve got so many new ways to augment a basic search.

    Gemini Deep Research is pretty amazing. Part of Gemini Advanced, it takes your query, searches the web on your behalf, and gives you a summary in minutes. Here I asked for help understanding the landscape of PostgreSQL providers, and it recapped results found in 240+ relevant websites from vendors, Reddit, analysts, and more.

    Gemini Deep Research creating a report about the PostgreSQL landscape

    You’ve probably heard of NotebookLM. Built with Gemini 2.0, it takes all sorts of digital content and helps you make sense of it. Including those hyper-realistic podcasts (“Audio Overviews”).

    Planning your work or starting to flesh out a prototype? For free, Google AI Studio lets you interact with the latest Gemini models. Generate text, audio, or images from prompts. Produce complex codebases based on reference images or text prompts. Share your desktop and get live assistance on whatever task you’re doing. It’s pretty rad.

    Google AI Studio’s Live API makes it possible to interact live with the model

    Google Cloud customers can get knowledge from Gemini in a few ways. The chat for Gemini Cloud Assist gives me an ever-present agent that can help answer questions or help me explore options. Here, I asked for a summary of the options for running PostgreSQL in Google Cloud. It breaks the response down by fully-managed, self-managed, and options for migration.

    Chat for Gemini Cloud Assist teaches me about PostgreSQL options

    Gemini for Google Cloud blends AI-assistance into many different services. One way to use this is to understand existing SQL scripts, workflows, APIs, and more.

    Gemini in BigQuery explains an existing query and helps me learn about it

    Trying to plan out your next bit of work? Google AI Studio or Vertex AI Studio can assist here too. In either service, you can pass in your backlog of features and bugs, maybe an architecture diagram or two, and even some reference PDFs, and ask for help planning out the next sprint. Pretty good!

    Vertex AI Studio “thinking” through a sprint plan based on multi-modal input

    Build apps and agents with Gemini

    We can use Google AI Studio or Vertex AI Studio to learn things and craft plans, but now let’s look at how you’d actually build apps with Gemini.

    You can work with the raw Gemini API. There are SDK libraries for Python, Node, Go, Dart, Swift, and Android. If you’re working with Gemini 2.0 and beyond, there’s a new unified SDK that works with both the Developer API and Enterprise API (Vertex). It’s fairly easy to use. I wrote a Google Cloud Function that uses the unified Gemini API to generate dinner recipes for whatever ingredients you pass in.

    package function
    
    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"log"
    	"net/http"
    	"os"
    
    	"github.com/GoogleCloudPlatform/functions-framework-go/functions"
    	"google.golang.org/genai"
    )
    
    func init() {
    	functions.HTTP("GenerateRecipe", generateRecipe)
    }
    
    func generateRecipe(w http.ResponseWriter, r *http.Request) {
    	ctx := context.Background()
    	ingredients := r.URL.Query().Get("ingredients")
    
    	if ingredients == "" {
    		http.Error(w, "Please provide ingredients in the query string, like this: ?ingredients=pork, cheese, tortilla", http.StatusBadRequest)
    		return
    	}
    
    	projectID := os.Getenv("PROJECT_ID")
    	if projectID == "" {
    		// Fail fast rather than guessing a project; the client can't work without one.
    		http.Error(w, "Server misconfigured: set the PROJECT_ID environment variable", http.StatusInternalServerError)
    		return
    	}
    
    	location := os.Getenv("LOCATION")
    	if location == "" {
    		location = "us-central1" // Provide a default, but encourage configuration
    	}
    
    	client, err := genai.NewClient(ctx, &genai.ClientConfig{
    		Project:  projectID,
    		Location: location,
    		Backend:  genai.BackendVertexAI,
    	})
    	if err != nil {
    		log.Printf("error creating client: %v", err)
    		http.Error(w, "Failed to create Gemini client", http.StatusInternalServerError)
    		return
    	}
    
    	prompt := fmt.Sprintf("Given these ingredients: %s, generate a recipe.", ingredients)
    	result, err := client.Models.GenerateContent(ctx, "gemini-2.0-flash-exp", genai.Text(prompt), nil)
    	if err != nil {
    		log.Printf("error generating content: %v", err)
    		http.Error(w, "Failed to generate recipe", http.StatusServiceUnavailable)
    		return
    	}
    
    	if len(result.Candidates) == 0 {
    		http.Error(w, "No recipes found", http.StatusNotFound) // Or another appropriate status
    		return
    	}
    
    	recipe := result.Candidates[0].Content.Parts[0].Text // Extract the generated recipe text
    
    	response, err := json.Marshal(map[string]string{"recipe": recipe})
    	if err != nil {
    		log.Printf("error marshalling response: %v", err)
    		http.Error(w, "Failed to format response", http.StatusInternalServerError)
    		return
    	}
    
    	w.Header().Set("Content-Type", "application/json")
    	w.Write(response)
    }
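    Once deployed, calling the function is a one-liner. The URL shape below is illustrative; your function’s actual endpoint will differ:

    ```shell
    # Request a recipe from the deployed function; the response is JSON.
    curl "https://REGION-PROJECT.cloudfunctions.net/GenerateRecipe?ingredients=pork,cheese,tortilla"
    ```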
    

    There are a lot of agent frameworks out there right now. A LOT. Many of them have good Gemini support. You can build agents with Gemini using LangChain, LangChain4J, LlamaIndex, Spring AI, Firebase Genkit, and the Vercel AI SDK.

    What coding tools can I use with Gemini? GitHub Copilot now supports Gemini models. Folks who love Cursor can choose Gemini as their underlying model. Same goes for fans of Sourcegraph Cody. Gemini Code Assist from Google Cloud puts AI-assisted tools into Visual Studio Code and the JetBrains IDEs. Get the power of Gemini’s long context, personalization on your own codebase, and now the use of tools to pull data from Atlassian, GitHub, and more. Use Gemini Code Assist within your local IDE, or in hosted environments like Cloud Workstations or Cloud Shell Editor.

    Gemini Code Assist brings AI assistance to your dev workspace, including the use of tools

    Project IDX is another Google-provided dev experience for building with Gemini. Use it for free, and build AI apps, with AI tools. It’s pretty great for frontend or backend apps.

    Project IDX lets you build AI apps with AI tools

    Maybe you’re building apps and agents with Gemini through low-code or declarative tools? There’s the Vertex AI Agent Builder. This Google Cloud service makes it fairly simple to create search agents, conversational agents, recommendation agents, and more. No coding needed!

    Conversational agents in the Vertex AI Agent Builder

    Another option for building with Gemini is the declarative Cloud Workflows service. I built a workflow that calls Gemini through Vertex AI and summarizes any provided document.

    # Summarize a doc with Gemini
    main:
      params: [args]
      steps:
      - init:
          assign:
            - doc_url: ${args.doc_url}
            - project_id: ${args.project_id}
            - location: ${args.location}
            - model: ${args.model_name}
            - desired_tone: ${args.desired_tone}
            - instructions: ""
      - set_instructions:
          switch:
            - condition: ${desired_tone == ""}
              assign:
                - instructions: "Deliver a professional summary with simple language."
              next: call_gemini
            - condition: ${desired_tone == "terse"}
              assign:
                - instructions: "Deliver a short professional summary with the fewest words necessary."
              next: call_gemini
            - condition: ${desired_tone == "excited"}
              assign:
                - instructions: "Deliver a complete, enthusiastic summary of the document."
              next: call_gemini
      - call_gemini:
          call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent
          args:
            model: ${"projects/" + project_id + "/locations/" + location + "/publishers/google/models/" + model}
            region: ${location}
            body:
              contents:
                role: user
                parts:
                  - text: "summarize this document"
                  - fileData: 
                      fileUri: ${doc_url}
                      mimeType: "application/pdf"
              systemInstruction: 
                role: user
                parts:
                  - text: ${instructions}
              generationConfig:  # optional
                temperature: 0.2
                maxOutputTokens: 2000
                topK: 10
                topP: 0.9
          result: gemini_response
      - returnStep:
          return: ${gemini_response.candidates[0].content.parts[0].text}
    

    Similarly, its sophisticated big brother, Application Integration, can also interact with Gemini through drag-and-drop integration workflows. These sorts of workflow tools help you bake Gemini predictions into all sorts of existing processes.

    Google Cloud Application Integration calls Gemini models

    After you build apps and agents, you need a place to host them! In Google Cloud, you could run in a virtual machine (GCE), a Kubernetes cluster (GKE), or a serverless runtime (Cloud Run). There’s also the powerful Firebase App Hosting for these AI apps.

    There are also two other services to consider. For RAG apps, we now offer the Vertex AI RAG Engine. I like this because you get a fully managed experience for ingesting docs, storing in a vector database, and performing retrieval. Doing LangChain? LangChain on Vertex AI offers a handy managed environment for running agents and calling tools.

    Build AI and data systems with Gemini

    In addition to building straight-up agents or apps, you might build backend data or AI systems with Gemini.

    If you’re doing streaming analytics or real-time ETL with Dataflow, you can build ML pipelines, generate embeddings, and even invoke Gemini endpoints for inference. Maybe you’re doing data analytics with frameworks like Apache Spark, Hadoop, or Apache Flink. Dataproc is a great service that you can use within Vertex AI, or to run all sorts of data workflows. I’m fairly sure you know what Colab is, as millions of folks per month use it for building notebooks. Colab and Colab Enterprise offer two great ways to build data solutions with Gemini.

    Let’s talk about building with Gemini inside your database. In Google Cloud SQL, Cloud Spanner, and AlloyDB, you can create “remote models” that let you interact with Gemini from within your SQL queries. Very cool and useful. BigQuery also makes it possible to work directly with Gemini from a SQL query. Let me show you.

    I made a dataset from the public “release notes” dataset from Google Cloud. Then I created a reference to the Gemini 2.0 Flash model and asked Gemini for a summary of a product’s release notes from the past month.

    -- create the remote model
    CREATE OR REPLACE MODEL
    `[project].public_dataset.gemini_2_flash`
    REMOTE WITH CONNECTION `projects/[project]/locations/us/connections/gemini-connection`
    OPTIONS (ENDPOINT = 'gemini-2.0-flash-exp');
    
    -- query an aggregation of responses to get a monthly product summary
    SELECT * 
    FROM
     ML.GENERATE_TEXT(
        MODEL `[project].public_dataset.gemini_2_flash`,
        (
          SELECT CONCAT('Summarize this month of product announcements by rolling up the key info', monthly_summary) AS prompt
          FROM (
            SELECT STRING_AGG(description, '; ') AS monthly_summary
            FROM `bigquery-public-data`.`google_cloud_release_notes`.`release_notes` 
            WHERE product_name = 'AlloyDB' AND DATE(published_at) BETWEEN '2024-12-01' AND '2024-12-31'
          )
        ),
        STRUCT(
          0.05 AS temperature,
          TRUE AS flatten_json_output)
        )
    

    How wild is that? Love it.

    You can also build with Gemini in Looker. Build reports, visualizations, and use natural language to explore data. See here for more.

    And of course, Vertex AI helps you build with Gemini. Build prompts, fine-tune models, manage experiments, make batch predictions, and lots more. If you’re working with AI models like Gemini, you should give Vertex AI a look.

    Build a better day-2 experience with Gemini

    It’s not just about building software with Gemini. The AI-driven product workflow extends to post-release activities.

    Have to set up least-privilege permissions for service accounts? Build the right permission profile with Gemini.

    The “Help me choose roles” feature uses Gemini to figure out the right permissions

    Something goes wrong. You need to get back to good. You can build faster resolution plans with Gemini. Google Cloud Logging supports log summarization with Gemini.

    Google Cloud Logging supports log summarization with Gemini

    Ideally, you know when something goes wrong before your customers notice. Synthetic monitors are one way to solve that. We made it easy to build synthetic monitors with Gemini using natural language.

    “Help me code” option for creating synthetic monitors in Cloud Monitoring

    You don’t want to face security issues on day-2, but it happens. Gemini is part of Security Command Center where you can build search queries and summarize cases.

    Gemini can also help you build billing reports. I like this experience where I can use natural language to get answers about my spend in Cloud Billing.

    Gemini in Cloud Billing makes it easier to understand your spend

    Build supporting digital assets with Gemini

    The developer workflow isn’t just about code artifacts. Sometimes you create supporting assets for design docs, production runbooks, team presentations, and more.

    Use the Gemini app (or our other AI surfaces) to generate images. I do this all the time now!

    Image for use in a presentation is generated by Gemini

    Building slides? Writing docs? Creating spreadsheets? Gemini for Workspace gives you some help here. I use this on occasion to refine text, generate slides or images, and update tables.

    Gemini in Google Docs helps me write documents

    Maybe you’re getting bored with static image representations and want some more videos in your life? Veo 2 is frankly remarkable and might be a new tool for your presentation toolbox. Consider a case where you’re building a mobile app that helps people share cars. Maybe produce a quick video to embed in the design pitch.

    Veo 2 generating videos for use in a developer’s design pitch

    AI disrupts the traditional product development workflow. Good! Gemini is part of each stage of the new workflow, and it’s only going to get better. Consider introducing one or many of these experiences to your own way of working in 2025.

  • 8 ways AI will change how I work in 2025

    You don’t have to use generative AI. It’s possible to avoid it and continue doing whatever you’ve been doing, the way you’ve been doing it. I don’t believe that sentence will be true in twelve months. Not because you’ll have to use it—although in some cases it may be unavoidable—but because you’ll want to use it. I thought about how my work will change next year.

    #1. I’ll start most efforts by asking “can AI help with this?”

    Do I need to understand a new market or product area? Analyze a pile of data? Schedule a complex series of meetings? Quickly generate a sample app for a customer demo? Review a blog post a teammate wrote? In most cases, AI can give me an assist. I want to change my mental model to first figure out if there’s a smarter (AI-assisted) way to do something.

    That said, it’s about “can AI help me” versus “can AI do all my work.” I don’t want to end up in this situation.

    #2. I’m going to do much better research.

    Whether planning a strategy or a vacation, there’s a lot of time spent researching. That’s ok, as you often uncover intriguing new tangents while exploring the internet.

    AI can still improve the process. A lot. I find myself using the Gemini app, Google AI Studio, and NotebookLM to understand complex ideas. Gemini Deep Research is almost unbelievable. Give it a prompt, it scours the web for dozens or hundreds of sources, and then compiles a report.

    What an amazing way to start or validate research efforts. Have an existing pile of content—might be annual reports, whitepapers, design docs, or academic material—that you need to make sense of? NotebookLM is pretty amazing, and should change how all of us ask questions of research material.

    #3. I will learn new things faster.

    Many of us have jobs where we need to quickly get up to speed on a topic. I want help in context, so that I stay in a flow state.

    Back to NotebookLM, I might use this to get easier-to-digest audio overviews of complex new ideas.

    And with coding assistance tools, I’m getting more and more comfortable staying in my IDE to get help on things I don’t yet know. Here, my Gemini Code Assist extension is helping me learn how to fix my poorly-secured Java code.

    Finally, I’m quite intrigued by how the new Gemini 2.0 Multimodal Live API will help me in the moment. By sharing my screen with the model, I can get realtime help into whatever I’m struggling with. Wow.

    #4. I’ll spend less time debating and more time coding.

    My day job is to lead a sizable team at Google Cloud and help everyone do their best work. I still like to code, though!

    It’s already happening, but next year I expect to code more than in years past. Why? Because AI is making it easier and more fun. Whether using an IDE assistant, or a completely different type of IDE like Cursor, it’s never been simpler to build legit software. We all can go from idea to reality so quickly now.

    Stop endlessly debating ideas, and just test them out quickly! Using low-code platforms or AI-assisted coding tools, you can get working prototypes in no time.

    #5. I will ask better questions.

    I’ve slowly learned that the best leaders simply ask better questions. AI can help us in a few ways here. First, there are “thinking” models that show you a chain of thought that might inspire your own questions.

    LLMs are awesome at giving answers, but they’re also pretty great at crafting questions. Look at this. I uploaded a set of (fake) product bugs and asked the Gemini model to help me come up with clarifying questions to ask the engineers. Good list!

    And how about this. Google Cloud BigQuery has an excellent feature called Data Insights which generates a bunch of candidate questions for a given dataset (here, the Google Cloud Release Notes). What a great way to get some smart, starter questions to consider!

    #6. I want to identify where the manual struggle is actually the point.

    I don’t want AI to do everything for me. There are cases where the human struggle is where the enjoyment comes from. Learning how to do something. Fumbling with techniques. Building up knowledge or strength. I don’t want a shortcut. I want deep learning.

    I’m going to keep doing my daily reading list by hand. No automation allowed, as it forces me to really get a deeper grasp on what’s going on in our industry. I’m not using AI to write newsletters, as I want to keep working on the writing craft myself.

    This mass integration of AI into services and experiences is great. It also forces us to stop and decide where we intentionally want to avoid it!

    #7. I should create certain types of content much faster.

    There’s no excuse to labor over document templates or images in presentations anymore. No more scouring the web for the perfect picture.

    I use Gemini in Google Slides all the time now. This is the way I add visuals to presentations and it saves me hours of time.

    Generate code, docs, and images, sure. We’ve seen that, but the image generation tech is getting tremendous.

    But videos too? I’m only starting to consider how to use remarkable technology like Veo 2. I’m using it now, and it’s blowing my mind. It’ll likely impact what I produce next year.

    #8. I’m going to free up some valuable time.

    That’s what most of this is all about. I don’t want to do less work; I want to do better work. Even with all this AI and automation, I expect I’ll be working the same number of hours next year. But I’ll be happier with how I’m spending those hours: learning, talking to humans, investing in others. Less time writing boilerplate code, breaking flow state to get answers, or even executing mindlessly repetitive tasks in the browser.

    I don’t work for AI; AI works for me. And in 2025, I’m expecting to make it work hard!

  • Customizing AI coding suggestions using the *best* code, not just *my* code

    Customizing AI coding suggestions using the *best* code, not just *my* code

    The ability to use your own codebase to customize the suggestions from an AI coding assistant is a big deal. This feature—available in products like Gemini Code Assist, GitHub Copilot, and Tabnine—gives developers coding standards, data objects, error messages, and method signatures that they recognize from previous projects. Data shows that the acceptance rate for AI coding assistants goes way up when devs get back trusted results that look familiar. But I don’t just want up-to-date and familiar code that *I* wrote. How can I make sure my AI coding assistant gives me the freshest and best code possible? I used code customization in Gemini Code Assist to reference Google Cloud’s official code sample repos and now I get AI suggestions that feature the latest Cloud service updates and best practices for my preferred programming languages. Let me show you how I did it.

    Last month, I showed how to use local codebase awareness in Gemini Code Assist (along with its 128,000 input token window) to “train” the model on the fly using code samples or docs that an LLM hasn’t been trained on yet. It’s a cool pattern, but also requires upfront understanding of what problem you want to solve, and work to stash examples into your code repo. Can I skip both steps?

    Yes, Gemini Code Assist Enterprise is now available and I can point to existing code repos in GitHub or GitLab. When I reference a code repo, Google Cloud automatically crawls it, chunks it up, and stores it (encrypted) in a vector database within a dedicated project in my Google Cloud environment. Then, the Gemini Code Assist plugin uses that data as part of a RAG pattern when I ask for coding suggestions. By pointing at Google Cloud’s code sample repos—any best practice repo would apply here—I supercharge my recommendations with data the base LLM doesn’t have (or prioritize).
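    To make that pattern concrete, here's a toy sketch of the retrieval step in JavaScript. It's deliberately simplified and is not how Code Assist works internally: I use naive keyword overlap where the real service uses embeddings and an encrypted vector database.

```javascript
// Toy RAG sketch: chunk a "repo", rank chunks by keyword overlap with the
// query, and prepend the best ones to the prompt. A production system
// (like the one described above) uses embeddings and a vector DB instead.
function chunk(text, size = 200) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

function score(query, chunkText) {
  // naive relevance: count words the chunk shares with the query
  const words = new Set(query.toLowerCase().split(/\W+/));
  return chunkText.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length;
}

function augmentPrompt(query, repoText, topK = 2) {
  const ranked = chunk(repoText)
    .map((c) => ({ c, s: score(query, c) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.c);
  return `Context:\n${ranked.join("\n---\n")}\n\nQuestion: ${query}`;
}
```

    The augmented prompt (retrieved chunks plus the question) is what actually gets sent to the model.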

    Step #0 – Prerequisites and considerations

    Code customization is an “enterprise” feature of Gemini Code Assist, so it requires a subscription to that tier of service. There’s a promotional $19-per-month price until March of 2025, so tell your boss to get moving.

    Also, this is currently available in US, European, and Asian regions; you may need to request feature access via a form (depending on when you read this); and today it works with GitHub.com and GitLab.com repos, although on-premises indexing is forthcoming. Good? Good. Let’s keep going.

    Step #1 – Create the source repo

    One wrinkle here is that you need to own the repos you ask Gemini Code Assist to index. You can’t just point at any random repo to index. Deal breaker? Nope.

    I can just fork an existing repo into my own account! For example, here’s the Go samples repo from Google Cloud, and the Java one. Each one is stuffed with hundreds of coding examples for interacting with most of Google Cloud’s services. These repos are updated multiple times per week to ensure they include support for all the latest Cloud service features.

    I went ahead and forked each repo in GitHub. You can do it via the CLI or in the web console.

    I didn’t overthink it and kept the repository name the same.

    Gemini Code Assist can index up to 950 repos (and more if really needed), so you could liberally refer to best-practice repos that will help your developers write better code.

    Any time I want to refresh my fork to grab the latest code sample updates, I can do so.

    Step #2 – Add a reference to the source repo

    Now I needed to reference these repos for later code customization. Google Cloud Developer Connect is a service that maintains connections to source code sitting outside Google Cloud.

    I started by choosing GitHub.com as my source code environment.

    Then I named my Developer Connect connection.

    Then I installed a GitHub app into my GitHub account. This app is what enables the loading of source data into the customization service. From here, I chose the specific repos that I wanted available to Developer Connect.

    When finished, I had one of my own repos, and two best practice repos all added to Developer Connect.

    That’s it! Now to point these linked repos to Gemini Code Assist.

    Step #3 – Add a Gemini Code Assist customization index

    I had just two CLI commands to execute.

    First, I created a code customization index. You’ve got one index per Cloud project (although you can request more) and you create it with one command.

    Next, I created a repository group for the index. You use these to control access to repos, and could have different ones for different dev audiences. Here’s where you actually point to a given repo that has the Developer Connect app installed.

    I ran this command a few times to ensure that each of my three repos was added to the repository group (and index).
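    For reference, the two commands look roughly like the following. Treat the exact flags as a sketch and confirm against the current docs; the index name, group name, region, and repo path here are placeholders from my setup.

```shell
# Create the code customization index (one per project by default).
gcloud gemini code-repository-indexes create my-index \
  --project=my-project --location=us-central1

# Create a repository group on that index and point it at a repo linked
# through Developer Connect. Run once per repo you want indexed.
gcloud gemini code-repository-indexes repository-groups create my-repo-group \
  --code-repository-index=my-index \
  --project=my-project --location=us-central1 \
  --repositories='[{"resource": "projects/my-project/locations/us-central1/connections/my-conn/gitRepositoryLinks/golang-samples", "branchPattern": "main"}]'
```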

    Indexing can take up to 24 hours, so here’s where you wait. After a day, I saw that all my target repos were successfully indexed.

    Whenever I sync the fork with the latest updates to code samples, Gemini Code Assist will index the updated code automatically. And my IDE with Gemini Code Assist will have the freshest suggestions from our samples repo!

    Step #4 – Use updated coding suggestions

    Let’s prove that this worked.

    I looked for a recent commit to the Go samples repo that the base Gemini Code Assist LLM wouldn’t know about yet. Here’s one that has new topic-creation parameters for our Managed Kafka service. I gave the prompt below to Gemini Code Assist. First, I used a project and account that was NOT tied to the code customization index.

    //function to create a topic in Google Cloud Managed Kafka and include parameters for setting replicationfactor and partitioncount
    

    The coding suggestion was good, but incomplete as it was missing the extra configs the service can now accept.

    When I went to my Code Assist environment that did have code customization turned on, you see that the same prompt gave me a result that mirrored the latest Go sample code.

    I tried a handful of Java and Go prompts, and I regularly (admittedly, not always) got back exactly what I wanted. Good prompt engineering might have helped me reach 100%, but I still appreciated the big increase in quality results. It was amazing to have hundreds of up-to-date Google-tested code samples to enrich my AI-provided suggestions!

    AI coding assistants that offer code customization from your own repos are a difference maker. But don’t stop at your own code. Index other great code repos that represent the coding standards and fresh content your developers need!

  • I love this technique for getting up-to-date suggestions from my AI coding assistant

    I love this technique for getting up-to-date suggestions from my AI coding assistant

    Trust. Without trust, AI coding assistants won’t become a default tool in a developer’s toolbox. Trust is the #1 concern of devs today, and it’s something I’ve struggled with when it comes to getting the most relevant answers from an LLM. Specifically, am I getting back the latest information? Probably not, given that LLMs have a training cutoff date. Your AI coding assistant probably doesn’t (yet) know about Python 3.13, the most recent features of your favorite cloud service, or the newest architectural idea shared at a conference last week. What can you do about that?

    To me, this challenge comes up in at least three circumstances. There are entirely new concepts or tools that the LLM training wouldn’t know about. Think something like pipe syntax as an alternative to SQL syntax. I wouldn’t expect a model trained last year to know about that. How about updated features to existing libraries or frameworks? I want suggestions that reflect the full feature set of the current technology and I don’t want to accidentally do something the hard (old) way. An example? Consider the new “enum type” structured output I can get from LangChain4J. I’d want to use that now! And finally, I think about improved or replaced framework libraries. If I’m upgrading from Java 8 to Java 23, or Deno 1 to Deno 2, I want to ensure I’m not using deprecated features. My AI tools probably don’t know about any of these.

    I see four options for trusting the freshness of responses from your AI assistant. The final technique was brand new to me, and I think it’s excellent.

    1. Fine-tune your model
    2. Use retrieval augmented generation (RAG)
    3. Ground the results with trusted sources
    4. “Train” on the fly with input context

    Let’s briefly look at the first three, and see some detailed examples of the fourth.

    Fine-tune your model

    Whether using commercial or open models, they all represent a point-in-time based on their training period. You could choose to repeatedly train your preferred model with fresh info about the programming languages, frameworks, services, and patterns you care about.

    The upside? You can get a model with knowledge about whatever you need to trust it. The downside? It’s a lot of work—you’d need to craft a healthy number of examples and must regularly tune the model. That could be expensive, and the result wouldn’t naturally plug into most AI coding assistance tools. You’d have to jump out of your preferred coding tool to ask questions of a model elsewhere.

    Use RAG

    Instead of tuning and serving a custom model, you could choose to augment the input with pre-processed content. You’ll get back better, more contextual results when taking into account data that reflects the ideal state.

    The upside? You’ll find this pattern increasingly supported in commercial AI assistants. This keeps you in your flow without having to jump out to another interface. GitHub Copilot offers this, and now our Gemini Code Assist provides code customization based on repos in GitHub or GitLab. With Code Assist, we handle the creation and management of the code index of your repos, and you don’t have to manually chunk and store your code. The downside? This only works well if you’ve got the most up-to-date data in an indexed source repo. If you’ve got old code or patterns in there, that won’t help your freshness problem. And while these solutions are good for extra code context, they may not support a wider range of possible context sources (e.g. text files).

    Ground the results

    This approach gives you more confidence that the results are accurate. For example, Google Cloud’s Vertex AI offers “ground with Google Search” so that responses are matched to real, live Google Search results.

    If I ask a question about upgrading an old bit of Deno code, you can see that the results are now annotated with reference points. This gives me confidence to some extent, but doesn’t necessarily guarantee that I’m getting the freshest answers. Also, this is outside of my preferred tool, so it again takes me out of a flow state.

    Train on the fly

    Here’s the approach I just learned about from my boss’s boss, Keith Ballinger. I complained about freshness of results from AI assistance tools, and he said “why don’t you just train it on the fly?” Specifically, pass the latest and greatest reference data into a request within the AI assistance tool. Mind … blown.

    How might it handle entirely new concepts or tools? Let’s use that pipe syntax example. In my code, I want to use this fresh syntax instead of classic SQL. But there’s no way my Gemini Code Assist environment knows about that (yet). Sure enough, I just get back a regular SQL statement.
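    For a sense of what that prompt is after, here's the difference in a nutshell. This is my sketch of the two styles (check the pipe syntax docs for the exact grammar):

```sql
-- Classic SQL
SELECT item, SUM(sales) AS total_sales
FROM produce
WHERE item != 'bananas'
GROUP BY item;

-- Pipe syntax: each |> step transforms the previous result
FROM produce
|> WHERE item != 'bananas'
|> AGGREGATE SUM(sales) AS total_sales GROUP BY item;
```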

    But now, Gemini Code Assist supports local codebase awareness, up to 128,000 input tokens! I grabbed the docs for pipe query syntax, saved as a PDF, and then asked Google AI Studio to produce a Markdown file of the docs. Note that Gemini Code Assist isn’t (yet) multi-modal, so I need Markdown instead of passing in a PDF or image. I then put a copy of that Markdown file in a “training” folder within my app project. I used the new @ mention feature in our Gemini Code Assist chat to specifically reference the syntax file when asking my question again.

    Wow! So by giving Gemini Code Assist a reference file of pipe syntax, it was able to give me an accurate, contextual, and fresh answer.

    What about updated features to existing libraries or frameworks? I mentioned the new feature of LangChain4J for the Gemini model. There’s no way I’d expect my coding assistant to know about a feature added a few days ago. Once again, I grabbed some resources. This time, I snagged the Markdown doc for Google Vertex AI Gemini from the LangChain4J repo, and converted a blog post from Guillaume to Markdown using Google AI Studio.

    My prompt to the Gemini Code Assist model was “Update the service function with a call to Gemini 1.5 Flash using LangChain4J. It takes in a question about a sport, and the response is mapped to an enum with values for baseball, football, cricket, or other.” As expected, the first response was a good attempt, but it wasn’t fully accurate. And it used a manual way to map the response to an enum.

    What if I pass in both of those training files with my prompt? I get back exactly the syntax I wanted for my Cloud Run Function!

    So great. This approach requires me to know what tech I’m interested in up front, but still, what an improvement!

    Final example. How about improved or replaced framework libraries? Let’s say I’ve got a very old Deno app that I created when I first got excited about this excellent JavaScript runtime.

    // from https://deno.com/blog/v1.35#denoserve-is-now-stable
    async function handleHttp(conn: Deno.Conn) {
      // `await` is needed here to wait for the server to handle the request
      await (async () => {
        for await (const r of Deno.serveHttp(conn)) {
          r.respondWith(new Response("Hello World from Richard"));
        }
      })();
    }
    
    for await (const conn of Deno.listen({ port: 8000 })) {
      handleHttp(conn);
    }
    

    This code uses some libraries and practices that are now out of date. When I modernize this app, I want to trust that I’m doing it the best way. Nothing to fear! I grabbed the Deno 1.x to 2.x migration guide, a blog post about the new approach to web servers, and the launch blog for Deno 2. The result? Impressive, including a good description of why it generated the code this way.
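    For comparison, here's roughly what the modernized server looks like in Deno 2. This is my own sketch using the now-stable `Deno.serve` API, not the verbatim model output:

```typescript
// Deno 2 style: Deno.serve replaces the manual Deno.listen/serveHttp loop.
Deno.serve({ port: 8000 }, (_req: Request) =>
  new Response("Hello World from Richard"),
);
```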

    I could imagine putting the latest reference apps into a repo and using Gemini Code Assist’s code customization feature to pull that automatically into my app. But this demonstrated technique gives me more trust in the output of the tool when freshness is paramount. What do you think?

  • 3 ways to use AI to grade homework assignments

    3 ways to use AI to grade homework assignments

    School is back in session, and I just met with a handful of teachers at a recent back-to-school night. They’re all figuring out how to account for generative AI tools that students have access to. I say, let’s give teachers the same tools to use. Specifically, what if a teacher wants a quick preliminary grade on book reports submitted by their students? To solve this, I used Gemini 1.5 Flash in Google Cloud Vertex AI in three different ways—one-off in the prompt editor, through code, and via declarative workflow.

    Grade Homework in Vertex AI Studio

    Maybe you just have one or two papers to grade. Something like Vertex AI Studio is a good choice. Even if you’re not a Google Cloud customer, you can use it for free through this link.

    You don’t have any book reports handy to test this with? Me neither. In Vertex AI Studio, I prompted with something like “Write a 300 word book report for Pride and Prejudice from the perspective of an 8th grade student with a good vocabulary and strong writing skills.”

    With that handy, let’s “grade” the paper. Here I’m using system instructions to direct the model. I used Gemini to generate the initial version and then I tweaked it a bit. Here are the instructions:

    Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student.

    This tells the model what to look for, the expected quality (“honors English”), and the desired output.

    I added the system instructions to the marked section within Vertex AI Studio.

    Then I simply pasted the book report into the “prompt” section and submitted the prompt.

    I thought this was a pretty good book report, but Gemini 1.5 Flash had a high bar after obeying my system instructions.

    This book report shows good understanding of the book and a strong grasp of the themes. However, there are some issues:

    Structure:
    The introduction is a little long and could be more concise.
    The conclusion is also repetitive.
    The report could benefit from more clear paragraph breaks to organize the information more effectively.

    Grammar:
    No significant grammar issues.

    Content:
    The student demonstrates good understanding of the novel's major themes and characters.
    There's a strong focus on plot summary, which is okay, but the report could benefit from more critical analysis. The student needs to go beyond just describing the events to analyze how they impact the themes, characters, or messages of the book.
    Overall: This report shows good effort, but it needs to be refined.

    Grade: B-

    If I’m a teacher, this is a fascinating way to get a preliminary assessment of a writing assignment!

    Grade Homework Through Code

    The above solution works fine for one-off experiences, but how might you scale this AI-assisted grader? Another option is code.

    To try this scenario out, I used Cloud Firestore as my document database holding the book reports. I created a collection named “Papers” in the default database and added three documents. Each one holds a different book report.

    I then used the Firestore API and Vertex AI API to write some simple Go code that iterates through each Firestore document, calls Vertex AI using the provided system instructions, and then logs out the grade for each report. Note that I could have used a meta framework like LangChain, LlamaIndex, or Firebase Genkit, but I didn’t see the need.

    package main
    
    import (
    	"context"
    	"fmt"
    	"log"
    	"os"
    
    	"cloud.google.com/go/firestore"
    	"cloud.google.com/go/vertexai/genai"
    	"google.golang.org/api/iterator"
    )
    
    func main() {
    	// get configuration from environment variables
    	projectID := os.Getenv("PROJECT_ID") 
    	collectionName := os.Getenv("COLLECTION_NAME") // "Papers"
    	location := os.Getenv("LOCATION")              //"us-central1"
    	modelName := os.Getenv("MODEL_NAME")           // "gemini-1.5-flash-001"
    
    	ctx := context.Background()
    
    	//initialize Vertex AI client
    	vclient, err := genai.NewClient(ctx, projectID, location)
    	if err != nil {
    		log.Fatalf("error creating vertex client: %v\n", err)
    	}
    	gemini := vclient.GenerativeModel(modelName)
    
    	//add system instructions
    	gemini.SystemInstruction = &genai.Content{
    		Parts: []genai.Part{genai.Text(`Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student.
    		`)},
    	}
    
    	// Initialize Firestore client
    	client, err := firestore.NewClient(ctx, projectID)
    	if err != nil {
    		log.Fatalf("Failed to create client: %v", err)
    	}
    	defer client.Close()
    
    	// Get documents from the collection
    	iter := client.Collection(collectionName).Documents(ctx)
    	for {
    		doc, err := iter.Next()
    		if err != nil {
    			if err == iterator.Done {
    				break
    			}
    			log.Fatalf("error iterating through documents: %v\n", err)
    		}
    
    		//create the prompt
    		prompt := genai.Text(doc.Data()["Contents"].(string))
    
    		//call the model and get back the result
    		resp, err := gemini.GenerateContent(ctx, prompt)
    		if err != nil {
    			log.Fatalf("error generating content: %v\n", err)
    		}
    
    		//print out the top candidate part in the response
    		log.Println(resp.Candidates[0].Content.Parts[0])
    	}
    
    	fmt.Println("Successfully iterated through documents!")
    }
    
    

    The code isn’t great, but the results were. I’m also getting more verbose responses from the model, which is cool. This is a much more scalable way to quickly grade all the homework.

    Grade Homework in Cloud Workflows

    I like the code solution, but maybe I want to run this preliminary grading on a scheduled basis? Every Tuesday night? I could do that with my above code, but how about using a no-code workflow engine? Our Google Cloud Workflows product recently got a Vertex AI connector. Can I make it work with the same system instructions as the above two examples? Yes, yes I can.

    I might be the first person to stitch all this together, but it works great. I first retrieved the documents from Firestore, looped through them, and called Vertex AI with the provided system instructions. Here’s the workflow’s YAML definition:

    main:
      params: [args]
      steps:
      - init:
          assign:
            - collection: ${args.collection_name}
            - project_id: ${args.project_id}
            - location: ${args.location}
            - model: ${args.model_name}
      - list_documents:
            call: googleapis.firestore.v1.projects.databases.documents.list
            args:
                collectionId: ${collection}
                parent: ${"projects/" + project_id + "/databases/(default)/documents"}
            result: documents_list
      - process_documents:
            for:
              value: document 
              in: ${documents_list.documents}
              steps:
                - ask_llm:
                    call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent
                    args: 
                        model: ${"projects/" + project_id + "/locations/" + location + "/publishers/google/models/" + model}
                        region: ${location}
                        body:
                            contents:
                                role: "USER"
                                parts:
                                    text: ${document.fields.Contents.stringValue}
                            systemInstruction: 
                                role: "USER"
                                parts:
                                    text: "Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student."
                            generation_config:
                                temperature: 0.5
                                max_output_tokens: 2048
                                top_p: 0.8
                                top_k: 40
                    result: llm_response
                - log_file_name:
                    call: sys.log
                    args:
                        text: ${llm_response}
    

    No code! I executed the workflow, passing in all the runtime arguments.

    In just a moment, I saw my workflow running, and “grades” being logged to the console. In real life, I’d probably update the Firestore document with this information. I’d also use Cloud Scheduler to run this on a regular basis.

    While I made this post about rescuing educators from the toil of grading papers, you can apply these patterns to all sorts of scenarios. Use prompt editors like Vertex AI Studio for experimentation and finding the right prompt phrasing. Then jump into code to interact with models in a repeatable, programmatic way. And consider low-code tools when model interactions are scheduled, or part of long running processes.

  • Store prompts in source control and use AI to generate the app code in the build pipeline? Sounds weird. Let’s try it!

    I can’t remember who mentioned this idea to me. It might have been a customer, colleague, internet rando, or voice in my head. But the idea was whether you could use source control for the prompts, and leverage an LLM to dynamically generate all the app code each time you run a build. That seems bonkers for all sorts of reasons, but I wanted to see if it was technically feasible.

    Should you do this for real apps? No, definitely not yet. The non-deterministic nature of LLMs means you’d likely experience hard-to-find bugs, unexpected changes on each build, and get yelled at by regulators when you couldn’t prove reproducibility in your codebase. When would you use something like this? I’m personally going to use this to generate stub apps to test an API or database, build demo apps for workshops or customer demos, or to create a component for a broader architecture I’m trying out.

    tl;dr I built an AI-based generator that takes a JSON file of prompts like this and creates all the code. I call this generator from a CI pipeline which means that I can check in (only) the prompts to GitHub, and end up with a running app in the cloud.

    {
      "folder": "generated-web",
      "prompts": [
        {
          "fileName": "employee.json",
          "prompt": "Generate a JSON structure for an object with fields for id, full name, start date, and office location. Populate it with sample data. Only return the JSON content and nothing else."
        },
        {
          "fileName": "index.js",
          "prompt": "Create a node.js program. It instantiates an employee object that looks like the employee.json structure. Start up a web server on port 8080 and expose a route at /employee that returns the employee object defined earlier."
        },
        {
          "fileName": "package.json",
          "prompt": "Create a valid package.json for this node.js application. Do not include any comments in the JSON."
        },
        {
          "fileName": "Dockerfile",
          "prompt": "Create a Dockerfile for this node.js application that uses a minimal base image and exposes the app on port 8080."
        }
      ]
    }
    

    In this post, I’ll walk through the steps of what a software delivery workflow such as this might look like, and how I set up each stage. To be sure, you’d probably make different design choices, write better code, and pick different technologies. That’s cool; this was mostly an excuse for me to build something fun.

    Before explaining this workflow, let me first show you the generator itself and how it works.

    Building an AI code generator

    There are many ways to build this. An AI framework makes it easier, and I chose Spring AI because I wanted to learn how to use it. Even though this is a Java app, it generates code in any programming language.

    I began at Josh Long’s second favorite place on the Internet, start.spring.io. Here I started my app using Java 21, Maven, and the Vertex AI Gemini starter, which pulls in Spring AI.

    My application properties point at my Google Cloud project and I chose to use the impressive new Gemini 1.5 Flash model for my LLM.

    spring.application.name=demo
    spring.ai.vertex.ai.gemini.projectId=seroter-project-base
    spring.ai.vertex.ai.gemini.location=us-central1
    spring.ai.vertex.ai.gemini.chat.options.model=gemini-1.5-flash-001
    

    My main class implements the CommandLineRunner interface and expects a single parameter, which is a pointer to a JSON file containing the prompts. I also have a couple of classes that define the structure of the prompt data. But the main generator class is where I want to spend some time.

    Basically, for each prompt provided to the app, I look for any local files to offer as multimodal context (so the LLM can factor in existing code when it processes the prompt), call the LLM, extract the resulting code from its Markdown wrapper, and write the file to disk.

    Here are those steps in code. First I look for local files:

    //load code from any existing files in the folder
    private Optional<List<Media>> getLocalCode() {
        String directoryPath = appFolder;
        File directory = new File(directoryPath);
    
        if (!directory.exists()) {
            System.out.println("Directory does not exist: " + directoryPath);
            return Optional.empty();
        }
    
        try {
            return Optional.of(Arrays.stream(directory.listFiles())
                .filter(File::isFile)
                .map(file -> {
                    try {
                        byte[] codeContent = Files.readAllLines(file.toPath())
                            .stream()
                            .collect(Collectors.joining("\n"))
                            .getBytes();
                        return new Media(MimeTypeUtils.TEXT_PLAIN, codeContent);
                    } catch (IOException e) {
                        System.out.println("Error reading file: " + file.getName());
                        return null;
                    }
                })
                .filter(Objects::nonNull)
                .collect(Collectors.toList()));
        } catch (Exception e) {
            System.out.println("Error getting local code");
            return Optional.empty();
        }
    }
    

    I call the LLM using Spring AI, choosing one of two methods depending on whether there's any local code or not. There won't be any code when the first prompt is executed!

    //call the LLM and pass in existing code
    private String callLlmWithLocalCode(String prompt, List<Media> localCode) {
        System.out.println("calling LLM with local code");
        var userMessage = new UserMessage(prompt, localCode);
        var response = chatClient.call(new Prompt(List.of(userMessage)));
        return extractCodeContent(response.toString());
    }
    
    //call the LLM when there's no local code
    private String callLlmWithoutLocalCode(String prompt) {
        System.out.println("calling LLM withOUT local code");
        var response = chatClient.call(prompt);
        return extractCodeContent(response.toString());
    }
    

    You see there that I’m extracting the code itself from the response string with this operation:

    //method that extracts code from the LLM response
    public static String extractCodeContent(String markdown) {
    
        System.out.println("Markdown: " + markdown);
    
        String regex = "```(\\w+)?\\n([\\s\\S]*?)```";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(markdown);
    
        if (matcher.find()) {
            String codeContent = matcher.group(2); // Extract group 2 (code content)
            return codeContent;
        } else {
            //System.out.println("No code fence found.");
            return markdown;
        }
    }
    

    And finally, I write the resulting code to disk:

    //write the final code to the target file path
    private void writeCodeToFile(String filePath, String codeContent) {
        try {
            File file = new File(filePath);
            if (!file.exists()) {
                file.createNewFile();
            }
    
            FileWriter writer = new FileWriter(file);
            writer.write(codeContent);
            writer.close();
    
            System.out.println("Content written to file: " + filePath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    

    There’s some more ancillary stuff that you can check out in the complete GitHub repo with this app in it. I was happy to be using Gemini Code Assist while building this. This AI assistant helped me understand some Java concepts, complete some functions, and fix some of my subpar coding choices.

    That’s it. Once I had this component, I built a JAR file and could now use it locally or in a continuous integration pipeline to produce my code. I uploaded the JAR file to Google Cloud Storage so that I could use it later in my CI pipelines. Now, onto the day-to-day workflow that would use this generator!

    Workflow step: Set up repo and pipeline

    Like with most software projects, I’d start with the supporting machinery. In this case, I needed a source repo to hold the prompt JSON files. Done.

    And I’d also consider setting up the path to production (or test environment, or whatever) to build the app as it takes shape. I’m using Google Cloud Build for a fully-managed CI service. It’s a good service with a free tier. Cloud Build uses declarative manifests for pipelines, and this pipeline starts off the same for any type of app.

    steps:
      # Print the contents of the current directory
      - name: 'bash'
        id: 'Show source files'
        script: |
          #!/usr/bin/env bash
          ls -l
    
      # Copy the JAR file from Cloud Storage
      - name: 'gcr.io/cloud-builders/gsutil'
        id: 'Copy AI generator from Cloud Storage'
        args: ['cp', 'gs://seroter-llm-demo-tools/demo-0.0.1-SNAPSHOT.jar', 'demo-0.0.1-SNAPSHOT.jar']
    
      # Print the contents of the current directory
      - name: 'bash'
        id: 'Show source files and builder tool'
        script: |
          #!/usr/bin/env bash
          ls -l
    

    Not much to it so far. I just print out the source contents seen in the pipeline, download the AI code generator from the above-mentioned Cloud Storage bucket, and prove that it’s on the scratch disk in Cloud Build.

    Ok, my dev environment was ready.

    Workflow step: Write prompts

    In this workflow, I don’t write code, I write prompts that generate code. I might use something like Google AI Studio or even Vertex AI to experiment with prompts and iterate until I like the response I get.

    Within AI Studio, I chose Gemini 1.5 Flash because I like nice things. Here, I’d work through the various prompts I would need to generate a working app. This means I still need to understand programming languages, frameworks, Dockerfiles, etc. But I’m asking the LLM to do all the coding.

    Once I’m happy with all my prompts, I add them to the JSON file. Note that each prompt entry has a corresponding file name that I want the generator to use when writing to disk.

    At this point, I was done “coding” the Node.js app. You could imagine having a dozen or so templates of common app types and just grabbing one and customizing it quickly for what you need!

    Workflow step: Test locally

    To test this, I put the generator in a local folder with a prompt JSON file and ran this command from the shell:

    rseroter$ java -jar  demo-0.0.1-SNAPSHOT.jar --prompt-file=app-prompts-web.json
    

    After just a few seconds, I had four files on disk.

    This is just a regular Node.js app. After npm install and npm start commands, I ran the app and successfully pinged the exposed API endpoint.
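
    For the curious, here's a hypothetical reconstruction of the generated index.js (the actual LLM output varies from run to run), using only Node's built-in http module:

    ```javascript
    // Hypothetical reconstruction of the AI-generated app: a plain Node.js
    // web server that returns a hard-coded employee object on /employee.
    const http = require('http');

    // Field names follow the employee.json prompt; values are sample data.
    const employee = {
      id: 1,
      fullName: 'Jane Doe',
      startDate: '2023-04-17',
      officeLocation: 'Seattle'
    };

    const server = http.createServer((req, res) => {
      if (req.url === '/employee') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(employee));
      } else {
        res.writeHead(404);
        res.end();
      }
    });

    // The generated app listened on 8080; guarding on PORT (which Cloud Run
    // sets in the container) lets this sketch be loaded without starting a server.
    if (process.env.PORT) {
      server.listen(process.env.PORT);
    }
    ```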

    Can we do something more sophisticated? I haven't tried a ton of scenarios, but I wanted to see if I could get a database interaction generated successfully.

    I went into the Google Cloud console and spun up a (free tier) instance of Cloud Firestore, our NoSQL database. I then created a “collection” called “Employees” and added a single document to start it off.

    Then I built a new prompts file with directions to retrieve records from Firestore. I messed around with variations that encouraged the use of certain libraries and versions. Here’s a version that worked for me.

    {
      "folder": "generated-web-firestore",
      "prompts": [
        {
          "fileName": "employee.json",
          "prompt": "Generate a JSON structure for an object with fields for id, full name, start date, and office location. Populate it with sample data. Only return the JSON content and nothing else."
        },
        {
          "fileName": "index.js",
          "prompt": "Create a node.js program. Start up a web server on port 8080 and expose a route at /employee. Initialize a firestore database using objects from the @google-cloud/firestore package, referencing Google Cloud project 'seroter-project-base' and leveraging Application Default credentials. Return all the documents from the Employees collection."
        },
        {
          "fileName": "package.json",
          "prompt": "Create a valid package.json for this node.js application using version 7.7.0 for @google-cloud/firestore dependency. Do not include any comments in the JSON."
        },
        {
          "fileName": "Dockerfile",
          "prompt": "Create a Dockerfile for this node.js application that uses a minimal base image and exposes the app on port 8080."
        }
      ]
    }
    
    

    After running the prompts through the generator app again, I got four new files, this time with code to interact with Firestore!

    Another npm install and npm start command set started the app and served up the document sitting in Firestore. Very nice.
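
    The interesting part of that generated code is the Firestore query itself. Here's a rough reconstruction of the logic (in the generated app, the client came from `new Firestore({...})` in the @google-cloud/firestore package using Application Default Credentials; I'm passing the client in here so the query shape can be exercised against a stub):

    ```javascript
    // Reconstruction of the generated retrieval logic. The Firestore client
    // is injected rather than constructed here, so no cloud credentials are
    // needed to exercise the query shape.
    async function getEmployees(db) {
      const snapshot = await db.collection('Employees').get();
      // Return each document's data along with its Firestore document id.
      return snapshot.docs.map((doc) => ({ id: doc.id, ...doc.data() }));
    }
    ```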

    Finally, how about a Python app? I want a background job that actually populates the Firestore database with some initial records. I experimented with some prompts, and these gave me a Python app that I could use with Cloud Run Jobs.

    {
      "folder": "generated-job-firestore",
      "prompts": [
        {
          "fileName": "main.py",
          "prompt": "Create a Python app with a main function that initializes a firestore database object with project seroter-project-base and Application Default credentials. Add two documents to the Employees collection. Generate random id, fullname, startdate, and location data for each document. Have the start script try to call that main function and if there's an exception, print the error."
        },
        {
          "fileName": "requirements.txt",
          "prompt": "Create a requirements.txt file for the packages used by this app"
        },
        {
          "fileName": "Procfile",
          "prompt": "Create a Procfile for python3 that starts up main.py"
        },
        {
          "fileName": "Dockerfile",
          "prompt": "Create a Dockerfile for this Python batch application that uses a minimal base image and doesn't expose any ports"
        }
      ]
    }
    

    Running this prompt set through the AI generator gave me the valid files I wanted. All my prompt files are here.

    At this stage, I was happy with the local tests and ready to automate the path from source control to cloud runtime.

    Workflow step: Generate app in pipeline

    Above, I had started the Cloud Build manifest with the step of yanking down the AI generator JAR file from Cloud Storage.

    The next step is different for each app we’re building. I could use substitution variables in Cloud Build and have a single manifest for all of them, but for demonstration purposes, I wanted one manifest per prompt set.

    I added this step to what I already had above. It executes the same command in Cloud Build that I had run locally to test the generator. First I do an apt-get on the “ubuntu” base image to get the Java command I need, and then invoke my JAR, passing in the name of the prompt file.

    ...
    
    # Run the JAR file
      - name: 'ubuntu'
        id: 'Run AI generator to create code from prompts'
        script: |
          #!/usr/bin/env bash
          apt-get update && apt-get install -y openjdk-21-jdk
          java -jar  demo-0.0.1-SNAPSHOT.jar --prompt-file=app-prompts-web.json
    
      # Print the contents of the generated directory
      - name: 'bash'
        id: 'Show generated files'
        script: |
          #!/usr/bin/env bash
          ls ./generated-web -l
    

    I updated the Cloud Build pipeline that's connected to my GitHub repo with this new YAML manifest.

    Running the pipeline at this point showed that the generator worked correctly and added the expected files to the scratch volume in the pipeline. Awesome.

    At this point, I had an app generated from prompts found in GitHub.

    Workflow step: Upload artifact

    Next up? Getting this code into a deployable artifact. There are plenty of options, but I want to use a container-based runtime, and need a container image. Cloud Build makes that easy.

    I added another section to my existing Cloud Build manifest to containerize with Docker and upload to Artifact Registry.

     # Containerize the code and upload to Artifact Registry
      - name: 'gcr.io/cloud-builders/docker'
        id: 'Containerize generated code'
        args: ['build', '-t', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web:latest', './generated-web']
      - name: 'gcr.io/cloud-builders/docker'
        id: 'Push container to Artifact Registry'
        args: ['push', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web']
    

    It used the Dockerfile our AI generator created, and after this step ran, I saw a new container image.

    Workflow step: Deploy and run app

    The final step, running the workload! I could use our continuous deployment service Cloud Deploy but I took a shortcut and deployed directly from Cloud Build. This step in the Cloud Build manifest does the job.

      # Deploy container image to Cloud Run
      - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
        id: 'Deploy container to Cloud Run'
        entrypoint: gcloud
        args: ['run', 'deploy', 'generated-web', '--image', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web', '--region', 'us-west1', '--allow-unauthenticated']
    

    After saving this update to Cloud Build and running it again, I saw all the steps complete successfully.

    Most importantly, I had an active service in Cloud Run that served up a default record from the API endpoint.

    I went ahead and ran a Cloud Build pipeline for the “Firestore” version of the web app, and then the background job that deploys to Cloud Run Jobs. I ended up with two Cloud Run services (web apps), and one Cloud Run Job.

    I executed the job, and saw two new Firestore records in the collection!

    To prove that, I called the Firestore version of the web app again. Sure enough, the response included the two new records.

    Wrap up

    What we saw here was a fairly straightforward way to generate complete applications from nothing more than a series of prompts fed to the Gemini model. Nothing prevents you from using a different LLM, or using other source control, continuous integration, and hosting services. Just do some find-and-replace!

    Again, I would NOT use this for “real” workloads, but this sort of pattern could be a powerful way to quickly create supporting apps and components for testing or learning purposes.

    You can find the whole project here on GitHub.

    What do you think? Completely terrible idea? Possibly useful?

  • Here’s what I’d use to build a generative AI application in 2024

    Here’s what I’d use to build a generative AI application in 2024

    What exactly is a “generative AI app”? Do you think of chatbots, image creation tools, or music makers? What about document analysis services, text summarization capabilities, or widgets that “fix” your writing? These all seem to apply in one way or another! I see a lot written about tools and techniques for training, fine-tuning, and serving models, but what about us app builders? How do we actually build generative AI apps without obsessing over the models? Here’s what I’d consider using in 2024. And note that there’s much more to cover besides just building—think designing, testing, deploying, operating—but I’m just focusing on the builder tools today.

    Find a sandbox for experimenting with prompts

    A successful generative AI app depends on a useful model, good data, and quality prompts. Before going too deep on the app itself, it's good to have a sandbox to play in.

    You can definitely start with chat tools like Gemini and ChatGPT. That’s not a bad way to get your hands dirty. There’s also a set of developer-centric surfaces such as Google Colab or Google AI Studio. Once you sign in with a Google ID, you get free access to environments to experiment.

    Let’s look at Google AI Studio. Once you’re in this UI, you have the ability to simulate a back-and-forth chat, create freeform prompts that include uploaded media, or even structured prompts for more complex interactions.

    If you find yourself staring at an empty console wondering what to try, check out this prompt gallery that shows off a lot of unique scenarios.

    Once you’re doing more “serious” work, you might upgrade to a proper cloud service that offers a sandbox along with SLAs and prompt lifecycle capabilities. Google Cloud Vertex AI is one example. Here, I created a named prompt.

    With my language prompts, I can also jump into a nice “compare” experience where I can try out variations of my prompt and see if the results are graded as better or worse. I can even set one as “ground truth” used as a baseline for all comparisons.

    Whatever sandbox tools you use, make sure they help you iterate quickly, while also matching the enterprise-y needs of the use case or company you work for.

    Consume native APIs when working with specific models or platforms

    At this point, you might be ready to start building your generative AI app. There seems to be a new, interesting foundation model up on Hugging Face every couple of days. You might have a lot of affection for a specific model family, or not. If you care about the model, you might choose the APIs for that specific model or provider.

    For example, let’s say you were making good choices and anchored your app to the Gemini model. I’d go straight to the Vertex AI SDK for Python, Node, Java, or Go. I might even jump to the raw REST API and build my app with that.

    If I were baking a chat-like API call into my Node.js app, the quickest way to get the code I need is to go into Vertex AI, create a sample prompt, and click the “get code” button.

    I took that code, ran it in a Cloud Shell instance, and it worked perfectly. I could easily tweak it for my specific needs from here. Drop this code into a serverless function, Kubernetes pod, or VM and you’ve got a working generative AI app.
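
    The code from that button boils down to POSTing a JSON body to the model's generateContent endpoint. As a sketch, here's just the request-building portion (field names follow the Vertex AI REST API; project, region, and auth details are omitted):

    ```javascript
    // Build the JSON body for a Gemini generateContent REST call.
    // The endpoint URL, project, and credentials are left out on purpose;
    // this only shows the request shape.
    function buildGenerateContentRequest(promptText) {
      return {
        contents: [
          { role: 'user', parts: [{ text: promptText }] }
        ],
        generationConfig: {
          temperature: 0.5,
          maxOutputTokens: 2048
        }
      };
    }
    ```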

    You could follow this same direct API approach when building out more sophisticated retrieval augmented generation (RAG) apps. In a Google Cloud world, you might use the Vertex AI APIs to get text embeddings. Or you could choose something more general purpose and interact with a PostgreSQL database to generate, store, and query embeddings. This is an excellent example of this approach.
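
    Whichever store holds the embeddings, the retrieval step is nearest-neighbor search over vectors, typically scored with cosine similarity. A dependency-free sketch of that comparison (in practice, pgvector or Vertex AI computes this server-side):

    ```javascript
    // Cosine similarity between two embedding vectors: the core of the
    // "find the most relevant chunks" step in a RAG pipeline.
    function cosineSimilarity(a, b) {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank stored chunks by similarity to the query embedding.
    function topMatches(queryEmbedding, chunks, k = 3) {
      return chunks
        .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k);
    }
    ```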

    If you have a specific model preference, you might choose to use the API for Gemini, Llama, Mistral, or whatever. And you might choose to directly interact with database or function APIs to augment the input to those models. That’s cool, and is the right choice for many scenarios.

    Use meta-frameworks for consistent experiences across models and providers

    As expected, the AI builder space is now full of higher-order frameworks that help developers incorporate generative AI into their apps. These frameworks help you call LLMs, work with embeddings and vector databases, and even support actions like function calling.

    LangChain is a big one. You don't need to be bothered with many model details, and you can chain together tasks to get results. It's a Python framework, so your choice is either to use Python or to embrace one of the many offshoots. There's LangChain4J for Java devs, LangChain Go for Go devs, and LangChain.js for JavaScript devs.

    You have other choices if LangChain-style frameworks aren’t your jam. There’s Spring AI, which has a fairly straightforward set of objects and methods for interacting with models. I tried it out for interacting with the Gemini model, and almost found it easier to use than our native API! It takes one update to my POM file:

    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-vertex-ai-gemini-spring-boot-starter</artifactId>
    </dependency>
    

    One set of application properties:

    spring.application.name=demo
    spring.ai.vertex.ai.gemini.projectId=seroter-dev
    spring.ai.vertex.ai.gemini.location=us-central1
    spring.ai.vertex.ai.gemini.chat.options.model=gemini-pro-vision
    

    And then an autowired chat object that I call from anywhere, like in this REST endpoint.

    @RestController
    @SpringBootApplication
    public class DemoApplication {
    
    	public static void main(String[] args) {
    		SpringApplication.run(DemoApplication.class, args);
    	}
    
    	private final VertexAiGeminiChatClient chatClient;
    
    	@Autowired
        public DemoApplication(VertexAiGeminiChatClient chatClient) {
            this.chatClient = chatClient;
        }
    
    	@GetMapping("/")
    	public String getGeneratedText() {
    		String generatedResponse = chatClient.call("Tell me a joke");
    		return generatedResponse;
    	}
    }
    

    Super easy. There are other frameworks too. Use something like AI.JSX for building JavaScript apps and components. BotSharp is a framework for .NET devs building conversational apps with LLMs. Hugging Face has frameworks that help you abstract the LLM, including Transformers.js and agents.js.

    There’s no shortage of these types of frameworks. If you’re iterating through LLMs and want consistent code regardless of which model you use, these are good choices.

    Create with low-code tools when available

    If I had an idea for a generative AI app, I’d want to figure out how much I actually had to build myself. There are a LOT of tools for building entire apps, components, or widgets, and many require very little coding.

    Everyone’s in this game. Zapier has some cool integration flows. Gradio lets you expose models and APIs as web pages. Langflow got snapped up by DataStax, but still offers a way to create AI apps without much required coding. Flowise offers some nice tooling for orchestration or AI agents. Microsoft’s Power Platform is useful for low-code AI app builders. AWS is in the game now with Amazon Bedrock Agents. ServiceNow is baking generative AI into their builder tools, Salesforce is doing their thing, and basically every traditional low-code app vendor is playing along. See OutSystems, Mendix, and everyone else.

    As you would imagine, Google does a fair bit here as well. The Vertex AI Agent Builder offers four different app types that you basically build through point-and-click: personalized search engines, chat apps, recommendation engines, and connected agents.

    Search apps can tap into a variety of data sources including crawled websites, data warehouses, relational databases, and more.

    What’s fairly new is the “agent app” so let’s try building one of those. Specifically, let’s say I run a baseball clinic (sigh, someday) and help people tune their swing in our batting cages. I might want a chat experience for those looking for help with swing mechanics, and then also offer the ability to book time in the batting cage. I need data, but also interactivity.

    Before building the AI app, I need a Cloud Function that returns available times for the batting cage.

    This Node.js function returns an array of book-able timeslots. I’ve hard-coded the data, but you get the idea.
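
    The function itself is as simple as it sounds. Here's a sketch of roughly what mine did (the timeslot values are invented for illustration; the response shape matches the CageTimes schema in the OpenAPI spec further down):

    ```javascript
    // Sketch of the hard-coded Cloud Function. The Functions Framework hands
    // the handler Express-style (req, res) objects, so res.json is available.
    // Sample timeslot values are made up for illustration.
    const cageTimes = [
      { cageNumber: 1, openSlot: '10:00 AM', cageType: 'fastball' },
      { cageNumber: 2, openSlot: '11:30 AM', cageType: 'curveball' }
    ];

    exports.getCageTimes = (req, res) => {
      res.json(cageTimes);
    };
    ```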

    I also jumped into the Google Cloud IAM interface to ensure that the Dialogflow service account (which the AI agent operates as) has permission to invoke the serverless function.

    Let’s build the agent. Back in the Vertex AI Agent Builder interface, I choose “new app” and pick “agent.”

    Now I’m dropped into the agent builder interface. On the left, I have navigation for agents, tools, test cases, and more. In the next column, I set the goal of the agent, the instructions, and any tools I want to use with the agent. On the right, I preview my agent.

    I set a goal of “Answer questions about baseball and let people book time in the batting cage” and then get to the instructions. There’s a “sample” set of instructions that are useful for getting started. I used those, but removed references to other agents or tools, as we don’t have those yet.

    But now I want to add a tool, as I need a way to show available booking times if the user asks. I have a choice of adding a data store—this is useful if you want to source Q&A from a BigQuery table, crawl a website, or get data from an API. I clicked the “manage all tools” button and chose to add a new tool. Here I give the tool a name, and very importantly, a description. This description is used by the AI agent to figure out when to invoke it.

    Because I chose OpenAPI as the tool type, I need to provide an OpenAPI spec for my Cloud Function. There’s a sample provided, and I used that to put together my spec. Note that the URL is the function’s base URL, and the path contains the specific function name.

    {
        "openapi": "3.0.0",
        "info": {
            "title": "Cage API",
            "version": "1.0.0"
        },
        "servers": [
            {
                "url": "https://us-central1-seroter-anthos.cloudfunctions.net"
            }
        ],
        "paths": {
            "/function-get-cage-times": {
                "get": {
                    "summary": "List all open cage times",
                    "operationId": "getCageTimes",
                    "responses": {
                        "200": {
                            "description": "An array of cage times",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "array",
                                        "items": {
                                            "$ref": "#/components/schemas/CageTimes"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "components": {
            "schemas": {
                "CageTimes": {
                    "type": "object",
                    "required": [
                        "cageNumber",
                        "openSlot",
                        "cageType"
                    ],
                    "properties": {
                        "cageNumber": {
                            "type": "integer",
                            "format": "int64"
                        },
                        "openSlot": {
                            "type": "string"
                        },
                        "cageType": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
    

    Finally, in this “tool setup” I define the authentication to that API. I chose “service agent token” and because I’m calling a specific instance of a service (versus the platform APIs), I picked “ID token.”

    After saving the tool, I went back to the agent definition to update the instructions to invoke the tool. I used the syntax, and appreciated the auto-completion help.

    Let’s see if it works. I went to the right-hand preview pane and asked it a generic baseball question. Good. Then I asked it for open times in the batting cage. Look at that! It didn’t just return a blob of JSON; it parsed the result and worded it well.

    Very cool. There are some quirks with this tool, but it’s early, and I like where it’s going. This was MUCH simpler than me building a RAG-style or function-calling solution by hand.

    Summary

    The AI assistance and model building products get a lot of attention, but some of the most interesting work is happening in the tools for AI app builders. Whether you’re experimenting with prompts, coding up a solution, or assembling an app out of pre-built components, it’s a fun time to be a developer. What products, tools, or frameworks did I miss from my assessment?

  • How I’d use generative AI to modernize an app

    How I’d use generative AI to modernize an app

    I’m skeptical of anything that claims to make difficult things “easy.” Easy is relative. What’s simple for you might draw blood from me. And in my experience, when a product claims to make something “easy”, it’s talking about simplifying a subset of the broader, more complicated job-to-be-done.

    So I won’t sit here and tell you that generative AI makes app modernization easy. Nothing does. It’s hard work and is as much about technology as it is psychology and archeology. But AI can make it easier. We’ll take any help we can get, right? I count at least five ways I’d use generative AI to make smarter progress on my modernization journey.

    #1 Understand the codebase

    Have you been handed a pile of code and scripts before? Told to make sense of it and introduce some sort of feature enhancement? You might spend hours, days, or weeks figuring out the relationships between components and side effects of any changes.

    Generative AI is fairly helpful here, especially now that models like Gemini 1.5 (with its 1-million-token input window) exist.

    I might use something like Gemini (or ChatGPT, or whatever) to ask questions about the codebase and get ideas for how something might be used. This is where the “generative” part is handy. When I use the Duet AI assistance in BigQuery to explain SQL, I get back a creative answer about possible uses for the resulting data.

    In your IDE, you might use Duet AI (or Copilot, Replit, Tabnine) to give detailed explanations of individual code files, shell scripts, YAML, or Dockerfiles. Even if you don’t decide to use any generative AI tools to write code, consider using them to explain it.
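
    Before you can even ask good questions, it helps to know what you're dealing with. Here's a rough sketch of a small Go program that inventories a codebase by line count, a cheap way to decide which files are worth feeding to a large-context model first. The extension list and overall approach are my own assumptions, not any product's feature.

    ```go
    // inventory: walk a repo and print a line-count inventory of
    // source-like files, so you can prioritize what to hand to a model.
    package main

    import (
    	"fmt"
    	"io/fs"
    	"os"
    	"path/filepath"
    	"strings"
    )

    // countSourceLines totals the lines in source-like files under root,
    // printing a per-file count as it goes. The extension list is a guess;
    // adjust it for your stack.
    func countSourceLines(root string) (int, error) {
    	total := 0
    	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
    		if err != nil || d.IsDir() {
    			return err
    		}
    		switch strings.ToLower(filepath.Ext(path)) {
    		case ".go", ".java", ".js", ".sql", ".yaml", ".sh":
    			data, readErr := os.ReadFile(path)
    			if readErr != nil {
    				return readErr
    			}
    			lines := strings.Count(string(data), "\n")
    			fmt.Printf("%6d  %s\n", lines, path)
    			total += lines
    		}
    		return nil
    	})
    	return total, err
    }

    func main() {
    	root := "."
    	if len(os.Args) > 1 {
    		root = os.Args[1]
    	}
    	total, err := countSourceLines(root)
    	if err != nil {
    		fmt.Fprintln(os.Stderr, err)
    		os.Exit(1)
    	}
    	fmt.Printf("total: %d lines\n", total)
    }
    ```

    Run it against a repo root and you get a quick sense of where the bulk of the code lives before you start pasting files into a chat session.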

    #2 Incorporate new language/framework features

    Languages themselves modernize at a fairly rapid pace. Does your codebase rely on a pattern that was rad back in 2011? It happens. I’ve seen that generative AI is a handy way to modernize the code itself while teaching us how to apply the latest language features.

    For instance, Go generics are fairly new. If your Go app is more than two years old, it likely isn’t using them. I could go into my Go app and ask my generative AI chat tool for advice on how to introduce generics to my existing code.
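
    The typical rewrite you'd get back collapses near-duplicate helpers (think sumInts and sumFloats) into one type-parameterized function. A minimal sketch of my own, not actual tool output:

    ```go
    // Pre-generics Go code often carried one helper per numeric type.
    // A single type-parameterized Sum replaces all of them.
    package main

    import "fmt"

    // Number is a type-set constraint covering the numeric types the
    // old duplicated helpers handled.
    type Number interface {
    	~int | ~int64 | ~float64
    }

    // Sum works for any slice whose element type satisfies Number.
    func Sum[T Number](xs []T) T {
    	var total T
    	for _, x := range xs {
    		total += x
    	}
    	return total
    }

    func main() {
    	fmt.Println(Sum([]int{1, 2, 3}))       // 6
    	fmt.Println(Sum([]float64{1.5, 2.25})) // 3.75
    }
    ```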

    Usefully, the Duet AI tooling also explains what it did, and why it matters.

    I might use the same types of tools to convert an old ASP.NET MVC app to the newer Minimal APIs structure. Or replace deprecated features from Spring Boot 3.0 with more modern alternatives. Look at generative AI tools as a way to bring your codebase into the current era of language features.

    #3 Improve code quality

    Part of modernizing an app may involve adding real test coverage. You’ll never continuously deploy an app if you can’t get reliable builds. And you won’t get reliable builds without good tests and a CI system.

    AI-assisted developer tools make it easier to add integration tests to your code. I can go into my Spring Boot app and get testing scaffolding for my existing functions.

    Consider using generative AI tools to help with broader tasks like defining an app-wide test suite. You can use these AI interfaces to brainstorm ideas, get testing templates, or even generate test data.
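
    The scaffolding these tools emit usually follows the host language's testing idiom. The article's example above is Spring Boot; sketched in Go instead, you'd expect a table-driven structure like this. The Discount function and its business rule are invented here purely to have something to test.

    ```go
    // A sketch of table-driven test scaffolding. Discount stands in for
    // an existing function you'd ask an AI tool to generate coverage for.
    package main

    import (
    	"fmt"
    	"os"
    )

    // Discount applies a (made-up) loyalty discount to an order total.
    func Discount(total float64, loyaltyYears int) float64 {
    	switch {
    	case loyaltyYears >= 5:
    		return total * 0.90
    	case loyaltyYears >= 1:
    		return total * 0.95
    	default:
    		return total
    	}
    }

    func main() {
    	// The same table drops straight into a testing.T-based test file;
    	// it's shown in main here so the sketch runs standalone.
    	cases := []struct {
    		name  string
    		total float64
    		years int
    		want  float64
    	}{
    		{"new customer", 100, 0, 100},
    		{"one year", 100, 1, 95},
    		{"loyal", 100, 5, 90},
    	}
    	for _, c := range cases {
    		if got := Discount(c.total, c.years); got != c.want {
    			fmt.Printf("FAIL %s: got %v, want %v\n", c.name, got, c.want)
    			os.Exit(1)
    		}
    		fmt.Printf("ok   %s\n", c.name)
    	}
    }
    ```

    The value of the generated table isn't the boilerplate itself; it's that the tool proposes edge cases (the zero-year customer, the boundary at five years) you might not have listed yourself.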

    In addition to test-related activities, you can use generative AI to check for security issues. These tools don’t care about your feelings; here, the tool is calling out my terrible practices.

    Fortunately, I can also ask the tool to “fix” the code. You might find a few ways to use generative AI to help you refactor and improve the resilience and quality of the codebase.

    #4 Swap out old or unsupported components

    A big part of modernization is ensuring that a system is running fully supported components. Maybe that database, plugin, library, or entire framework is now retired, or people don’t want to work with it. AI tools can help with this conversion.

    For instance, maybe it’s time to swap out JavaScript frameworks. That app you built in 2014 with Backbone.js or jQuery is feeling creaky. You want to bring in React or Angular instead. I’ve had some luck coaxing generative AI tools into giving me working versions of just that. Even if you use AI chat tools to walk you through the steps (versus converting all the code), it’s a time-saver.

    The same may apply to upgrades from Java 8 to Java 21, or going from classic .NET Framework to modern .NET. Heck, you can even have some luck switching from COBOL to Go. I wouldn’t blindly trust these tools to convert code; audit aggressively and ensure you understand the new codebase. But these tools may jump-start your work and cut out some of the toil.

    #5 Upgrade the architecture

    Sometimes an app modernization requires some open-heart surgery. It’s not about light refactoring or swapping a frontend framework. No, there are times where you’re yanking out major pieces or making material changes.

    I’ve had some positive experiences asking generative AI tools to help me upgrade a SOAP service to REST. Or REST to gRPC. You might use these tools to switch from a stored procedure-heavy system to one that puts the logic into code components instead. Speaking of databases, you could change from MySQL to Cloud Spanner, or even change a non-relational database dependency back to a relational one. Will generative AI do all the work? Probably not, but much of what it produces is pretty good.
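
    To make the SOAP-to-REST case concrete, here's a sketch in Go of what the converted surface might look like: an old GetCustomer SOAP operation re-exposed as GET /customers/{id} returning JSON. The Customer fields and the in-memory lookup are invented for illustration, not taken from any real service.

    ```go
    // A sketch of the REST side of a SOAP-to-REST conversion.
    package main

    import (
    	"encoding/json"
    	"fmt"
    	"net/http"
    	"net/http/httptest"
    	"strings"
    )

    // Customer mirrors the payload the hypothetical SOAP GetCustomer
    // operation returned.
    type Customer struct {
    	ID   string `json:"id"`
    	Name string `json:"name"`
    }

    // lookup stands in for whatever backend the SOAP service wrapped.
    var lookup = map[string]Customer{
    	"42": {ID: "42", Name: "Ada"},
    }

    // customerHandler serves GET /customers/{id} as JSON.
    func customerHandler(w http.ResponseWriter, r *http.Request) {
    	id := strings.TrimPrefix(r.URL.Path, "/customers/")
    	c, ok := lookup[id]
    	if !ok {
    		http.NotFound(w, r)
    		return
    	}
    	w.Header().Set("Content-Type", "application/json")
    	json.NewEncoder(w).Encode(c)
    }

    func main() {
    	// Exercise the handler in-process; in production you'd register it
    	// with http.HandleFunc("/customers/", customerHandler) and listen.
    	req := httptest.NewRequest("GET", "/customers/42", nil)
    	rec := httptest.NewRecorder()
    	customerHandler(rec, req)
    	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
    }
    ```

    Even when a tool generates a skeleton like this for you, the hard part remains mapping every SOAP fault, header, and WS-* behavior onto HTTP semantics; treat the generated handler as a starting point.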

    This might be a time to make bigger changes like swapping from one cloud to another, or adding a major layer of infrastructure-as-code templates to your system. I’ve seen good results from generative AI tools here too. In some cases, a modernization project is your chance to introduce real, lasting changes to an architecture. Don’t waste the opportunity!

    Wrap Up

    Generative AI won’t eliminate the work of modernizing an app. There’s lots of work to do to understand, transform, document, and roll out code. AI tools can make a big difference, though, and you’re tying a hand behind your back if you ignore them! What other uses for app modernization come to mind?