Category: Cloud

  • Stop following tutorials and learn by building (with Antigravity) instead

    Don’t get me wrong, I like a good tutorial. It might be in a blog, book, video, or training platform. I’ve probably created a hundred (including in dozens of Pluralsight courses) and consumed a thousand. But lately? I don’t like being constrained by the author’s use case, and I wonder if all I’ve learned how to do is follow someone else’s specific instructions.

    This popped up for me twice in the past few days as I finally took some “should try” technologies off my backlog. Instead of hunting for a hello-world tutorial to show me a few attributes of Angular Signals, I simply built a demo app using Google Antigravity. No local infrastructure setup, no wrangling with libraries, no figuring out what machinery I needed to actually see the technology in action.

    I did it again a couple of days later! The Go version of the Agent Development Kit came out a bit ago. I’ve been meaning to try it. The walkthrough tutorials are fine, but I wanted something more. So, I just built a working solution right away.

    I still enjoy reading content about how something works. That doesn’t go away. And truly deep learning still requires more than vibe coding an app. But I’m not defaulting to tutorials any more. Instead, I can just feed them into the LLM and build something personalized for me. Here’s an example.

    Take the cases above. I jumped into Google AI Studio to get inspiration on interesting async agent use cases. I liked this one: create a feed where an agent picks up a news headline and then does some research into related stories before offering some analysis. It’ll read from a queue, and then drop its analysis into a Cloud Storage bucket.

    Prompting for agent ideas in Google AI Studio

    With my use case in hand, I jumped into Antigravity to sketch out a design. Notice that I just fed the tutorial link into Antigravity to ensure it’d get seeded with up-to-date info for this new library.

    Sparking an application build in Google Antigravity

    Antigravity started whirring away on creating implementation plans and a task list. Because I can comment on its plans and iterate on the ideas before building begins, I’m not stressed about making the first prompt perfect. Notice here that it flags a big assumption, so I provided a comment confirming that I want a JSON payload for this background worker.

    Google Antigravity provides a way to offer feedback on implementation plans

    After Antigravity started building, I noticed the generated code used a package the IDE flagged as deprecated. I popped into the chat (or I could have commented in the task list) and directed the AI tool to use the latest version and ensure the code still built successfully.

    Steering Google Antigravity to pick a newer version of the library it selected

    Throughout, I’m focused on the outcomes I’m after, not the syntax of agent building. It’s refreshing. When reviewing the code, I realized I wanted more data in the incoming payload. A directive later, and my code reflected it.

    Iterating on software with Antigravity

    This started with me wanting to learn ADK for Go. It was easy to review the generated agent code, ask Antigravity questions about it, and see “how” to do it all without typing it all out myself. Will it stick in my brain as much as if I wrote it myself? No. But that wasn’t my goal. I wanted to fit ADK for Go into a real use case.

    Code that sets up an agent in ADK for Go

    This solution should feel “real”, and not just be a vibe-and-go. How about using CI/CD? I never remember the syntax for Google Cloud Build, and getting my pipeline right can swallow up half my dev time. No problem.

    I express my intent for a Cloud Build pipeline, and moments later I have a fully valid YAML definition, along with a generated Dockerfile.

    Antigravity created a Google Cloud Build pipeline for me
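
    For reference, here’s the shape of the pipeline it produced. This is a simplified sketch rather than my exact output, and the Artifact Registry repository and image names are placeholders:

    steps:
      # Build the container image for the Go agent
      - name: 'gcr.io/cloud-builders/docker'
        args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/agents/news-agent:$SHORT_SHA', '.']
      # Push the image to Artifact Registry
      - name: 'gcr.io/cloud-builders/docker'
        args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/agents/news-agent:$SHORT_SHA']
    images:
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/agents/news-agent:$SHORT_SHA'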

    Next I asked Antigravity to add a deployment step so that the container image is pushed to a Cloud Run worker pool after a successful build. I needed to point Antigravity to a tutorial for worker pools for it to know about this new feature.

    I’m using an API key in this solution, and didn’t want that stored as a regular environment variable or visible during deployment. Vibe coding doesn’t have to be insecure. I asked Antigravity to come up with a better way. It chose Google Cloud Secret Manager, gave me the commands to issue, and showed me what the Cloud Run deployment command would now look like.

    Getting a proper, security-conscious deployment command for Cloud Run worker pools
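
    The commands Antigravity handed me boiled down to standard Secret Manager setup. Here’s a sketch with placeholder secret and service account names:

    # Store the API key as a secret (value is read from stdin)
    echo -n "$GEMINI_API_KEY" | gcloud secrets create gemini-api-key --data-file=-

    # Let the agent's runtime service account read the secret
    gcloud secrets add-iam-policy-binding gemini-api-key \
      --member="serviceAccount:agent-sa@my-project.iam.gserviceaccount.com" \
      --role="roles/secretmanager.secretAccessor"

    The deployment then references the secret instead of a plaintext value, via a --set-secrets style flag (e.g. GEMINI_API_KEY=gemini-api-key:latest); check the current worker pools docs for the exact syntax.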

    I then told Antigravity to introduce this updated Cloud Run command to complete the build + deploy pipeline.

    A complete, AI-generated CI/CD pipeline for my agentic app

    Amazing! I wanted to test this out before putting an Angular frontend into the solution. Antigravity reminded me of the right way to format a Cloud Build command given the substitution variables, and I was off.

    Building and deploying this agent with Cloud Build
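
    If you’re curious, the trigger is just the standard gcloud builds submit with substitutions passed inline. A sketch, with hypothetical substitution variable names:

    gcloud builds submit . \
      --config=cloudbuild.yaml \
      --substitutions=_REGION=us-central1,_SERVICE_NAME=news-agent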

    Within a few minutes, I had a container image in Artifact Registry, and a Cloud Run worker pool listening for work.

    My running workload in Cloud Run worker pools

    To test it out, I needed to publish a message to Google Cloud Pub/Sub. Antigravity gave me a sample JSON message structure that the agent expected to receive. I went to Techmeme.com to grab a recent news headline as my source. Pub/Sub has a UI for manually sending a message into a topic, so I used that.

    Publishing a message to Pub/Sub to trigger my agent
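
    The same thing works from the command line if you prefer. A sketch, with a hypothetical topic name and a JSON shape standing in for whatever your agent expects:

    gcloud pubsub topics publish news-headlines \
      --message='{"headline": "Example tech headline from Techmeme", "url": "https://example.com/story"}'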

    After a moment, I saw a new JSON doc in my Cloud Storage bucket. Opening it up revealed a set of related news, and some interesting insights.

    News analysis performed by my ADK agent and dropped into a Cloud Storage bucket
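
    You can also peek at the output from the terminal. Bucket and object names here are placeholders:

    # List the analysis files and print one of them
    gcloud storage ls gs://my-news-analysis-bucket/
    gcloud storage cat gs://my-news-analysis-bucket/analysis-latest.json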

    I also wanted to see more of Angular Signals in action, so I started a new project and prompted Antigravity to build out a site where I could submit news stories to my Pub/Sub topic. Once again, I passed a reference guide into my prompt as context.

    Prompting Antigravity to create a frontend app using Angular Signals

    I asked Antigravity to show me how Angular Signals were used, and even asked it to sketch a diagram of the interaction. This is a much better way to learn a feature than hoping a static tutorial covers everything!

    The first build turned out ok, but I wanted better handling of the calls to Google Cloud Pub/Sub. Specifically, I wanted this executed server side. After I added a comment to the implementation plan, Antigravity came up with a backend-for-frontend pattern.

    Directing Antigravity to support a backend for calls to Google Cloud

    After a couple of iterations on look-and-feel, and one debugging session which revealed I was using the wrong Pub/Sub topic name, I had a fully working app.

    After starting the server side component and the frontend component, I viewed my app interface.

    The interface Antigravity built

    Grabbing another headline from Techmeme gave me a chance to try this out. The Angular Signals experience seems super smooth.

    Adding a headline and seeing a dynamic frontend in action

    Once again, my Cloud Storage bucket had some related links and analysis generated by my ADK agent sitting in a Cloud Run worker pool.

    News analysis initiated from my Angular app

    It took me longer to write this post than it did to build a fully working solution. How great is that?

    For me, tutorials are now LLM input only. They’re useful context for LLMs that teach me things or build apps under my direction. How about you?

  • Go from prompt to production using a set of AI tools, or just one (Google Antigravity)

    We’ve passed the first phase of AI dev tooling. When I first saw AI-assisted code completion and generation, I was wow-ed. Still am. Amazing stuff. Then agentic coding environments went a step further. We could generate entire apps with products like Replit or v0! Following that, we all got new types of agentic IDEs, CLIs, background coding agents, and more. With all these options, there isn’t just one way to work with AI in software engineering.

    I’m noticing that I’m using AI tools to command (perform actions on my environment or codebase), to code (write or review code), and to conduct (coordinate agents who work on my behalf). Whether these are done via separate tools or the same one, this seems like a paradigm that will persist for a while.

    Let’s see this in action. I’ll first do this with a set of popular tools—Google AI Studio, Gemini CLI, Gemini Code Assist, and Jules—and then do the same exercise with the new Google Antigravity agent-first development platform.

    Architecture diagram generated with nano-banana

    I’ve accepted that I’ll never be a professional baseball player. It’s just not in the cards. But can I use AI to help me pretend that I played? Let’s build an application that uses AI to take an uploaded picture and generate images of that person in various real-life baseball situations.

    Build with a set of AI tools

    Gemini 3 Pro is excellent at frontend code and Google AI Studio is a fantastic way to get started building my app. I went to the “Build” section where I could provide a natural language prompt to start vibe-coding my baseball app. Here’s an example of “commanding” with AI tools.

    Google AI Studio

    After a few seconds of thinking, I saw a stash of files created for my application. Then a preview popped up that I could actually interact with.

    Vibe coded app in Google AI Studio

    Jeez, only one prompt and I have an awesome AI app. How cool is that? The Nano Banana model is just remarkable.

    Now I wanted to do more with this app and bring it into my IDE to make some updates before deploying it. In the top right of the screen, there’s a GitHub icon. After I clicked that, I was asked to authenticate with my GitHub account. Next, I had to provide details about which repo to create for this new codebase.

    Create GitHub repo from Google AI Studio

    Then Google AI Studio showed me all the changes it made in the local repo. I get one last chance to review things before staging and committing the changes.

    Push changes to GitHub

    A moment later, I had a fully populated GitHub repo. This gave me the intermediate storage I needed to pick up and continue with my IDE and agentic CLI.

    Vibe coded app code in my GitHub repo

    I jumped into Visual Studio Code with the installed Gemini Code Assist plugin. I’ve also got the Gemini CLI integration set up, so everything is all in one place.

    Visual Studio Code with Gemini Code Assist and the Gemini CLI

    Here, I can command and code my way to a finished app. I could ask (command) for a summary of the application itself and how it’s put together. Even more useful, I issued a command asking how this app authenticates with the Gemini API.

    Gemini Code Assist helping me understand the codebase

    Very helpful! Notice that it found a config file that shows a mapping from GEMINI_API_KEY (which is the environment variable I need to set) to the API_KEY referred to in code. Good to know.
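
    So the only local setup I needed before running the app was something like this, where the key value is obviously a placeholder for your own AI Studio key:

    # Mapped to API_KEY in the generated code via the project's config file
    export GEMINI_API_KEY="your-ai-studio-api-key"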

    Here’s where I could continue to code my way through the app with AI assistance if there were specific changes I felt like making ahead of deploying it. I wrote a mix of code (and used the Gemini CLI) to add a Node server that serves this static content and reads the environment variable at runtime.

    Let’s do some conducting. I didn’t feel like writing up a whole README and wanted some help from AI. Here’s where Jules comes in, and its extension for the Gemini CLI. Notice that I have Gemini CLI extensions for Jules and Cloud Run already installed.

    Two MCP servers added to the Gemini CLI

    I can go ahead and ask Jules to create a better README, and then continue on my work. Agents working on my behalf!

    Using the Gemini CLI to trigger a background task in Jules

    After doing some other work, I came back and checked the status of the Jules job (/jules status) and saw that the task was done. The Jules extension asked me if I wanted a new branch, or to apply the changes locally. I chose the former option and reviewed the PR before merging.

    Reviewing a branch with a README updated by Jules

    Finally, I was ready to deploy this to Google Cloud Run. Here, I also used a command approach and instructed the Gemini CLI to deploy this app with the help of the extension for Cloud Run.

    Using a natural language request from me, the Gemini CLI crafted the correct gcloud CLI command to deploy my app.

    Doing a deployment to Cloud Run from the Gemini CLI
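
    The generated command was along these lines. Treat it as a sketch with a placeholder service name, since the exact flags depend on your app:

    gcloud run deploy baseball-imagination \
      --source=. \
      --region=us-central1 \
      --allow-unauthenticated \
      --set-env-vars=GEMINI_API_KEY="your-ai-studio-api-key"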

    That finished in a few seconds, and I had my vibe-coded app, with some additional changes, deployed and running in Google Cloud.

    App running on Google Cloud

    So we commanded Google AI Studio to build the fundamentals of the app, used Gemini Code Assist and the Gemini CLI to code and command towards deployment, and Jules to conduct background agents on our behalf. Not particularly difficult, and the handoffs via a Git repo worked well.

    This process works great if you have distinct roles with handoffs (designer -> developer -> deployment team) or want to use distinct products at each stage.

    Build with Google Antigravity

    Google Antigravity isn’t a code editor. It’s not an IDE. It’s something more. Yes, you can edit code and do classic IDE things. What’s different is that it’s agent-first, and supports a rich set of surfaces in a single experience. I can kick off a series of agents to do work, trigger Computer Use in a dedicated browser, and extend behavior through MCP servers. Basically, I can do everything I did above, but within a single experience.

    Starting point with Google Antigravity

    I fed it the same prompt I gave to Google AI Studio. Immediately, Google Antigravity got to work building an implementation plan.

    Giving a prompt to Antigravity to build out an application

    I love that I can review this implementation plan, and add comments to sections I want to update. This feels like a very natural way to iterate on this specification. Right away, I asked for a Node server to host this app, so it’s built that way from the start.

    Implementation Plan, with comments

    The AI agent recognizes my comments and refreshes its plans.

    Antigravity using the Implementation Plan to begin its work

    At this point, the agent is rolling. It built out the entire project structure, created all the code files, and plowed through its task list. Yes, it creates and maintains a task list so we can track what’s going on.

    Task List maintained by Antigravity

    The “Agent Manager” interface is wild. From here I can see my inbox of agent tasks, and monitor what my agents are currently doing. This one is running shell commands.

    Agent Manager view for triggering and managing agent work

    The little “drawer” at the bottom of the main chat window also keeps tabs on what’s going on across all the various agents. Here I could see what docs need my attention, which processes are running (e.g. web servers), and which artifacts are part of the current conversation.

    View of processes, documents, and conversation artifacts

    The whole app-building process finished in just a few minutes. It looked good! And because Google Antigravity has built-in support for Computer Use with a Chrome browser, it launched a browser instance and showed me how the app worked. I can also prompt Computer Use interactions any time via chat.

    Computer Use driving the finished application

    Antigravity saved the steps it followed into an artifact called a Walkthrough, including a screenshot!

    Generated walkthrough including screenshots

    How about fixing the README? In the previous example, I threw that to a background task in Jules. I could still do that here, but Antigravity is also adept at doing asynchronous work. I went into the Agent Manager and asked for a clean README with screenshots and diagrams. Then I closed Agent Manager and did some other things. Never breaking flow!

    Triggering a background agent to update the README

    Later, I noticed that the work was completed. The Agent Manager showed me what it did, and gave me a preview of the finished README. Nice job.

    Finished README with diagrams and screenshots

    I wanted to see the whole process through, so how about using Google Antigravity to deploy this final app to Google Cloud Run?

    This product also supports extension via MCP. During this product preview, it comes with a couple dozen MCP servers in the “MCP Store.” These include ones for Google products, Figma, GitHub, Stripe, Notion, Supabase, and more.

    MCP servers available out of the box

    We don’t yet include one for Cloud Run, but I can add that myself. The “manage MCP servers” screen is empty to start, but it shows you the format you need to add to the configuration file. I added the configuration for the local Cloud Run MCP server.

    Configuration for the Cloud Run MCP server
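
    My entry looked roughly like this. Treat it as a sketch: the mcpServers layout mirrors what many MCP clients use, and the npx invocation is a placeholder; use whatever launch command the Cloud Run MCP server’s own docs specify.

    {
      "mcpServers": {
        "cloud-run": {
          "command": "npx",
          "args": ["-y", "https://github.com/GoogleCloudPlatform/cloud-run-mcp"]
        }
      }
    }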

    After saving that configuration, I refreshed the “manage MCP servers” screen and saw all the tools at my disposal.

    Tools available from the Cloud Run MCP server

    Sweet! I went back to the chat window and asked Google Antigravity to deploy this app to Cloud Run.

    Antigravity deploying the app to Google Cloud Run

    The first time, the deployment failed, but Google Antigravity picked up the error and updated the app to start on the proper port and to tweak how it handled wildcard paths. It then redeployed, and everything worked.

    Chat transcript of attempt to deploy to Google Cloud Run

    Fantastic. Sure enough, browsing the URL showed my app running and working flawlessly. Without a doubt, this would have been hours or days of work for me. Especially on the frontend stuff since I’m terrible at it. Instead, the whole process took less than an hour.

    Finished application running in Google Cloud Run

    I’m very impressed! For at least the next few years, software engineering will likely include a mix of commanding, coding, and conducting. As I showed you here, you can do that with distinct tools that enable distinct stages and offer one or more of those paradigms. Products like Google Antigravity offer a fresh perspective, and make it possible to design, build, optimize, and deploy all from one product. And I can now seamlessly issue commands, write code, and conduct agents without ever breaking flow. Pretty awesome.

  • How to build and deploy a portable AI agent that uses a managed memory service

    I enjoy building with new frameworks and services. Do you? It’s fun to break new ground. That said, I’m often filled with regret as I navigate incomplete docs, non-existent search results, and a dearth of human experts to bother. Now add LLMs that try to help but accidentally set you back. Good times. But we persevere. My goal? Build an AI agent—it helps you plan a career change—that retains memory through long-running conversations, and is portable enough that it can run on most any host. Easy enough, yes?

    My weapons of choice were the Agent Development Kit (Python), the new fully-managed Vertex AI Memory Bank service, and runtime hosts including Google Cloud Run and Vertex AI Agent Engine. Most every sample I found for this tech combination was either PhD-level coding with excessive functionality, a hard-coded “hello world” that didn’t feel realistic, or a notebook-like flow that didn’t translate to an independent agent. I craved a simple, yet complete, example of what a real, hosted, and memory-infused agent looks like. I finally got it all working, it’s very cool, and I wanted to share the steps to reproduce it.

    Vertex AI Memory Bank showing memories from my AI agent

    Let’s go through this step by step, and I’ll explain the various gotchas and such that weren’t clear from the docs or existing samples. Note that I am NOT a Python developer, but I think I follow some decent practices here.

    First, I wanted a new Python virtual environment for the folder containing my app.

     python3 -m venv venv
    source venv/bin/activate
    

    I installed the latest version of the Google ADK.

    pip install google-adk
    

    My source code is here, so you can just download the requirements.txt file and install the local dependencies you need.

    pip install -r requirements.txt
    

    I’ve got an __init__.py file that simply contains:

    from . import agent
    

    Now the agent.py itself where all the logic lives. Let’s go step by step, but this all is from a single file.

    import os
    import sys
    from google.adk.agents import Agent
    from google.adk.tools import agent_tool
    from google.adk.tools import google_search
    
    from google import adk
    from google.adk.runners import Runner
    from google.adk.sessions import VertexAiSessionService
    from google.adk.memory import VertexAiMemoryBankService
    from google.api_core import exceptions
    

    Nothing earth-shattering here. But I use a mix of built-in tools including Google Search. And I’m using durable storage for sessions and memory (versus the default in-memory options) and importing those references.

    app_name = 'career_agent'
    
    # Retrieve the agent engine ID needed for the memory service
    agent_engine_id = os.environ.get("GOOGLE_CLOUD_AGENT_ENGINE_ID")
    

    Our agent app needs a name for the purpose of storing sessions and memory through ADK. And that agent_engine_id is important for environments where it’s not preloaded (e.g. outside of Vertex AI Agent Engine).

    # Create a durable session for our agent
    session_service = VertexAiSessionService()
    print("Vertex session service created")
    
    # Instantiate the long term memory service, needs agent_engine parameter from environment or doesn't work right
    memory_service = VertexAiMemoryBankService(
        agent_engine_id=agent_engine_id)
    print("Vertex memory service created")
    

    Here I create instances of the VertexAiSessionService and VertexAiMemoryBankService. These refer to fully managed, no ops needed, services that you can use standalone wherever your agent runs.

    # Use for callback to save the session info to memory
    async def auto_save_session_to_memory_callback(callback_context):
        try:
            await memory_service.add_session_to_memory(
                callback_context._invocation_context.session
            )
            print("\n****Triggered memory generation****\n")
        except exceptions.GoogleAPICallError as e:
            print(f"Error during memory generation: {e}")
    

    Now we’re getting somewhere. This function (thanks to my colleague Megan who I believe came up with it) will be invoked as a callback during session turns.

    # Agent that does Google search
    career_search_agent_memory = Agent(
        name="career_search_agent_memory",
        model="gemini-2.5-flash",
        description=(
            "Agent answers questions career options for a given city or country"
        ),
        instruction=(
            "You are an agent that helps people figure out what types of jobs they should consider based on where they want to live."
        ),
        tools=[google_search],
    )
    

    That’s agent number one. It’s a secondary agent that just does a real-time search to supplement the LLM’s knowledge with real data about a given job in a particular city.

    # Root agent that retrieves memories and saves them as part of career plan assistance
    root_agent = Agent(
        name="career_advisor_agent_memory",
        model="gemini-2.5-pro", # Using a more capable model for orchestration
        description=(
            "Agent to help someone come up with a career plan"
        ),
        instruction=(
            """
            **Persona:** You are a helpful and knowledgeable career advisor.
    
            **Goal:** Your primary goal is to provide personalized career recommendations to users based on their skills, interests, and desired geographical location.
    
            **Workflow:**
    
            1.  **Information Gathering:** Your first step is to interact with the user to gather essential information. You must ask about:
                *   Their skills and areas of expertise.
                *   Their interests and passions.
                *   The city or country where they want to work.
    
            2.  **Tool Utilization:** Once you have identified a potential career and a specific geographical location from the user, you **must** use the `career_search_agent_memory` tool to find up-to-date information about job prospects.
    
            3.  **Synthesize and Respond:** After obtaining the information from the `career_search_agent_memory` tool, you will combine that with the user's stated skills and interests to provide a comprehensive and helpful career plan.
    
            **Important:** Do not try to answer questions about career options in a specific city or country from your own knowledge. Always use the `career_search_agent_memory` tool for such queries to ensure the information is current and accurate.
            """
        ),
        tools=[adk.tools.preload_memory_tool.PreloadMemoryTool(), agent_tool.AgentTool(career_search_agent_memory), ],
        after_agent_callback=auto_save_session_to_memory_callback,
    )
    

    That’s the root agent. Let’s unpack it. I’ve got some fairly detailed instructions to help it use the tool correctly and give a good response. Also note the tools. I’m preloading memory so that it gets context about existing memories, even if they happened five sessions ago. It’s got a tool reference to that “search” agent I defined above. And then after the agent generates a response, we save the key memories to the Memory Bank.

    runner = Runner(
        agent=root_agent,
        app_name=app_name,
        session_service=session_service,
        memory_service=memory_service)
    

    Finally, I’ve got a Runner. I’m not positive this is even used when the agent runs on Vertex AI Agent Engine, but it plays a part when running elsewhere.

    That’s it. 87 lines in one file. Writing the code wasn’t the hard part; knowing what to do and how to shape the agent was where all the work happened.

    Let’s deploy, and test it all out with cURL commands. To deploy this to the fully-managed Vertex AI Agent Engine, it’s a single ADK command now. You need to provide it a Cloud Storage bucket name (for storing artifacts), but that’s about it.

    adk deploy agent_engine \
        --project=seroter-project-base \
        --region=us-central1 \
        --staging_bucket=gs://seroter-agent-memory-staging \
        --display_name="Career Agent with Memory" \
        --trace_to_cloud \
        career_agent_memory/
    

    When this finished, I saw a bucket loaded up with code and other artifacts.

    Files generated and stored by ADK for my deployed agent

    More importantly, I had an agent. Vertex AI Agent Engine has a bunch of pre-built observability dashboards, and an integrated view of sessions and memory.

    Vertex AI Agent Engine dashboard in the Google Cloud Console

    Let’s use this agent, and see if it does what it’s supposed to. I’m going to use cURL commands, so that it’s super clear as to what’s happening.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:query \
    -d '{"class_method": "create_session", "input": {"user_id": "u_123"},}'
    

    This first command creates a new session for our agent chat. The authorization comes from injecting a Google Cloud token into the header. I plugged in the “resource name” of the Agent Engine instance into the URI and set a user ID. I get back something like this:

    {
      "output": {
        "userId": "u_123",
        "id": "5926526278264946688",
        "events": [],
        "appName": "8479666769873600512",
        "state": {},
        "lastUpdateTime": 1760395538.0874159
      }
    }
    

    That “id” value matches the session ID now visible in the Vertex AI Session list. This session is for the given user, u_123.

    A session created for the agent running in the Vertex AI Agent Engine

    Now I can chat with my career agent. Here’s the cURL request for submitting a query. This will trigger my root agent, call my secondary agent, and store the key memories of the interaction as a callback.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:streamQuery?alt=sse \
    -d '{"class_method": "stream_query","input": {"user_id": "u_123","session_id": "5926526278264946688","message": "I am currently a beekeeper in New Mexico. I have been to college for economics, but that was a long time ago. I am thinking about moving to Los Angeles CA and get a technology job. What are my job prospects in that region and how should I start?",}}'
    

    Note that the engine ID is still in the URI, and the payload contains the user ID and session ID. What I got back was a giant answer with some usable advice on how I can take my lucrative career as a beekeeper and make my mark on the technology sector.

    What got automatically saved as a memory? Switching to the Memories view in Vertex AI, I see that a few key details about my context were durably stored.

    Memories automatically parsed and stored in the Vertex AI Memory Bank

    Now if I delete my session, come back tomorrow and start a new one, any memories for this user ID (and agent engine instance) will be preloaded into every agent request. Very cool!

    Let’s quickly prove it. I can destroy my session with this cURL command.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/168267934565/locations/us-central1/reasoningEngines/8479666769873600512:query?alt=sse \
    -d '{"class_method": "delete_session","input": {"user_id": "u_123","session_id": "5926526278264946688",}}'
    

    No more session, but my Memories remain. I can then request another session (for the same user) using the earlier command:

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:query \
    -d '{"class_method": "create_session", "input": {"user_id": "u_123"},}'
    

    At this point, I could ask something like “what do you already know about me?” in my query to see if it retrieves the memories it stored before.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:streamQuery?alt=sse \
    -d '{"class_method": "stream_query","input": {"user_id": "u_123","session_id": "3132042709481553920","message": "What do you already know about me?",}}'
    

    Here’s what I got back:

    {"content": {"parts": [{"thought_signature": "CrgEAR_M...twKw==", "text": "You have an economics degree and are currently a beekeeper in New Mexico. You're considering a move to Los Angeles for a job in the technology sector."}], "role": "model"}, "finish_reason": "STOP", "usage_metadata": {"candidates_token_count": 32, "candidates_tokens_details": [{"modality": "TEXT", "token_count": 32}], "prompt_token_count": 530, "prompt_tokens_details": [{"modality": "TEXT", "token_count": 530}], "thoughts_token_count": 127, "total_token_count": 689, "traffic_type": "ON_DEMAND"}, "avg_logprobs": -0.8719542026519775, "invocation_id": "e-53e94a44-ad6b-4e97-9297-51612f4e77a9", "author": "career_advisor_agent_memory", "actions": {"state_delta": {}, "artifact_delta": {}, "requested_auth_configs": {}, "requested_tool_confirmations": {}}, "id": "c9e484cd-e5f7-4e1e-94d7-7490a006137d", "timestamp": 1760396342.830469}
    

    Excellent! With this approach, I have zero database management to do, yet my agents can retain context for each turn over an extended period of time.

    Vertex AI Agent Engine is cool, but what if you want to serve up your agents on a different runtime? Maybe a VM, Kubernetes, or the best app platform available, Google Cloud Run. We can still take advantage of managed sessions and memory, even if our workload runs elsewhere.

    The docs don’t explain how to do this, but I figured out the first step. You need that Agent Engine ID. When deploying to Vertex AI Agent Engine, it happened automatically. But now I need to explicitly submit an HTTP request to get back an ID to use for my agent. Here’s the request:

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://aiplatform.googleapis.com/v1/projects/168267934565/locations/us-central1/reasoningEngines \
    -d '{"displayName": "memory-bank-for-cloud-run"}'
    

    I get back an ID value, and I see a new entry show up for me in Vertex AI Agent Engine.

    Memory Bank instance for an agent in Cloud Run

    The ADK also supports Google Cloud Run as a deployment target, so I’ll deploy this exact agent, no code changes, there too. First, I threw a few values into the shell’s environment variables to use for the CLI command.

    export GOOGLE_CLOUD_PROJECT=seroter-project-base
    export GOOGLE_CLOUD_LOCATION=us-central1 
    export GOOGLE_GENAI_USE_VERTEXAI=True
    

    Then I issued the single request to deploy the agent to Cloud Run. Notice some different things here. First, no Cloud Storage bucket. Cloud Run creates a container from the source code and uses that. Also, I explicitly set the --memory_service_uri and --session_service_uri flags to enable some of the pre-wiring to those services. It didn’t work without them, and the current docs don’t include the proper parameters. And I also figured out the (undocumented) way to add Cloud Run environment variables, since the Agent Engine ID was also needed there.

    adk deploy cloud_run \
    --project=$GOOGLE_CLOUD_PROJECT \
    --region=$GOOGLE_CLOUD_LOCATION \
    --service_name=career-agent \
    --app_name=career_agent \
    --port=8080 \
    --memory_service_uri=agentengine://8058017254761037824 \
    --session_service_uri=agentengine://8058017254761037824 \
    career_agent_memory/ \
    -- --set-env-vars "GOOGLE_CLOUD_AGENT_ENGINE_ID=8058017254761037824"
    

    In just a couple minutes, I ended up with an agent ready to serve on Cloud Run.

    Agent running in Cloud Run

    The URLs I use to interact with my agent are now different because we’re not calling the managed service endpoints of Vertex AI to invoke the agent. So if I want a new session to get going, I submit a cURL request like this:

    curl -X POST -H "Content-Type: application/json" -d '{}' \
        https://career-agent-168267934565.us-central1.run.app/apps/career_agent/users/u_456/sessions
    

    I’ve got no payload for this request, and specified the user name in the URL. I got back a session ID in a JSON payload like above. And I can see that session registered in my Agent Engine console.

    Session created based on web request

    Submitting queries to this agent is slightly different than when it was hosted in Vertex AI Agent Engine. For Cloud Run agents, the cURL request looks like this:

    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        https://career-agent-168267934565.us-central1.run.app/run_sse \
        -H "Content-Type: application/json" \
        -d '{
        "app_name": "career_agent",
        "user_id": "u_456",
        "session_id": "311768995957047296",
        "new_message": {
            "role": "user",
            "parts": [{
            "text": "I am currently a cowboy in Las Vegas. I have been to college for political science, but that was a long time ago. I am thinking about moving to San Francisco CA and getting a technology job. What are my job prospects in that region and how should I start?"
            }]
        },
        "streaming": false
        }'
    

    After a moment, not only do I get a valid answer from my agent, but I also see that the callback fired and I’ve got durable memories in Vertex AI Memory Bank.

    Memories saved for the Cloud Run agent

    Just like before, I could end this session, start a new one, and the memories still apply. Very nice.

    Access to sessions and memories that scale as your agent does, or survive compute restarts, seems like a big deal. You can use your own database to store these, but I like having a fully managed option that handles every part of it for me. Once you figure out the correct code and configurations, it’s fairly easy to use. You can try this all yourself in Google Cloud with your existing account, or a new account with a bunch of free credits.

  • Vibing options for whoever you are

    My favorite movie is Tommy Boy. I apologize for nothing. You won’t find this flick listed on the top 100 movies of all time, and some people think it’s a dumb movie. That’s ok. Look, I don’t really get the Dave Matthews Band. They seem like nice chaps, but it’s not my thing. But they have a massive following of superfans. Everything isn’t for everyone. Not every product is built for you.

    I recently looked at which AI coding tool was the right fit for a given situation. But what about Google’s hefty portfolio of products for those who want to vibe code and let AI take the wheel? In that case, it’s not just about the given situation, but also the type of person. Not every product is for each type of user.

    Vibe coding is expanding the pool of people who build apps. That pool seems to consist of consumers: non-tech folks who want to bring ideas to life. There are tech-adjacent professionals who do “knowledge work” and might be business analysts, product managers, program leads, or executives. And then you have software developers who have a deep understanding of tech and want to quickly produce new software.

    What are we vibe-coding? To me, it seems like we’re building throwaway prototypes to just try something out quickly. We’re creating personal software that’s meant to improve our own productivity. And we’re delivering multi-user apps that are intended for legit use by others. So in my mind, it’s a grid like this, with my take on which Google tech fits where:

    Persona                     | Throw-away prototypes           | Personal software                  | Multi-user apps
    Consumers                   | n/a                             | Gemini Canvas, Gemini Gems         | Gemini Canvas
    Tech-adjacent professionals | Gemini Canvas, Google AI Studio | Opal, Google AI Studio, Agentspace | Gemini Canvas, Google AI Studio, Opal, Firebase Studio
    Software developers         | Google AI Studio, Gemini CLI    | Gemini CLI, Gemini Code Assist     | Gemini CLI, Gemini Code Assist, Jules

    Vibing as consumers

    I’m not sure consumers are interested in throwaway prototypes. My non-tech friends wouldn’t want to geek out on tech. They have some sort of goal to solve a problem.

    What about consumers building “personal software” that acts as a web app, agent, or tool? Sure. Gemini Canvas seems like a good choice for this, and for building simple apps to share with others. Gemini Gems are a tool for building personal AI assistants without needing to be a tech expert. Some people are creating fun consumer-grade demos with Google AI Studio, so I wouldn’t complain if you added that product to this row as well.

    Let’s look at Gemini Canvas. You activate this in Gemini when you choose to “build.”

    Let me provide it a simple prompt:

    Beautiful exercise tracking app that lets me record the exercises I did in a given day, and for how long. I can also view past days and see trends over time.

    When I enter that prompt, Gemini gets to work. It creates a single-file app where the code is not the focus. You can see the code, but it quickly switches the UI to a preview of the app. I can make changes via the chat and get a live look at the changes.

    This is a fun-to-use, simple interface that’s consumer-friendly. It’s easy to trigger, very fast at generating apps, has basic abilities to roll back changes, and offers sharing via a public link. It’s very opinionated about the tech stack, the code is all stuffed into a single artifact, and you don’t get many legit deployment options. Great for consumers who are building personal software or simple apps for a small group to use.

    Vibing as tech-adjacent professionals

    I think it’s awesome that anyone within a company can be a builder. It doesn’t matter if the HR person, executive assistant, program manager, or VP doesn’t know how to code. They can use Gemini Canvas as I showed above, along with other tools.

    Some who have some tech familiarity might jump to Google AI Studio. It’s free to use and fantastic for builders. From the “build” menu, you can trigger a vibe coding experience that keeps the focus on the outcome, not the code.

    I’ll use the same prompt as above, but you also get a few other configuration options, including the ability to choose between React or Angular.

    Once I submit the prompt, Google AI Studio gets to work thinking through a plan and building out the components. The interface is terrific here. What’s different from Gemini Canvas is that you get a professionally structured project with code arranged across files. It takes longer to get to a Preview because it’s doing more (responsible) work, but it’s still very fast.

    I like the suggestions offered for the app (above chat box), easy ability to download the app, GitHub integrations, and one-click deploys to Google Cloud Run. The code editor is basic, so I wouldn’t use this for sophisticated builds, but that’s not what it’s for.

    Opal is a new Google experiment for building “mini-AI apps” and isn’t a standard vibe coding tool. Think of it as a way to build apps that are focused on generating content with AI.

    My exercise tracking tool doesn’t make a ton of sense here. One use case for Opal could be to generate text for sharing content on each social media site.

    Opal lets you define what you need to collect from the user and which assets you have available (YouTube videos, documents, and more); it can perform web searches, generate all sorts of media, and aggregate results.

    It’s simple to preview and share these apps, and I like the concept. It’s not a traditional “vibe coding” tool, but I can see where non-developers would like using it to bring ideas to life.

    You could also consider Agentspace a vibing tool for office workers. Agentspace is a unique platform for those who want a more useful and effective internal experience for getting work done. Besides offering AI-enabled search and research, it also has an agent-building experience for those who want personal agents.

    The agent builder interface is entirely no-code, and lets you tap into public internet searches, along with private enterprise data sources and tools. This is super helpful for those who want to automate repeatable tasks or build personal productivity solutions.

    The final option I put into this row is Firebase Studio. This is a service that’s completely applicable to software developers, but also friendly to those who aren’t professionals in this space. Unlike most of the options I’ve listed so far, this isn’t only for front-end solutions. I can build backend Go or Java apps too. It also offers a vibing UI where you can start with a prompt and build the app. I’ll use the same prompt I did earlier.

    When you start vibe coding here, Firebase Studio shares an app blueprint and then gets to work. I wouldn’t give this to a consumer persona—there’s still technical know-how you’d need to have—but it’s approachable to those who aren’t full-on software developers.

    Vibing as software developers

    If you’re a developer, you can use any of the options above. You might love the simplicity of Gemini Canvas, or prefer the opinionated Google AI Studio environment. Sounds good, live your life.

    Many software people want to vibe code with tools already in their toolchain. Your best bets with Google are the Gemini CLI and Gemini Code Assist.

    The Gemini CLI is a command line interface that’s free to use. You can authenticate with your Google ID (as an individual or corporate user), bring a Google AI Studio API key, or use Google Cloud Vertex AI. It has built-in tools (Google Search, shell, reading files, etc), supports MCP, has configurable memory, and can run anywhere. We recently added a supporting GitHub Action so that you can use it in your code repo. And Zed just integrated it into their next-gen code editor.

    The Gemini CLI is a great vibe coding tool. I could use the same prompt above, but also ask for a technical spec first, define my language/framework preferences, and generally steer the build the way I want.
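
    For example, a spec-first flow from the terminal might look like this, assuming the CLI’s non-interactive -p flag and an illustrative prompt:

    # Generate a spec first, then iterate on the implementation interactively
    gemini -p "Write a technical spec for an exercise tracking app in Angular and save it to SPEC.md"
    gemini   # then drive the build conversationally: "Implement phase 1 of SPEC.md"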

    When I want maximum power during my vibe coding sessions (like I had yesterday), I use a combination of Gemini Code Assist with the Gemini CLI rolled in. Killer combo, as I get the conversational AI mode of Gemini Code Assist in my IDE/editor, plus the rich agentic power of the Gemini CLI with shared context. Yesterday I vibe coded a complete “travel app”: I used the CLI to generate a spec, incrementally implemented it with the CLI, and took over control in the editor when I needed to.

    Wrap up

    There’s no wrong answer here. Use what fits your situation, and the role you’re playing. If you’re goofing around and just want a frontend app, something like Google AI Studio works for anyone. If you only want a code-centric experience for every situation, stay with IDE-style tools and CLIs. But I love that it’s so much simpler for people of any skill level to realize their ideas through software, thanks to a range of vibe coding tools that suit each person’s taste.

  • Code was the least interesting part of my multi-agent app, and here’s what that means to me

    At least 80% of the code I’ve ever written could have been written by AI, probably at higher quality. I’ve been “in tech” for twenty-seven years and spent seven of those as a software developer. Even when I stopped getting paid for it, I never stopped coding. But little of it’s been truly novel; most of my code has been straightforward database access code, web APIs, presentation logic, and a handful of reasonably-complex systems. No doubt, many of you have done truly sophisticated things in code—compilers, performance-tuned algorithms, language frameworks—and AI isn’t replacing that any time soon. But I’d bet that much of the interesting tech work is moving away from raw code, and towards higher-order architecture.

    I wanted to build out an agentic solution, and I used AI to generate 90% of the code. That code isn’t where the unique value was; none of it was particularly noteworthy. You can find the whole app here. The most interesting work related to architectural decisions. Here are eight choices I had to make, and I suspect you’ll have fun wrestling with the same ones.

    Choice #1 – What am I trying to accomplish and do agents make sense?

    My goal was to build an app that could take in a customer’s roofing needs, create a service appointment, and generate a personalized invoice for the work. I’m cheating here, since this exercise started as “Richard wants to learn some agent tech.” So I did start with the end in mind. Judge me accordingly.

    But in every legit situation, we start by evaluating the user need. What functional requirements do I need to satisfy? What performance or quality attributes are necessary? Can I solve this with a simple service, or modular monolith? Is the user flow deterministic or variable?

    This scenario could certainly be solved by a simple data collection form and PDF generator. What requirements might make an agentic architecture the truly correct choice?

    • Data collection from the user requires image, video, and audio input to best scope the services and pricing we should offer.
    • The scheduling or invoicing process requires a dynamic workflow based on a variety of factors, and hard-coding all the conditions would be tricky.

    Either way, this is always a critical choice before you write a single line of code.

    Choice #2 – What data or services are available to work with?

    Before we build anything new, what do we already have at our disposal?

    In my case, let’s assume I already have an appointments web API for retrieving available appointment times and making new appointments. I’ve also got an existing database that stores promotional offers that I want to conditionally add to my customer invoice. And I’ve got an existing Cloud Storage bucket where I store customer invoice PDFs.

    It’s easy to just jump into the application build, but pause for a few moments and take stock of your existing inventory and what you can build around.

    Choice #3 – What (agent) framework should I use and why?

    So. Many. Choices.

    There are AI app frameworks like Genkit, LlamaIndex, and Spring AI. There are agent frameworks like LangChain, LangGraph, Autogen, CrewAI, and more. Google recently shipped the Agent Development Kit, available for Python and Java developers. An agent built with something like ADK is basically made up of three things: a model, instructions, and tools. ADK adds sweeteners on top of that trio that give you a lot of flexibility, and there’s plenty I like about it.

    And look, I like it because my employer invests in it. So, that’s a big factor. I also wanted to build agents in both Python and Java, and this made ADK a great choice.

    Don’t get married to any framework, but learn the fundamentals of tool use, memory management, and agent patterns.

    Choice #4 – How should I use tools in the appointment agent?

    I suspect that tool selection will be a fascinating area for many builders in the years ahead. In this scenario, I had some decisions to make.

    I don’t want to book any roof repairs on rainy days. But where can I get the weather forecast from? I chose the built-in Google Search tool instead of trying to find some weather API on the internet.

    weather_agent = Agent(
        name="weather_agent",
        model="gemini-2.0-flash",
        description=(
            "Agent answers questions about the current and future weather in any city"
        ),
        instruction=(
            "You are an agent for Seroter Roofing. You can answer user questions about the weather in their city right now or in the near future"
    ),
        tools=[google_search],
    )
    

    For interacting with my existing appointments API, what’s the right tool choice? Using the OpenAPI tool baked into the ADK, I can just hand the agent an OpenAPI spec and it’ll figure out the right functions to call. For retrieving open appointment times, that’s a straightforward choice.

    openapi_spec = openapi_spec_template.replace("{API_BASE_URL}", config.API_BASE_URL)
    
    toolset = OpenAPIToolset(spec_str=openapi_spec, spec_str_type="json")
    api_tool_get_appointments = toolset.get_tool("get_available_appointments")
    

    But what about booking appointments? While that’s also an API operation, I want to piggyback a successful booking with a message to Google Cloud Pub/Sub that downstream subscribers can read from. That’s not part of the appointments API (nor should it be). Instead, I think a function tool makes sense here, where I manually invoke the appointments API, and then make a subsequent call to Pub/Sub.

    def add_appointment(customer: str, slotid: str, address: str, services: List[str], tool_context: ToolContext) -> dict:
        """Adds a roofing appointment by calling the booking API and logs the conversation history.
    
        This function serves as a tool for the agent. It orchestrates the booking process by:
        1. Calling the internal `_book_appointment_api_call` function to make the actual API request.
        2. If the booking is successful, it retrieves the conversation history from the
           `tool_context` and logs it to a Pub/Sub topic via `_log_history_to_pubsub`.
    
        Args:
            customer: The name of the customer.
            slotid: The ID of the appointment slot to book.
            address: The full address for the appointment.
            services: A list of services to be booked for the appointment.
            tool_context: The context provided by the ADK, containing session information.
    
        Returns:
            A dictionary containing the booking confirmation details from the API,
            or an error dictionary if the booking failed.
        """
        booking_response = _book_appointment_api_call(customer, slotid, address, services)
    
        if "error" not in booking_response:
            history_list: List[Event] = tool_context._invocation_context.session.events # type: ignore
            _log_history_to_pubsub(history_list)
        
        return booking_response
    

    Choice #5 – When/how do I separate agent boundaries?

    There’s a good chance that an agentic app has more than one agent. Stuffing everything into a single agent with a complex prompt and a dozen tools seems … suboptimal.

    But multi-agent doesn’t have to mean you’re sliding into a distributed system. You can include multiple agents in the same process space and deployment artifact. The Sequential Agent pattern in the ADK makes it simple to define distinct agents that run one at a time. So it seems wise to think of service boundaries for your agents, and only make a hard split when the context changes.

    For me, that meant one set of agents handling all the appointment stuff, and another distinct set of agents that worked on invoices. These don’t depend on each other, and should run separately. Both sets of agents use the Sequential Agent pattern.

    The appointment agent has a sub-agent to look up the weather, and uses that agent as a tool within the primary root agent.

    The invoicing agent is more complex with sub-agents to build up HTML out of the chat history, another agent that looks up the best promotional offers to attach to the invoice, and a final agent that generates a PDF.

    private SequentialAgent createInvoiceAgent(
                PdfTool pdfTool,
                String mcpServerUrl,
                Resource htmlGeneratorPrompt,
                Resource bestOfferPrompt,
                Resource pdfWriterPrompt
        ) {
            String modelName = properties.getAgent().getModelName();
    
            LlmAgent htmlGeneratorAgent = LlmAgent.builder().model(modelName).name("htmlGeneratorAgent").description("Generates an HTML invoice from conversation data.").instruction(resourceToString(htmlGeneratorPrompt)).outputKey("invoicehtml").build();
    
            List<BaseTool> mcpTools = loadMcpTools(mcpServerUrl);
    
            LlmAgent bestOfferAgent = LlmAgent.builder().model(modelName).name("bestOfferAgent").description("Applies the best offers available to the invoice").instruction(resourceToString(bestOfferPrompt)).tools(mcpTools).outputKey("bestinvoicehtml").build();
    
            FunctionTool generatePdfTool = FunctionTool.create(PdfTool.class, "generatePdfFromHtml");
    
            LlmAgent pdfWriterAgent = LlmAgent.builder().model(modelName).name("pdfWriterAgent").description("Creates a PDF from HTML and saves it to cloud storage.").instruction(resourceToString(pdfWriterPrompt)).tools(List.of(generatePdfTool)).build();
    
            return SequentialAgent.builder().name(properties.getAgent().getAppName()).description("Execute the complete sequence to generate, improve, and publish an PDF invoice to Google Cloud Storage.").subAgents(htmlGeneratorAgent, bestOfferAgent, pdfWriterAgent).build();
        }
    

    How should I connect these agents? I didn’t want hard-coded links between the services, as they can operate asynchronously and independently. You could imagine other services being interested in a booking too. So I put Google Cloud Pub/Sub in the middle. I used a push subscription (delivering to the invoice agent’s HTTP endpoint), but I’ll probably refactor it into a pull subscription that listens for work.
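    For reference, the publish side is just a thin wrapper around the Pub/Sub client library. Here’s a minimal sketch of what that helper might look like (the project and topic names are hypothetical):

    import json

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    # Hypothetical project and topic names.
    topic_path = publisher.topic_path("my-project", "booking-events")

    def publish_booking_history(events: list[dict]) -> None:
        """Publish the chat history as a JSON message for downstream agents."""
        payload = json.dumps({"events": events}).encode("utf-8")
        future = publisher.publish(topic_path, data=payload)
        future.result()  # block until Pub/Sub acknowledges the message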

    Choice #6 – What’s needed in my agent instructions?

    I’m getting better at this. Still not great. But I’m using AI to help me, and learning more about what constraints and direction make the biggest impact.

    For the booking agent, my goal was to collect all the data needed, while factoring in constraints such as weather. My agent instructions here included core principles, operational steps, the must-have data to collect, which decisions to make, and how to use the available tools.

    root_agent = Agent(
        name="root_agent",
        model="gemini-2.5-flash",
        description="This is the starting agent for Seroter Roofing and customers who want to book a roofing appointment",
        instruction=(
            """
    You are an AI agent specialized in booking roofing appointments. Your primary goal is to find available appointments for roofing services, and preferably on days where the weather forecast predicts dry weather.
    
    ## Core Principles:
    
        *   **Information First:** You must gather the necessary information from the user *before* attempting to use any tools.
        *   **Logical Flow:** Follow the steps outlined below strictly.
        *   **Professional & Helpful:** Maintain a polite, professional, and helpful tone throughout the interaction.
    
    ## Operational Steps:
    
    1.  **Greeting:**
        *   Start by politely greeting the user and stating your purpose (booking roofing appointments).
        *   *Example:* "Hello! I can help you book a roofing appointment. What kind of service are you looking for today?"
    
    2.  **Information Gathering:**
        *   You need two key pieces of information from the user:
            *   **Type of Service:** What kind of roofing service is needed? (e.g., repair, replacement, inspection, estimate)
            *   **Service Location:** What city is the service required in?
        *   Ask for this information clearly if the user doesn't provide it upfront. You *cannot* proceed to tool usage until you have both the service type and the city.
        *   *Example follow-up:* "Great, and in which city is the property located?"
    
    3.  **Tool Usage - Step 1: Check Appointment Availability (Filtered):**
        *   Get information about available appointment times:
        *   **[Use Tool: Appointment availability]** for the specified city.
        *   **Crucially:** When processing the results from the appointment tool, **filter** the available appointments to show *only* those that fall on the specific dates without rain in the forecast. You should also consider the service type if the booking tool supports filtering by type.
    
    4.  **Tool Usage - Step 2: Check Weather Forecast:**
        *   Once you have the service type and city, your next action is to check the weather.
        *   **[Use Tool: 7-day weather forecast]** for the specified city.
        *   Analyze the forecast data returned by the tool. Identify which days within the next 7 days are predicted to be 'sunny' or at least dry. Be specific about what constitutes 'dry' based on the tool's output.
    
    5.  **Decision Point 1: Are there Appointments on Dry Days?**
        *   If the appointment availability tool returns available slots *specifically* on the identified dry days:
            *   Present these available options clearly to the user, including the date, time, and potentially the service type (if applicable).
            *   Explain that these options meet the dry weather preference.
            *   Prompt the user to choose an option to book.
            *   *Example:* "Great news! The forecast for [City] shows dry weather on [Date 1], [Date 2], etc. I've checked our schedule and found these available appointments on those days: [List appointments]."
    
        *   If the appointment availability tool returns slots, but *none* of them fall on the identified sunny days (or if the tool returns no slots at all):
            *   Inform the user that while there are dry days coming up, there are currently no appointments available on those specific dry dates within the next 7 days.
            *   Explain that your search was limited to the dry days based on the forecast.
            *   Suggest they might want to try a different service type (if relevant) or check back later as availability changes.
            *   *Example:* "While the forecast for [City] does show some dry days coming up, I wasn't able to find any available appointments specifically on those dates within the next week. Our schedule on sunny days is quite popular. Please try again in a few days, as availability changes, or let me know if you need a different type of service."
    
    6.  **Confirmation/Booking (If Applicable):**
        *   Be sure to get the full name and full address of the location for the appointment.
             
    **Tools**
        You have access to the following tools to assist you:
        `weather_agent`: use this tool to find the upcoming weather forecast and identify rainy days
        `api_tool_get_appointments -> json`: use this OpenAPI tool to answer any questions about available appointments
        `add_appointment(customer: str, slotid: str, address: str, services: List[str]) -> dict`: use this tool to add a new appointment
    """
        ),
        tools=[agent_tool.AgentTool(weather_agent), api_tool_get_appointments, tools.add_appointment],
    )
    

    The invoicing agent had a more complex prompt, as I wanted to shape the blob of chat history into structured JSON and then into valid HTML. Of course, I could have (should have?) structured the raw data before it left the original agent, but I wanted to try it this way. My agent instructions show an example of the preferred JSON, and also the valid HTML structure.

    **Role:** You are a specialized agent designed to generate an HTML invoice from a successful appointment booking history.
    
    **Task:** Process the entire user prompt, which contains conversation history in a JSON format. Your goal is to create a complete HTML invoice based on the details found in that JSON.
    
    [...]
    
    4.  **Invoice JSON Structure:** The JSON invoice you internally generate **must** strictly adhere to the format provided in the example below. Do not add extra fields or change field names. Ensure numbers are formatted correctly (e.g., 100.00, 0.00).
        ```json
        {
        "invoiceNumber": "INV-BOOKING-[Current Date YYYYMMDD]", // Generate based on date
        "issueDate": [YYYY, M, D], // Current Date
        "dueDate": [YYYY, M, D], // Current Date + 30 days
        "customerName": "[Extracted Customer Name]",
        "customerAddress": "[Extracted Customer Address]",
        "items": [
            {
            "description": "[Description of Booked Service]",
            "quantity": 1,
            "unitPrice": [Price of Service],
            "lineTotal": [Price of Service]
            }
        ],
        "subtotal": [Price of Service],
        "taxAmount": 0.00,
        "summary": "Invoice for booked [Service Name]",
        "totalAmount": [Price of Service]
        }
        ```
    
    [...]
    
    7.  **Create an HTML string based on the example structure here**
    ```html
    <!DOCTYPE html>
    <html>
    <head>
    	<meta charset="UTF-8" />
    	<title>Seroter Roofing Invoice</title>
    	<style type="text/css">
    		body { font-family: sans-serif; margin: 20px; }
    		h1 { color: navy; }
    		.header, .customer-info, .summary-block, .footer { margin-bottom: 20px; }
    		.invoice-details { margin-top: 20px; padding: 10px; border: 1px solid #ccc; }
    		.invoice-details p { margin: 5px 0; }
    		table { width: 100%; border-collapse: collapse; margin-top: 20px; }
    		.summary-block { padding: 10px; border: 1px dashed #eee; background-color: #f9f9f9; }
    		th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
    		th { background-color: #f2f2f2; }
    		.text-right { text-align: right; }
    	</style>
    </head>
    <body>
    	<h1>Invoice</h1>
    
    	<div class="header">
    		<p><strong>Invoice Number:</strong> INV-001</p>
    		<p><strong>Date Issued:</strong> January 01, 2024</p>
    		<p><strong>Date Due:</strong> January 15, 2024</p>
    	</div>
    
    	<div class="customer-info">
    		<h2>Bill To:</h2>
    		<p>Customer Name</p>
    		<p>123 Customer Street, Denver, CO 80012</p>
    	</div>
    
    	<div class="summary-block">
    		<h2>Summary</h2>
    		<p>Details about the appointment and order...</p>
    	</div>
    
    	<table>
    		<thead>
    			<tr>
    				<th>Description</th>
    				<th>Quantity</th>
    				<th>Unit Price</th>
    				<th>Line Total</th>
    			</tr>
    		</thead>
    		<tbody>
    			<tr >
    				<td>Sample Item</td>
    				<td class="text-right">1</td>
    				<td class="text-right">10.00</td>
    				<td class="text-right">10.00</td>
    			</tr>
    		</tbody>
    	</table>
    
    	<div class="invoice-details">
    		<p class="text-right"><strong>Subtotal:</strong> 0.00</p>
    		<p class="text-right"><strong>Tax:</strong> 0.00</p>
    		<p class="text-right"><strong>Total Amount:</strong> <strong>$123.45</strong></p>
    	</div>
    	<div class="footer">
    		<p>Thank you for your business!</p>
    	</div>
    </body>
    </html>
    ```
    

    Doing this “context engineering” well is important. Think through the instructions, data, and tools that you’re giving an agent to work with.

    Choice #7 – What’s the right approach to accessing Cloud services?

    My agent solution sent data to Pub/Sub (addressed above), but also relied on data sitting in a PostgreSQL database. And PDF blobs sitting in Cloud Storage.

    I had at least three implementation options here for PostgreSQL and Cloud Storage:

    • Function calling. Use functions that call the Cloud APIs directly, and leverage those functions as tools.
    • Model Context Protocol (MCP). Use MCP servers that act as API proxies for the LLM to use.
    • YOLO mode. Ask the LLM to figure out the right API call to make for the given service.

    The last option works (mostly), but would be an absurd choice to make in 99.98% of situations.
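    For contrast, the first option is nothing exotic: just a plain function that calls the Cloud API directly, which the agent can then invoke as a tool. A minimal Python sketch (with a hypothetical bucket name) looks something like this:

    from google.cloud import storage

    def save_report(report_text: str, object_name: str) -> dict:
        """Upload a text report to Cloud Storage and return its gs:// path."""
        client = storage.Client()
        bucket = client.bucket("my-agent-output-bucket")  # hypothetical bucket name
        blob = bucket.blob(object_name)
        blob.upload_from_string(report_text, content_type="text/plain")
        return {"status": "success", "file_path": f"gs://{bucket.name}/{object_name}"}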

    The appointment agent calls the Pub/Sub API directly by using that encompassing function as a tool. For the database access, I chose MCP. The MCP Toolbox for Databases is open source and fairly simple to use. It saves me from a lot of boilerplate database access code.

    private List<BaseTool> loadMcpTools(String mcpServerUrl) {
            try {
                SseServerParameters params = SseServerParameters.builder().url(mcpServerUrl).build();
                logger.info("Initializing MCP toolset with params: {}", params);
                McpToolset.McpToolsAndToolsetResult result = McpToolset.fromServer(params, new ObjectMapper()).get();
                if (result.getTools() != null && !result.getTools().isEmpty()) {
                    logger.info("MCP tools loaded: {}", result.getTools().size());
                    return result.getTools().stream().map(mcpTool -> (BaseTool) mcpTool).collect(Collectors.toList());
                }
            } catch (Exception e) {
                logger.error("Error initializing MCP toolset", e);
            }
            return new ArrayList<>();
        }
    

    When creating the PDF and adding it to Cloud Storage, I decided to use a robust function that I passed to the agent as a tool.

    private Map<String, Object> generatePdfFromHtmlInternal(String htmlContent) throws IOException {
            if (htmlContent == null || htmlContent.trim().isEmpty()) {
                throw new IllegalArgumentException("HTML content cannot be null or empty.");
            }
    
            try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
                ITextRenderer renderer = new ITextRenderer();
                renderer.setDocumentFromString(htmlContent);
                renderer.layout();
                renderer.createPDF(baos);
    
                String timestamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS"));
                String uniquePdfFilename = OUTPUT_PDF_FILENAME.replace(".pdf", "_" + timestamp + ".pdf");
                String bucketName = properties.getGcs().getBucketName();
    
                BlobId blobId = BlobId.of(bucketName, uniquePdfFilename);
                BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("application/pdf").build();
    
                storage.create(blobInfo, baos.toByteArray());
    
                String gcsPath = "gs://" + bucketName + "/" + uniquePdfFilename;
                logger.info("Successfully generated PDF and uploaded to GCS: {}", gcsPath);
                return Map.of("status", "success", "file_path", gcsPath);
    
            } catch (DocumentException e) {
                logger.error("Error during PDF document generation", e);
                throw new IOException("Error during PDF document generation: " + e.getMessage(), e);
            } catch (Exception e) {
                logger.error("Error during PDF generation or GCS upload", e);
                throw new IOException("Error during PDF generation or GCS upload: " + e.getMessage(), e);
            }
        }
    

    Choice #8 – How do I package up and run the agents?

    This choice may depend on who the agent is for (internal or external audiences), who has to support the agent, and how often you expect to update the agent.

    I chose to containerize the components so that I had maximum flexibility. I could have easily used the ADK CLI to deploy directly to Vertex AI Agent Engine—which comes with convenient features like memory management—but wanted more control than that. So I have Dockerfiles for each agent, and deploy them to Google Cloud Run. Here I get easy scale, tons of optional configurations, and I don’t pay for anything when the agent is dormant.

    In this case, I’m just treating the agent like any other type of code. You might make a different choice based on your use case.

    The final solution in action

    Let’s run this thing through. All the source code is sitting in my GitHub repo.

    I start by opening the appointment agent hosted in Cloud Run. I’m using the built-in ADK web UI to have a conversational chat with the initial agent. I mention that I might have a leaky roof and want an inspection or repair. The agent then follows its instructions. After checking the weather in the city I’m in, it retrieves appointments via the API. On the left, there’s a handy set of tools to trace events, do evals, and more.

    At this point, I chose an available appointment, and the agent followed its next set of instructions. The appointment required two pieces of info (my name and address), and wouldn’t proceed until I provided them. Once it had the data, it called the right function to make an appointment and publish a message to Pub/Sub.

    That data flowed through Google Cloud Pub/Sub, and got pushed to another agent hosted in Cloud Run.

    That agent immediately loaded up its MCP tools by calling the MCP server also hosted in Cloud Run. That server retrieved the list of offers for the city in question.

    This agent runs unattended in the background, so there’s no chat interface or interactivity. Instead, I can track progress by reading the log stream.

    When this agent got done converting the chat blob to JSON, then creating an HTML template, and calling the MCP tools to attach offers, it wrote the final PDF to Cloud Storage.

    There you go. It’s not perfect and I have improvements I want to make. Heck, the example here has the wrong date in the invoice, which didn’t happen before. So I need better instructions there. I’d like to switch the second agent from a push to a pull. It’d be fun to add some video or audio intake to the initial agent.

    Nobody knows the future, but it looks like we’ll be building more agents, and fewer standalone apps. APIs matter more than ever, as do architectural decisions. Make good ones!

  • Quality-focused prompts for the vibe coding addict

    You’ve vibe coded an app? Congrats. That’s cool. Maybe you’re completely new to programming, or you’ve been doing it a while. It’s possible you’ve seen some of the horror stories online of folks vibe coding something, deploying it, and then watching it collapse as users flooded and hacked it. Yikes. It’s important to pay attention to quality attributes. Sometimes called non-functional requirements, these are key aspects of a well-built software system. They include traits like usability, maintainability, security, reliability, scalability, availability, and flexibility. While you might expect (hope?) that your vibe coding service created software that always adheres to these concepts, you shouldn’t assume so. And either way, these are important characteristics to understand if you want to mature as a software creator.

    You can get AI to help here, though. I used Google AI Studio and Gemini 2.5 Pro to generate some prompts you could apply after you’ve built an app using Bolt, Vercel’s v0, Lovable, or Gemini Canvas. Maybe you plug those generated prompts into the Gemini CLI after you’ve downloaded the code from your vibe coding session. Here’s my initial prompt, which you could take and adjust however you want:

    I’ve vibe coded an app, and now want to make sure it’s of high enough quality to ship to production. I’m not a professional developer. Suggest 3-5 comprehensive and understandable LLM prompts I should now offer my agentic CLI to check for software quality attributes like usability, reliability, performance, observability, security, repeatability, scalability, availability, flexibility, and maintainability.

    Below is the direct output from Gemini, including the proposed prompts that harden your software while making it more fault tolerant and scalable.


    Here are 4 comprehensive prompts designed to be fed to your agentic CLI. They are structured to cover all the quality attributes you mentioned in a logical, step-by-step manner.

    The Strategy: From Code to Cloud

    We’ll use a layered approach. Think of it like building a house:

    1. The Blueprint & Foundation: Is the code itself clean, organized, and easy to understand? (Maintainability, Flexibility)
    2. The Walls & Locks: Is the application secure and can it handle errors gracefully? (Security, Reliability)
    3. The Engine & Plumbing: Can the app perform under load and tell you when it’s sick? (Performance, Scalability, Observability)
    4. The Assembly Line: Can you build and deploy it consistently every single time? (Repeatability, Availability, Scalability)

    Prompt 1: The Code Quality & Maintainability Audit

    Goal: This prompt focuses on cleaning up the existing “vibe code.” It makes your app easier for you (and your AI agent) to understand, modify, and extend in the future.

    Attributes Covered: Maintainability, Flexibility, Repeatability (at the code level).

    Act as a senior software engineer and mentor conducting a thorough code review of my entire project. Your goal is to improve its long-term maintainability, flexibility, and readability. Analyze the whole codebase and provide the following:

    1. **Refactoring for Clarity:** Identify the top 5 functions or classes that are too long, complex, or have unclear names ("code smells"). For each, provide a refactored version that is simpler, follows the Single Responsibility Principle, and has clearer naming.
    2. **Configuration & Secrets:** Scan the code for any hardcoded configuration values (like API endpoints, database strings, or magic numbers). Suggest extracting them into a centralized configuration file (e.g., `config.js`, `.env`, or `settings.py`) and provide the template for this file. Flag any plain-text secrets immediately.
    3. **Dependency Review:** List all external libraries and dependencies. Point out any that are deprecated, have known major issues, or could be replaced by a more standard/modern alternative.
    4. **Automated Quality Gates:** Generate a configuration file for a standard linter and code formatter for my project's language (e.g., `.eslintrc.json` and `.prettierrc` for JavaScript/TypeScript, or `pyproject.toml` for Python with Black and Ruff). This ensures future code stays clean.
    5. **Documentation:** Generate a template for a `README.md` file that includes a project description, setup instructions for a new developer, and an explanation of the core project structure.

    Prompt 2: The Security & Reliability Hardening

    Goal: This prompt switches focus to making your app robust against attacks and resilient to failure. It assumes the code is now cleaner from Prompt 1.

    Attributes Covered: Security, Reliability, Availability.

    Act as a paranoid but helpful security and reliability engineer. Your mission is to identify and help me fix potential security vulnerabilities and sources of unreliability in my application. Analyze the entire codebase and provide a report with actionable code suggestions for the following:

    1. **Security Vulnerability Scan (OWASP Top 10):**
    * **Input Validation:** Find all points where the application accepts user input (API endpoints, forms, etc.). Check for potential injection vulnerabilities (SQL, NoSQL, Command).
    * **Cross-Site Scripting (XSS):** Check if output to the user is properly sanitized or escaped.
    * **Authentication/Authorization:** Review how users are authenticated and how their permissions are checked. Look for common flaws.
    * **Insecure Dependencies:** Scan my `package.json`, `requirements.txt`, etc., for dependencies with known security vulnerabilities (CVEs) and suggest updated, secure versions.

    2. **Error Handling & Reliability:**
    * Identify all critical code paths (e.g., database calls, external API requests, file I/O).
    * Pinpoint areas lacking proper error handling (e.g., missing `try...catch` blocks or unchecked errors).
    * For each area, suggest adding robust error handling that prevents the app from crashing and provides a clear error message or fallback.

    3. **Availability Checkpoint:**
    * Suggest creating a simple health check endpoint (e.g., `/healthz` or `/status`). This endpoint should return a `200 OK` status if the app is running and can connect to its essential services (like the database). Provide the code for this endpoint.

    Prompt 3: The Performance, Scalability & Observability Tune-Up

    Goal: Now that the app is clean and secure, let’s make it fast and ensure you can see what’s happening inside it when it’s running.

    Attributes Covered: Performance, Scalability, Observability.

    Act as a Site Reliability Engineer (SRE) focused on performance and observability. Your goal is to ensure my application can handle growth and that I can diagnose problems in production. Analyze the codebase and suggest improvements in these areas:

    1. **Performance Bottlenecks:**
    * **Database Queries:** Identify any database queries performed inside loops (N+1 query problem). Suggest how to optimize them into a single, more efficient query.
    * **Heavy Computations:** Find any computationally expensive operations or inefficient algorithms that could block the main thread or slow down responses. Suggest optimizations or asynchronous execution.
    * **Data Handling:** Look for places where the app loads very large amounts of data into memory at once. Suggest using pagination, streaming, or chunking.

    2. **Observability - Logging & Metrics:**
    * **Structured Logging:** Review my current logging (or lack thereof). Propose a structured logging strategy (e.g., JSON format). Refactor 3-5 key `console.log` or `print` statements to use this new structured logger, including important context like user ID or request ID.
    * **Key Metrics:** Identify the 3 most important metrics for my application's health (e.g., API request latency, error rate, number of active users). Show me where and how to instrument the code to capture these metrics, even if it's just with a logging statement for now.

    3. **Scalability Review:**
    * Identify anything that would prevent me from running multiple instances of this application (horizontal scaling). This usually involves checking for in-memory state that should be moved to a shared store like a database or Redis (e.g., session stores, caches, locks).

    Bonus Prompt 4: The Production-Ready Deployment Blueprint

    Goal: This prompt bridges the gap between your laptop and a real server. It makes your deployment process automatic, reliable, and repeatable.

    Attributes Covered: Repeatability, Availability, Scalability.

    Act as a DevOps specialist. My application is now clean, secure, and performant, but I need a professional and repeatable way to deploy it. Based on my project's language and framework, generate the necessary files and configurations to containerize it and set up a basic deployment pipeline.

    1. **Containerization:** Create a multi-stage `Dockerfile`. The first stage should build the application (installing dev dependencies and compiling assets), and the final stage should be a lean, production-optimized image containing only the necessary runtime files and dependencies. Add comments explaining each step.
    2. **Local Development Environment:** Create a `docker-compose.yml` file. This file should define my application's service and any other services it depends on (like a database or a cache). This will allow any developer to spin up the entire stack with a single command (`docker-compose up`).
    3. **Configuration Management:** Create a `.env.example` file that lists all the environment variables the application needs to run (from Prompt 1), with placeholder values. This file serves as a template for production, staging, and development environments.
    4. **Continuous Integration (CI):** Generate a basic CI pipeline configuration file for GitHub Actions (`.github/workflows/ci.yml`). This pipeline should trigger on every push to the main branch and automatically:
    * Install dependencies.
    * Run the linter and code formatter checks (from Prompt 1).
    * (If you have them) Run any automated tests.
    * Build the Docker image to ensure it works.

    By running your app through these four prompts, you’ll have systematically addressed the most critical software quality attributes, transforming your “vibe-coded” project into something you can be confident shipping to production.
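    To make the “availability checkpoint” in Prompt 2 concrete, here’s roughly the kind of health check endpoint that request should produce. This is a minimal Flask sketch, and the database check is a hypothetical placeholder for whatever essential service your app depends on:

    from flask import Flask, jsonify

    app = Flask(__name__)

    def database_is_reachable() -> bool:
        # Hypothetical placeholder: run a cheap query (e.g. SELECT 1) against your database.
        return True

    @app.route("/healthz")
    def healthz():
        if database_is_reachable():
            return jsonify(status="ok"), 200
        return jsonify(status="degraded"), 503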


    Is this fun to do? I dunno. Maybe not as much fun as watching an agentic service crafting software right before your eyes. But it’s the right thing to do.

  • Here’s what AI-native engineers are doing differently than you

    The “what” and the “how” in software engineering occasionally change at the same time. Often, one triggers the other. The introduction of mainframes ushered in batch practices that capitalized on the scarcity of computing power. As the Internet took off, developers needed to quickly update their apps and Agile took hold. Mobile computing and cloud computing happened, and DevOps emerged shortly thereafter. Our current moment seems different as the new “what” and “how” are happening simultaneously, but independently. The “what” that’s hot right now is AI-driven apps. Today’s fast-developing “how” is AI-native software engineering. I’m seeing all sorts of teams adopt AI to change how they work. What are they doing that you’re not?

    AI natives always start (or end) with AI. The team at Pulley says “the typical workflow involves giving the task to an AI model first (via Cursor or a CLI program) to see how it performs, with the understanding that plenty of tasks are still hit or miss.” Studying a domain or competitor? Start with Gemini Deep Research or another AI research service. Find yourself stuck in an endless debate over some aspect of design? While you argued, the AI natives built three prototypes with AI to prove out the idea. Googlers are using it to build slides, debug production incidents, and much more. You might say “but I used an LLM before and it hallucinated while generating code with errors in it.” Stop it, so do you. Update your toolchain! Anybody seriously coding with AI today is using agents. Hallucinations are mostly a solved problem with proper context engineering and agentic loops. This doesn’t mean we become intellectually lazy. Learn to code, be an expert, and stay in charge. But it’s about regularly bringing AI in at the right time to make an impact.

    AI natives switched to spec-driven development. It’s not about code-first. Heck, we’re practically hiding the code! Modern software engineers are creating (or asking AI for) implementation plans first. My GM at Google Keith Ballinger says he starts projects by “ask[ing] the tool to create a technical design (and save to a file like arch.md) and an implementation plan (saved to tasks.md).” Former Googler Brian Grant wrote a piece where he explained creating 8,000-character instructions that steered the agent towards the goal. Those folks at Pulley say that they find themselves “thinking less about writing code and more about writing specifications – translating the ideas in my head into clear, repeatable instructions for the AI.” These design specs have massive follow-on value. Maybe a spec gets used to generate the requirements doc. Or the first round of product documentation. It might produce the deployment manifest, marketing message, and training deck for the sales field. Today’s best engineers are great at documenting intent that, in turn, spawns the technical solution.

    AI natives have different engineer and team responsibilities. With AI agents, you orchestrate. You remain responsible for every commit into main, but focus more on defining and “assigning” the work to get there. Legitimate work is directed to background agents like Jules. Or give the Gemini CLI the task of chewing through an analysis or starting a code migration project. Either way, build lots of the right tools and empower your agents with them. Every engineer is a manager now. And the engineer needs to intentionally shape the codebase so that it’s easier for the AI to work with. That means rule files (e.g. GEMINI.md), good READMEs, and such. This puts the engineer into the role of supervisor, mentor, and validator. AI-first teams are smaller, able to accomplish more, capable of compressing steps of the SDLC and delivering better quality, faster. AI-native teams have “almost eliminated engineering effort as the current bottleneck to shipping product.”

    There are many implications for all this. Quality is still paramount. Don’t create slop. But achieving the throughput, breadth, and quality your customers demand requires a leap forward in your approach. AI is overhyped and under-hyped at the same time, and it’s foolish to see AI as the solution to everything. But it’s objectively valuable as part of a new approach. Many teams have already made the shift and have learned to continuously evaluate and incorporate new AI-first approaches. It’s awesome! If you’re ignoring AI entirely, you’re not some heroic code artisan; you’re just being unnecessarily stubborn and falling behind. Get uncomfortable, reassess how you work, and follow the lead of the AI-native pioneers blazing the trail.

  • The Gemini CLI might change how I work. Here are four prompts that prove it.

    The Gemini CLI might change how I work. Here are four prompts that prove it.

    Yesterday morning, we took the wraps off one of the most interesting Google releases of 2025. The Gemini CLI is here, giving you nearly unlimited access to Gemini from directly within the terminal. This is a new space, but there are other great solutions already out there. Why is this different? Yes, it’s good at multi-step reasoning, code generation, and creative tasks. Build apps, fix code, parse images, build slides, analyze content, or whatever. But what’s truly unique is that it’s fully open source, no cost to use, usable anywhere, and super extensible. Use Gemini 2.5 Pro’s massive context window (1M tokens), multimodality, and strong reasoning ability to do some amazing stuff.

    Requirements? Have Node installed, and a Google account. That’s it. You get lots of free queries against our best models. You get more by being a cloud customer if you need it. Let’s have a quick look around, and then I’ll show you four prompts that demonstrate what it can really do.

    The slash command shows me what’s available here. I can see and resume previous chats, configure the editor environment, leverage memory via context files like GEMINI.md, change the theme, and use tools. Choosing that option shows us the available tools such as reading files and folders, finding files and folders, performing Google searches, running Shell commands, and more.

    The Gemini CLI has many extensibility points, including use of MCP servers. I added the Cloud Run MCP server but you can add anything here.

    I’m only scratching the surface here, so don’t forget to check out the official repo, docs, and blog post announcement. But now, let’s walk through four prompts that you can repeat to experience the power of the Gemini CLI, and why each is a big deal.

    Prompt #1 – Do some research.

    Software engineering is more than coding. You spend time researching, planning, and thinking. I want to build a new app, but I’m not sure which frontend framework I should use. And I don’t want stale answers from an LLM that was trained a year ago.

    I’ve got a new research report on JavaScript frameworks, and also want to factor in web results. My prompt:

    What JavaScript framework should I use to build my frontend app? I want something simple, standards-friendly, and popular. Use @report.pdf for some context, but also do a web search. Summarize the results in a way that will help me decide.

    The Gemini CLI figured out which tools to use, pulled the file into the prompt, and started off on its work: searching the web and preparing results.

    The results were solid. I got tradeoffs and analysis for three viable options. The summary was helpful, and I could have continued going back and forth with clarifying questions. For architects, team leaders, and engineers, having a research partner in the terminal is powerful.

    Why was this a big deal? This prompt showed the use of live Google Search, local (binary) file processing, and in-context learning for devs. These tools are changing how I do quick research.

    Prompt #2 – Build an app.

    These tools will absolutely change how folks build, fix, change, and modernize software. Let’s build something new.

    I fed in this prompt, based on my new understanding of relevant JavaScript frameworks.

    Let’s build a calendar app for my family to plan a vacation together. It should let us vote on weeks that work best, and then nominate activities for each day. Use Vue.js for the JavaScript framework.

    Now to be sure, we didn’t build this to be excellent at one-shot results. Instead, it’s purposely built for an interactive back-and-forth with the software developer. You can start it with --yolo mode to have it automatically proceed without asking permission to do things, and even with --b to run it headless assuming no interactivity. But I want to stay in control here. So I’m not in YOLO mode.

    I quickly got back a plan, and was asked if I wanted to proceed.

    Gemini CLI also asks me about running Shell commands. I can allow it once, allow it always, or cancel. I like these options. It’s fun watching Gemini make decisions and narrate what it’s working on. Once it’s done building directories, writing code, and evaluating its results, the CLI even starts up a server so that I can test the application. The first draft was functional, but not attractive, so I asked for a cleanup.

    The next result was solid, and I could have continued iterating on new features along with look and feel.

    Why was this a big deal? This prompt showed iterative code development, important security (request permission) features, and more. We’ll also frequently offer to pop you into the IDE for further coding. This will change how I understand or bootstrap most of the code I work with.

    Prompt #3 – Do a quick deploy to the cloud.

    I’m terrible at remembering the syntax and flags for various CLI tools. The right git command or Google Cloud CLI request? Just hopeless. The Gemini CLI is my solution. I can ask for what I want, and the Gemini CLI figures out the right type of request to make.

    We added MCP as a first-class citizen, so I added the Cloud Run MCP server, as mentioned above. I also made this work without it, as the Gemini CLI figured out the right way to directly call the Google Cloud CLI (gcloud) to deploy my app. But, MCP servers provide more structure and ensure consistent implementation. Here’s the prompt I tried to get this app deployed. Vibe deployment, FTW.

    Ship this code to Cloud Run in us-west1 using my seroter-project-base project. Don’t create a Dockerfile or container, but just deploy the source files.

    The Gemini CLI immediately recognizes that a known MCP tool can help, and shows me the tool it chose.

    It got going, and shipped my code successfully to Cloud Run using the MCP server. But the app didn’t start correctly. The Gemini CLI noticed that by reading the service logs, and diagnosed the issue: the app hadn’t specified which port to listen on. No problem.

    It came up with a fix, made the code changes, and redeployed.

    Why was this a big deal? We saw the extensibility of MCP servers, and the ability to “forget” some details of exactly how other tools and CLIs work. Plus we observed that the Gemini CLI did some smart reasoning and resolved issues on its own. This is going to change how I deploy, and how much time I spend (waste?) deploying.

    Prompt #4 – Do responsible CI/CD to the cloud.

    The third prompt was cool and showed how you can quickly deploy to a cloud target, even without knowing the exact syntax to make it happen. I got it working with Kubernetes too. But can the Gemini CLI help me do proper CI/CD, even if I don’t know exactly how to do it? In this case I do know how to set up Google Cloud Build and Cloud Deploy, but let’s pretend I don’t. Here’s the prompt.

    Create a Cloud Build file that would build a container out of this app code and store it in Artifact Registry. Then create the necessary Cloud Deploy files that defines a dev and production environment in Cloud Run. Create the Cloud Deploy pipeline, and then reference it in the Cloud Build file so that the deploy happens when a build succeeds. And then go ahead trigger the Cloud Build. Pay very careful attention for how to create the correct files and syntax needed for targeting Cloud Run from Cloud Deploy.

    The Gemini CLI started by asking me for some info from my Google Cloud account (project name, target region) and then created YAML files for Cloud Build and Cloud Deploy. It also put together a CLI command to instantiate a Docker repo in Artifact Registry. Now, I know that the setup for Cloud Deploy working with Cloud Run has some specific syntax and formatting. Even with my above command, I can see that I didn’t get syntactically correct YAML in the skaffold file.

    I rejected the request of the Gemini CLI to do a deployment, since I knew it would fail. Then I gave it the docs URL for setting up Cloud Run with Cloud Deploy and asked it to make a correction.

    That Skaffold file doesn’t look correct. Take a look at the docs (https://cloud.google.com/deploy/docs/deploy-app-run), and follow its guidance for setting up the service YAML files, and referencing the right Skaffold version at the top. Show me the result before pushing a change to the Cloud Deploy pipeline.

    Fortunately, the Gemini CLI can do a web fetch and process the latest product documentation. I did a couple of turns and got what I wanted. Then I asked it to go ahead and update the pipeline and trigger Cloud Build.

    It failed at first because I didn’t have a Dockerfile, but after realizing that, it automatically created one and started the build again.

    It took a few iterations of failed builds for the Gemini CLI to land on the right syntax. But it kept dutifully trying, making changes, and redeploying until it got it right. Just like I would have if I were doing it myself!

    After that back and forth a few times, I had all the right files, syntax, container artifacts, and pipelines going.

    Some of my experiments went faster than others, but that’s the nature of these tools, and I still did this faster overall than I would have manually.

    Why was this a big deal? This showcased some sophisticated file creation, iterative improvements, and Gemini CLI’s direct usage of the Google Cloud CLI to package, deploy, and observe running systems in a production-like way. It’ll change how confident I am doing more complex operations.

    Background agents, orchestrated agents, conversational AI. All of these will play a part in how we design, build, deploy, and operate software. What does that mean to your team, your systems, and your expectations? We’re about to find out.

  • From code to cloud: Check out six new integrations that make it easier to host your apps and models on Cloud Run

    From code to cloud: Check out six new integrations that make it easier to host your apps and models on Cloud Run

    Where you decide to run your web app is often a late-binding choice. Once you’ve finished coding something you like and done some localhost testing, you seek out a reasonable place that gives you a public IP address. Developers have no shortage of runtime host options, including hyperscalers, rented VMs from cheap regional providers, or targeted services from the likes of Firebase, Cloudflare, Vercel, Netlify, Fly.io, and a dozen others. I’m an unapologetic fanboy of Google Cloud Run—host scale-to-zero apps, functions, and jobs that offer huge resource configurations, concurrent calls, GPUs, and durable volumes with a generous free tier and straightforward pricing—and we just took the wraps off a handful of new ways to take a pile of code and turn it into a cloud endpoint.

    Vibe-code a web app in Google AI Studio and one-click deploy to Cloud Run

    Google AI Studio is really remarkable. Build text prompts against our leading models, generate media with Gemini models, and even build apps. All at no cost. We just turned on the ability to do simple text-to-app scenarios, and added a button that deploys your app to Cloud Run.

    First, I went to the “Build” pane and added a text prompt for my new app. I wanted a motivational quote printed on top of an image of an AI generated dog.

    In one shot, I got the complete app including the correct backend AI calls to Gemini models for creating the motivational quote and generating a dog pic. So cool.

    Time to ship it. There’s a rocket ship icon on the top right. Assuming you’ve connected Google AI Studio to a Google Cloud account, you’re able to pick a project and one-click deploy.

    It takes just a few seconds, and you get back the URL and a deep link to the app in Google Cloud.

    Clicking that link shows that this is a standard Cloud Run instance, with the Gemini key helpfully added as an environment variable (versus hard coded!).

    And of course, viewing the associated link takes me to my app that gives me simple motivation and happy dogs.

    That’s such a simple development loop!

    Create a .NET app in tools like Cursor and deploy it using the Cloud Run MCP server

    Let’s say you’re using one of the MANY agentic development tools that make it simpler to code with AI assistance. Lots of you like Cursor. It supports MCP as a way to reach into other systems via tools.

    We just shipped a Cloud Run MCP server, so you can make tools like Cursor aware of Cloud Run and support straightforward deployments.

    I started in Cursor and asked it to build a simple REST API and picked Gemini 2.5 Pro as my preferred model. Cursor does most (all?) of the coding work for you if you want it to.

    It went through a few iterations to land on the right code. I tested it locally to ensure the app would run.

    Cursor has native support for MCP. I added a .cursor directory to my project and dropped an mcp.json file in there. Cursor picked up the MCP entry, validated it, and showed me the available tools.

    I asked Cursor to deploy my C# app. It explored the local folder and files to ensure it had what it needed.

    Cursor realized it had a tool that could help, and proposed the “deploy_local_folder” tool from the Cloud Run MCP server.

    After providing some requested values (location, etc), Cursor successfully deployed my .NET app.

    That was easy. And this Cloud Run MCP server will work with any of your tools that understand MCP.

    Push an open model from Google AI Studio directly to Cloud Run

    Want to deploy a model to Cloud Run? It’s the only serverless platform I know of that offers GPUs. You can use tools like Ollama to deploy any open model to Cloud Run, and I like that we made it even easier for Gemma fans. To see this integration, pick one of the Gemma 3 editions in Google AI Studio.

    Once you’ve done that, you’ll see a new icon that triggers a deployment directly to Cloud Run. Within minutes, you have an elastic endpoint providing inference.

    It’s not hard to deploy open models to Cloud Run. This option makes it that much easier.

    Deploy a Python agent built with the Agent Development Kit to Cloud Run with one command

    The Agent Development Kit is an open source framework and toolset that devs use to build robust AI agents. The Python version reached 1.0 yesterday, and we launched a new Java version too. Here, I started with a Python agent I built.

    Built into ADK are a few deployment options. It’s just code, so you can run it anywhere. But we’ve added shortcuts to services like Google Cloud’s Vertex AI Agent Engine and Cloud Run. Just one command puts my agent onto Cloud Run!

    We don’t yet have this CLI deployment option for the Java ADK. But it’s also simple to deploy a Java app or agent to Cloud Run with a single Google Cloud CLI command.

    Services like Cloud Run are ideal for your agents and AI apps. These built-in integrations for ADK help you get these agents online quickly.

    Use a Gradio instance in Cloud Run to experiment with prompts after one click from Vertex AI Studio

    How do you collaborate or share prompts with teammates? Maybe you’re using something like Google Cloud Vertex AI to iterate on a prompt yourself. Here, I wrote system instructions and a prompt for helping me prioritize my work items.

    Now, I can click “deploy an app” and get a Gradio instance for experimenting further with my app.

    This has public access by default, so I’ve got to give the ok.

    After a few moments, I have a running Cloud Run app! I’m shown this directly from Vertex AI and have a link to open the app.

    That link brings me to this Gradio instance that I can share with teammates.

    The scalable and accessible Cloud Run is ideal for spontaneous exploration of things like AI prompts. I like this integration!

    Ship your backend Java code to Cloud Run directly from Firebase Studio

    Our final example looks at Firebase Studio. Have you tried this yet? It’s a free-to-use, full-stack dev environment in the cloud for nearly any type of app. And it supports text-to-app scenarios if you don’t want to do much coding yourself. There are dozens of templates, including one for Java.

    I spun up a Java dev environment to build a web service.

    This IDE will look familiar. Bring in your favorite extensions, and we’ve also pre-loaded this with Gemini assistance, local testing tools, and more. See here that I used Gemini to add a new REST endpoint to my Java API.

    Here on the left is an option to deploy to Cloud Run!

    After authenticating to my cloud account and picking my cloud project, I could deploy. After a few moments, I had another running app in Cloud Run, and had a route to make continuous updates.

    Wow. That’s a lot of ways to go from code to cloud. Cloud Run is terrific for frontend or backend components, functions or apps, open source or commercial products. Try one of these integrations and tell me what you think!

  • Did we just make platform engineering much easier by shipping a cloud IDP?

    Did we just make platform engineering much easier by shipping a cloud IDP?

    I’m a believer in platform engineering as a concept. Bringing standardization and golden paths to developers so that they can ship software quickly and safely sounds awesome. And it is. But it’s also been a slog to land it. Measurement has been inconsistent, devs are wildly unhappy with the state of self-service, and the tech landscape is disjointed with tons of tools and a high cost of integration. Smart teams are finding success, but this should be easier. Maybe now it is.

    Last week at Google Cloud Next ’25, we took the wraps off the concept of a Cloud Internal Developer Platform (IDP). Could we take the best parts of platform engineering—consistent config management, infrastructure orchestration, environment management, deployment services, and role-based access—and deliver them as a vertically-integrated experience? Can we shift down instead of putting so much responsibility on the developer? I think we can. We have to! Our goal at Google Cloud is to deliver a Cloud IDP that is complete, integrated, and application-centric. The cloud has typically been a pile of infrastructure services, loosely organized through tags or other flawed grouping mechanisms. We’re long overdue for an app-centric lens on the cloud.

    Enough talking. Let me show you by walking through an end-to-end platform engineering scenario. I want to design and deploy an application using architecture templates, organize the deployed artifacts into an “application”, troubleshoot an issue, and then get visibility into the overall health of the application.

    Design and deploy app architectures with Application Design Center

    To make life difficult, IDP also stands for “internal developer portal.” That’s not confusing at all. Such a portal can serve as the front door for a dev team that’s interacting with the platform. Application Design Center (ADC) is now in public preview, and offers functionality for creating templates, storing templates in catalogs, sharing templates, and deploying instances of templates.

    I can start with an existing ADC template or create a brand new one. Or, I can use the ever-present Cloud Assist chat to describe my desired architecture in natural language, iterate on it, and then create an ADC template from that. Super cool!

    For the rest of this example, I’ll use an existing app template in ADC. This one consists of many different components. Notice that I’ve got Cloud Run (serverless) components, virtual machines, storage buckets, secrets, load balancers, and more. Kubernetes coming soon!

    I can add to this architecture by dropping and configuring new assets onto the canvas. I can also use natural language! From the Cloud Assist chat, I asked to “add a cache to the movie-frontend service” and you can see that I got a Redis cache added, along with the option to accept or reject the suggestion.

    Worried that you’re just working in a graphical design surface? Everything on the canvas is represented as Terraform. Switching from “Design” to “Code” at the top reveals the clean Terraform generated by ADC. Use our managed Terraform service or whatever you want for your infrastructure orchestration workflow with Terraform.

    When I’m done with the template and want to instantiate my architecture, I can turn this into a deployed app. Google Cloud takes care of all the provisioning, and the assets are held together in an application grouping.

    ADC is powerful for a few reasons. It works across different runtimes and isn’t just a Kubernetes solution. ADC offers good template cataloging and sharing capabilities. Its support for natural language is going to be very useful. And its direct integration with other parts of the platform engineering journey is important. Let’s see that now.

    Organize apps with App Hub

    An “app” represents many components, as we just saw. They might even span “projects” in your cloud account. And an application should have clearly identified owners and criticality. Google Cloud App Hub is generally available, and acts as a real-time registry of resources and applications.

    App Hub auto-discovers resources in your projects (a couple dozen types so far, many more to come) and lets you automatically (via ADC) or manually group them into applications.

    For a given app, I can see key metadata like its criticality and environment. I can also see who the development, business, and operations owners are. And of course, I can see a list of all the resources that make up this application.

    Instead of this being a static registry, App Hub maintains links to the physical resources you’ve deployed. Once I have an application, then what?

    Observe app-centric metrics in Cloud Monitoring

    It’s not easy to see how apps or app-related components are performing. Now it is. We just enabled the preview of Application Monitoring in our Cloud Monitoring service.

    From here, I can see a list of all my App Hub apps, and the component performance of each.

    When I drill into the “web server” resource, I get some terrific metrics and logs, all within whatever timeframe I specify. This is a high-density view, and I like the data points we surface here.

    Again, we’re seeing a smart, integrated set of technologies here versus a series of independent stack pieces that aren’t deeply aware of each other.

    Resolve issues using Cloud Assist Investigations

    In that dashboard above, I’m seeing that container restarts are a real issue in this application. It’s time to troubleshoot!

    Within this dashboard, I see embedded logs, and notice a warning about back-off restarts on my pods. I don’t love reading piles of JSON to try and figure out the problem, nor can I see all the ancillary context just by looking at this log entry. In private preview, we have this new Investigate button.

    Clicking that button sparks a new Investigation. These are AI-fueled evaluations based on a given error, and a host of related application data points. It’s meant to be a holistic exploration.

    Here’s where all that shared context is so valuable. In under a minute, I see the details of the Investigation. These details show the issue itself and then a series of “relevant observations.” An Investigation can be edited and re-run, downloaded, and more.

    Most importantly, there’s a “Hypothesis” section that helps the app owner or SRE pinpoint the problem area to focus on. These seem well-described with clear recommendations.

    I’m confident that this will be a supremely useful tool for those trying to quickly resolve application issues.

    Manage the overall health of the application in Cloud Hub

    What’s your “home page” for the applications you manage? That’s the idea behind the preview of the Cloud Hub. It offers app owners a starting point for the management, health, and optimization of the apps they care about.

    I might start each day looking at any platform-wide incidents impacting my app, any deployment issues, service health, and more.

    One private preview feature I’ll show you here is the “Optimization” view. I’m getting app-level cost and utilization summaries! It’s easy to view this for different time periods, and even drill into a specific product within the app. What a useful view for identifying the actual cost of a running application in dev, test, or prod.

    Summary

    While platform engineering has been around a while, and cloud computing even longer, neither has been easy for people who just want to build and run apps. Google Cloud is uniquely set up to make this better, and this new Cloud IDP experience might be an important step forward. Try out some of the components yourself!