Author: Richard Seroter

  • How to build and deploy a portable AI agent that uses a managed memory service

    I enjoy building with new frameworks and services. Do you? It’s fun to break new ground. That said, I’m often filled with regret as I navigate incomplete docs, non-existent search results, and a dearth of human experts to bother. Now add LLMs that try to help but accidentally set you back. Good times. But we persevere. My goal? Build an AI agent (it helps you plan a career change) that retains memory through long-running conversations and is portable enough to run on most any host. Easy enough, yes?

    My weapons of choice were the Agent Development Kit (Python), the new fully-managed Vertex AI Memory Bank service, and runtime hosts including Google Cloud Run and Vertex AI Agent Engine. Nearly every sample I found for this tech combination was either PhD-level code with excessive functionality, a hard-coded “hello world” that didn’t feel realistic, or a notebook-style flow that didn’t translate to an independent agent. I craved a simple, yet complete, example of what a real, hosted, memory-infused agent looks like. I finally got it all working, it’s very cool, and I wanted to share the steps to reproduce it.

    Vertex AI Memory Bank showing memories from my AI agent

    Let’s go through this step by step, and I’ll explain the various gotchas that weren’t clear from the docs or existing samples. Note that I am NOT a Python developer, but I think I follow some decent practices here.

    First, I wanted a new Python virtual environment for the folder containing my app.

     python3 -m venv venv
    source venv/bin/activate
    

    I installed the latest version of the Google ADK.

    pip install google-adk
    

    My source code is here, so you can just download the requirements.txt file and install the local dependencies you need.

    pip install -r requirements.txt
    

    I’ve got an __init__.py file that simply contains:

    from . import agent
    

    Now for agent.py itself, where all the logic lives. Let’s go step by step, but it all comes from a single file.

    import os
    import sys
    from google.adk.agents import Agent
    from google.adk.tools import agent_tool
    from google.adk.tools import google_search
    
    from google import adk
    from google.adk.runners import Runner
    from google.adk.sessions import VertexAiSessionService
    from google.adk.memory import VertexAiMemoryBankService
    from google.api_core import exceptions
    

    Nothing earth-shattering here. But I use a mix of built-in tools including Google Search. And I’m using durable storage for sessions and memory (versus the default in-memory options) and importing those references.

    app_name = 'career_agent'
    
    # Retrieve the agent engine ID needed for the memory service
    agent_engine_id = os.environ.get("GOOGLE_CLOUD_AGENT_ENGINE_ID")
    

    Our agent app needs a name for the purpose of storing sessions and memory through ADK. And that agent_engine_id is important for environments where it’s not preloaded (e.g. outside of Vertex AI Agent Engine).
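
    Since both managed services below depend on that ID, it’s worth failing fast when the variable is missing. Here’s a minimal guard of my own (a sketch, not part of the original file) that also puts the otherwise-idle `sys` import to work:

    ```python
    import os
    import sys

    def require_env(name: str) -> str:
        """Return an environment variable's value, or exit with a clear error."""
        value = os.environ.get(name)
        if not value:
            sys.exit(f"Missing required environment variable: {name}")
        return value

    # agent_engine_id = require_env("GOOGLE_CLOUD_AGENT_ENGINE_ID")
    ```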

    # Create a durable session for our agent
    session_service = VertexAiSessionService()
    print("Vertex session service created")
    
    # Instantiate the long-term memory service; it needs the agent_engine_id
    # from the environment or it won't target the right Memory Bank
    memory_service = VertexAiMemoryBankService(
        agent_engine_id=agent_engine_id)
    print("Vertex memory service created")
    

    Here I create instances of the VertexAiSessionService and VertexAiMemoryBankService. These are fully managed, zero-ops services that you can use standalone wherever your agent runs.

    # Use for callback to save the session info to memory
    async def auto_save_session_to_memory_callback(callback_context):
        try:
            await memory_service.add_session_to_memory(
                callback_context._invocation_context.session
            )
            print("\n****Triggered memory generation****\n")
        except exceptions.GoogleAPICallError as e:
            print(f"Error during memory generation: {e}")
    

    Now we’re getting somewhere. This function (thanks to my colleague Megan, who I believe came up with it) will be invoked as a callback during session turns.

    # Agent that does Google search
    career_search_agent_memory = Agent(
        name="career_search_agent_memory",
        model="gemini-2.5-flash",
        description=(
            "Agent that answers questions about career options for a given city or country"
        ),
        instruction=(
            "You are an agent that helps people figure out what types of jobs they should consider based on where they want to live."
        ),
        tools=[google_search],
    )
    

    That’s agent number one. It’s a secondary agent that just does a real-time search to supplement the LLM’s knowledge with real data about a given job in a particular city.

    # Root agent that retrieves memories and saves them as part of career plan assistance
    root_agent = Agent(
        name="career_advisor_agent_memory",
        model="gemini-2.5-pro", # Using a more capable model for orchestration
        description=(
            "Agent to help someone come up with a career plan"
        ),
        instruction=(
            """
            **Persona:** You are a helpful and knowledgeable career advisor.
    
            **Goal:** Your primary goal is to provide personalized career recommendations to users based on their skills, interests, and desired geographical location.
    
            **Workflow:**
    
            1.  **Information Gathering:** Your first step is to interact with the user to gather essential information. You must ask about:
                *   Their skills and areas of expertise.
                *   Their interests and passions.
                *   The city or country where they want to work.
    
            2.  **Tool Utilization:** Once you have identified a potential career and a specific geographical location from the user, you **must** use the `career_search_agent_memory` tool to find up-to-date information about job prospects.
    
            3.  **Synthesize and Respond:** After obtaining the information from the `career_search_agent_memory` tool, you will combine that with the user's stated skills and interests to provide a comprehensive and helpful career plan.
    
            **Important:** Do not try to answer questions about career options in a specific city or country from your own knowledge. Always use the `career_search_agent_memory` tool for such queries to ensure the information is current and accurate.
            """
        ),
        tools=[
            adk.tools.preload_memory_tool.PreloadMemoryTool(),
            agent_tool.AgentTool(career_search_agent_memory),
        ],
        after_agent_callback=auto_save_session_to_memory_callback,
    )
    

    That’s the root agent. Let’s unpack it. I’ve got some fairly detailed instructions to help it use the tool correctly and give a good response. Also note the tools. I’m preloading memory so that it gets context about existing memories, even if they happened five sessions ago. It’s got a tool reference to that “search” agent I defined above. And then after the agent generates a response, we save the key memories to the Memory Bank.

    runner = Runner(
        agent=root_agent,
        app_name=app_name,
        session_service=session_service,
        memory_service=memory_service)
    

    Finally, I’ve got a Runner. I’m not positive this is even used when the agent runs on Vertex AI Agent Engine, but it plays a part when running elsewhere.

    That’s it. 87 lines in one file. Writing the code wasn’t the hard part; knowing what to do and how to shape the agent was where all the work happened.

    Let’s deploy, and test it all out with cURL commands. To deploy this to the fully-managed Vertex AI Agent Engine, it’s a single ADK command now. You need to provide it a Cloud Storage bucket name (for storing artifacts), but that’s about it.

    adk deploy agent_engine \
        --project=seroter-project-base \
        --region=us-central1 \
        --staging_bucket=gs://seroter-agent-memory-staging \
        --display_name="Career Agent with Memory" \
        --trace_to_cloud \
        career_agent_memory/
    

    When this finished, I saw a bucket loaded up with code and other artifacts.

    Files generated and stored by ADK for my deployed agent

    More importantly, I had an agent. Vertex AI Agent Engine has a bunch of pre-built observability dashboards, and an integrated view of sessions and memory.

    Vertex AI Agent Engine dashboard in the Google Cloud Console

    Let’s use this agent, and see if it does what it’s supposed to. I’m going to use cURL commands, so that it’s super clear as to what’s happening.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:query \
    -d '{"class_method": "create_session", "input": {"user_id": "u_123"}}'
    

    This first command creates a new session for our agent chat. The authorization comes from injecting a Google Cloud token into the header. I plugged the “resource name” of the Agent Engine instance into the URI and set a user ID. I get back something like this:

    {
      "output": {
        "userId": "u_123",
        "id": "5926526278264946688",
        "events": [],
        "appName": "8479666769873600512",
        "state": {},
        "lastUpdateTime": 1760395538.0874159
      }
    }
    

    That “id” value matches the session ID now visible in the Vertex AI Session list. This session is for the given user, u_123.
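
    If you’re scripting this flow, the session ID can be pulled straight out of that response. A quick sketch with Python’s standard library:

    ```python
    import json

    # The create_session response from above
    response = '''
    {
      "output": {
        "userId": "u_123",
        "id": "5926526278264946688",
        "events": [],
        "appName": "8479666769873600512",
        "state": {},
        "lastUpdateTime": 1760395538.0874159
      }
    }
    '''

    session_id = json.loads(response)["output"]["id"]
    print(session_id)  # 5926526278264946688
    ```

    With jq, the equivalent is piping the curl output to jq -r '.output.id'.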

    A session created for the agent running in the Vertex AI Agent Engine

    Now I can chat with my career agent. Here’s the cURL request for submitting a query. This will trigger my root agent, call my secondary agent, and store the key memories of the interaction as a callback.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:streamQuery?alt=sse \
    -d '{"class_method": "stream_query","input": {"user_id": "u_123","session_id": "5926526278264946688","message": "I am currently a beekeeper in New Mexico. I have been to college for economics, but that was a long time ago. I am thinking about moving to Los Angeles CA and get a technology job. What are my job prospects in that region and how should I start?"}}'
    

    Note that the engine ID is still in the URI, and the payload contains the user ID and session ID. What I got back was a giant answer with some usable advice on how I can take my lucrative career as a beekeeper and make my mark on the technology sector.

    What got automatically saved as a memory? Switching to the Memories view in Vertex AI, I see that a few key details about my context were durably stored.

    Memories automatically parsed and stored in the Vertex AI Memory Bank

    Now if I delete my session, come back tomorrow and start a new one, any memories for this user ID (and agent engine instance) will be preloaded into every agent request. Very cool!

    Let’s quickly prove it. I can destroy my session with this cURL command.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/168267934565/locations/us-central1/reasoningEngines/8479666769873600512:query?alt=sse \
    -d '{"class_method": "delete_session","input": {"user_id": "u_123","session_id": "5926526278264946688"}}'
    

    No more session, but my Memories remain. I can then request another session (for the same user) using the earlier command:

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:query \
    -d '{"class_method": "create_session", "input": {"user_id": "u_123"}}'
    

    At this point, I could ask something like “what do you already know about me?” in my query to see if it retrieves the memories it stored before.

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:streamQuery?alt=sse \
    -d '{"class_method": "stream_query","input": {"user_id": "u_123","session_id": "3132042709481553920","message": "What do you already know about me?"}}'
    

    Here’s what I got back:

    {"content": {"parts": [{"thought_signature": "CrgEAR_M...twKw==", "text": "You have an economics degree and are currently a beekeeper in New Mexico. You're considering a move to Los Angeles for a job in the technology sector."}], "role": "model"}, "finish_reason": "STOP", "usage_metadata": {"candidates_token_count": 32, "candidates_tokens_details": [{"modality": "TEXT", "token_count": 32}], "prompt_token_count": 530, "prompt_tokens_details": [{"modality": "TEXT", "token_count": 530}], "thoughts_token_count": 127, "total_token_count": 689, "traffic_type": "ON_DEMAND"}, "avg_logprobs": -0.8719542026519775, "invocation_id": "e-53e94a44-ad6b-4e97-9297-51612f4e77a9", "author": "career_advisor_agent_memory", "actions": {"state_delta": {}, "artifact_delta": {}, "requested_auth_configs": {}, "requested_tool_confirmations": {}}, "id": "c9e484cd-e5f7-4e1e-94d7-7490a006137d", "timestamp": 1760396342.830469}
    
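
    Each chunk of that streamed response is a JSON event, and the model’s answer lives in `content.parts[].text`. Here’s a sketch (against a trimmed copy of the event above) that collects the text parts:

    ```python
    import json

    # A trimmed version of the streamed event shown above
    event_line = (
        '{"content": {"parts": [{"thought_signature": "CrgEAR_M...", '
        '"text": "You have an economics degree and are currently a beekeeper in New Mexico."}], '
        '"role": "model"}, "author": "career_advisor_agent_memory"}'
    )

    event = json.loads(event_line)
    answer = " ".join(
        part["text"]
        for part in event.get("content", {}).get("parts", [])
        if "text" in part  # skip any parts that carry only thought metadata
    )
    print(answer)
    ```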

    Excellent! With this approach, I have zero database management to do, yet my agents can retain context for each turn over an extended period of time.

    Vertex AI Agent Engine is cool, but what if you want to serve up your agents on a different runtime? Maybe a VM, Kubernetes, or the best app platform available, Google Cloud Run. We can still take advantage of managed sessions and memory, even if our workload runs elsewhere.

    The docs don’t explain how to do this, but I figured out the first step. You need that Agent Engine ID. When deploying to Vertex AI Agent Engine, it happened automatically. But now I need to explicitly submit an HTTP request to get back an ID to use for my agent. Here’s the request:

    curl \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    https://aiplatform.googleapis.com/v1/projects/168267934565/locations/us-central1/reasoningEngines \
    -d '{"displayName": "memory-bank-for-cloud-run"}'
    

    I get back an ID value, and I see a new entry show up for me in Vertex AI Agent Engine.
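
    In my case, the create call came back with a resource name that embeds the new engine ID. Assuming a name of the usual `projects/.../reasoningEngines/<id>/...` shape (the sample below is hypothetical), a regex pulls it out:

    ```python
    import re

    # Hypothetical resource name, shaped like what the create call returns
    resource_name = (
        "projects/168267934565/locations/us-central1/"
        "reasoningEngines/8058017254761037824/operations/1234567890"
    )

    match = re.search(r"reasoningEngines/(\d+)", resource_name)
    engine_id = match.group(1) if match else None
    print(engine_id)  # 8058017254761037824
    ```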

    Memory Bank instance for an agent in Cloud Run

    The ADK also supports Google Cloud Run as a deployment target, so I’ll deploy this exact agent, no code changes, there too. First, I threw a few values into the shell’s environment variables to use for the CLI command.

    export GOOGLE_CLOUD_PROJECT=seroter-project-base
    export GOOGLE_CLOUD_LOCATION=us-central1 
    export GOOGLE_GENAI_USE_VERTEXAI=True
    

    Then I issued the single command to deploy the agent to Cloud Run. Notice some differences here. First, no Cloud Storage bucket: Cloud Run builds a container from the source code and uses that. Also, I explicitly set the --memory_service_uri and --session_service_uri flags to enable the pre-wiring to those services. It didn’t work without them, and the current docs don’t include the proper parameters. I also figured out (it’s undocumented) how to pass Cloud Run environment variables, since the Agent Engine ID was needed there too.

    adk deploy cloud_run \
    --project=$GOOGLE_CLOUD_PROJECT \
    --region=$GOOGLE_CLOUD_LOCATION \
    --service_name=career-agent \
    --app_name=career_agent \
    --port=8080 \
    --memory_service_uri=agentengine://8058017254761037824 \
    --session_service_uri=agentengine://8058017254761037824 \
    career_agent_memory/ \
    -- --set-env-vars "GOOGLE_CLOUD_AGENT_ENGINE_ID=8058017254761037824"
    

    In just a couple minutes, I ended up with an agent ready to serve on Cloud Run.

    Agent running in Cloud Run

    The URLs I use to interact with my agent are now different because we’re not calling the managed service endpoints of Vertex AI to invoke the agent. So if I want a new session to get going, I submit a cURL request like this:

    curl -X POST -H "Content-Type: application/json" -d '{}' \
        https://career-agent-168267934565.us-central1.run.app/apps/career_agent/users/u_456/sessions
    

    I’ve got no payload for this request, and the user ID is specified in the URL. I got back a session ID in a JSON payload like the one above. And I can see that session registered in my Agent Engine console.

    Session created based on web request

    Submitting queries to this agent is slightly different than when it was hosted in Vertex AI Agent Engine. For Cloud Run agents, the cURL request looks like this:

    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        https://career-agent-168267934565.us-central1.run.app/run_sse \
        -H "Content-Type: application/json" \
        -d '{
        "app_name": "career_agent",
        "user_id": "u_456",
        "session_id": "311768995957047296",
        "new_message": {
            "role": "user",
            "parts": [{
            "text": "I am currently a cowboy in Las Vegas. I have been to college for political science, but that was a long time ago. I am thinking about moving to San Francisco CA and getting a technology job. What are my job prospects in that region and how should I start?"
            }]
        },
        "streaming": false
        }'
    

    After a moment, not only do I get a valid answer from my agent, but I also see that the callback fired and I’ve got durable memories in Vertex AI Memory Bank.
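
    The /run_sse endpoint streams server-sent events, so each answer chunk arrives as a `data:` line. A sketch for peeling those off (the sample line here is abbreviated and hypothetical):

    ```python
    import json

    # Hypothetical SSE line from /run_sse (abbreviated)
    sse_line = 'data: {"content": {"parts": [{"text": "San Francisco has a deep tech job market."}], "role": "model"}}'

    # SSE frames prefix each payload with "data: "
    payload = json.loads(sse_line.removeprefix("data: "))
    text = payload["content"]["parts"][0]["text"]
    print(text)
    ```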

    Memories saved for the Cloud Run agent

    Just like before, I could end this session, start a new one, and the memories still apply. Very nice.

    Access to sessions and memories that scale as your agent does, or survive compute restarts, seems like a big deal. You can use your own database to store these, but I like having a fully managed option that handles every part of it for me. Once you figure out the correct code and configurations, it’s fairly easy to use. You can try this all yourself in Google Cloud with your existing account, or a new account with a bunch of free credits.

  • Daily Reading List – October 13, 2025 (#647)

    I never seem to get my demo apps working on the first pass, but it always turns out to be a (painful) blessing in disguise. Instead of taking a couple of hours to build an agent demo, it took a couple of weeks. But I was forced to read source code, experiment, and learn so much more than if it worked the first time. I’ll post my experiences tomorrow.

    [blog] F*ck it and Let it Rip. Try your hardest and have fun. A performance approach mindset is the way to go.

    [article] Becoming an AI-first business requires a total organizational mindshift. It’s true and many won’t make it. Not because they’re not smart, but because it takes a level of acceptable recklessness to institute the change.

    [blog] I’m in Vibe Coding Hell. I liked the points here. There’s a new challenge for self-learners who used to be dependent on the tutorial to get work done; now they’re dependent on their AI tool.

    [blog] Predictions 2026: Tech Leadership Will Be Wild — Bring Your Surfboard, Your Calculator, And Maybe A Clone. Yah, I can’t imagine being a team or organization leader in tech next year. Wait a minute.

    [article] Salesforce bets on AI ‘agents’ to fix what it calls a $7 billion problem in enterprise software. The unique circus of Dreamforce is going on this week, so expect all sorts of announcements. Here’s one about the new AgentForce 360.

    [blog] Quantum computing 101 for developers. My boss is deep into this, but I’ve only stayed peripherally aware. I thought this was a good article for bringing folks up to speed.

    [article] Java or Python for building agents? A silly question twelve months ago, not so much today.

    [blog] Agents That Prove, Not Guess: A Multi-Agent Code Review System. It’s tempting to just dump a single prompt or pile of context into an agent and want something good back. But Ayo shows a better approach if you care about repeatability and transparency.

    [paper] Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models. Nonstop research and experimentation into making AI models more trustworthy and useful.

    [blog] What’s 🔥 in Enterprise IT/VC #467. This is one of my favorite weekend reads. Doesn’t hurt that I showed up in this one.

    [blog] The Architect’s Dilemma. Good look at how you’d decide between using tools in your agent architecture, or use agents talking to agents.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – October 10, 2025 (#646)

    Whew. A frantic day until about 3pm, and then I got a chance to work on an AI agent I’ve been messing with. I’m building it for use in a new blog post, but timeboxing how much more effort I put in. Hopefully posting next week.

    [blog] 150 of the latest AI use cases from leading startups and digital natives. Get inspired by what startups are doing with AI right now.

    [blog] 1,001 real-world gen AI use cases from the world’s leading organizations. We might have gone too far here. This started with an innocent 101 use cases, and now we’re drunk on stories and sharing 1,001 of them.

    [blog] Embracing the parallel coding agent lifestyle. Is every developer a type of “manager” now? There’s a new workstyle that involves coordinating a series of agents doing various bits of work for you.

    [blog] Vibe engineering. Also from Simon, this builds on the previous post. Now we’re talking broader engineering practices, not just using AI for a snippet of code.

    [article] Control Codegen Spend. Unless you have an all-you-can-eat license (which seems rare), there are cost and consumption considerations with AI-assisted coding tools.

    [blog] Gemini Computer Use: Giving LLMs Hands and Eyes. Our new computer use model definitely landed on my “try this out” list. Heiko gives it a run through here.

    [blog] The Weak Point in MCP Nobody’s Talking About: API Versioning. Seems like something to consider, but I also don’t know how versioning might be different when we’re talking agents and tools.


  • Daily Reading List – October 9, 2025 (#645)

    Another day, another chance to learn something new. Today’s reading list had some useful data points, fresh ideas, and new products.

    [blog] Introducing Gemini Enterprise. The AI platform era is here. It’s not just about a collection of random products. It’s about intentionally connecting people, systems, and knowledge bases so that we can get better work done. If you’re a Google shop, Microsoft shop, IBM shop or whatever, Gemini Enterprise is a major upgrade. More, from Sundar.

    [blog] 4 ways Gemini Enterprise makes work easier for everyone. It’s useful to see actual examples of this platform in action.

    [blog] How I Learned to Stop Worrying and Trust AI Coding Agents. Yes, we can learn some cool things from watching these agents at work.

    [blog] Platform Shifts Redefine Apps. Important concepts here. What an “app” is changes with each tech platform evolution. Are you working with a modern definition?

    [blog] Googler Michel Devoret awarded the Nobel Prize in Physics. It’s hard to have a big head at work when you’re surrounded by so many brilliant folks.

    [guide] Choose a design pattern for your agentic AI system. This is fantastic, vendor-neutral architecture guidance for someone who wants to learn agent design patterns and when to pick each one.

    [article] What the 2025 DORA Report means for your AI strategy. Extremely good takeaways in this post if you’re wondering how you’re supposed to actually land and scale AI at your company.

    [blog] Predictions 2026: AI Moves From Hype To Hard Hat Work. Prediction season is upon us! See what various analysts and thought leaders are guessing.

    [article] When You’re the Executive Everyone Relies On—and You’re Burning Out. Good advice here that resonates with me and how I’m feeling right now.

    [blog] Give me AI slop over human sludge any day. He’s not wrong. Why are we automatically assuming AI-created stuff is worse or less useful than human created stuff?

    [article] Survey: Engineers Want To Code, But Spend All Day on Tech Debt. Oof, these numbers are rough. Coding is rarely the bottleneck in your teams. It’s the 84% of the time that you’re not able to do what you love doing.

    [blog] How to Build AI Security Agents with Gemini CLI. Good rationale here for when to build a tool versus when to leave the action to the agent to figure out.

    [blog] LoRA Explained: Faster, More Efficient Fine-Tuning with Docker. If you kinda understood this approach to updating a model without a full training run, you’ll learn more from this post.


  • Daily Reading List – October 8, 2025 (#644)

    Flying home after a great day with friends at Comcast and talking about AI assisted engineering. Everyone is looking for the playbook for effectively landing AI in their org. Maybe I’ll write a book about it 🙂

    [blog] Five Best Practices for Using AI Coding Assistants. A few months ago, our CEO asked me to run a few long-running engineering experiments with my team. Here’s what we learned from using the full spectrum of today’s AI coding tools to get work done.

    [blog] Stitching with the new Jules API. I don’t think we’re just going to improve the existing toolchain. Instead, a new toolchain is emerging where you simply work differently.

    [blog] The State of CI/CD in 2025: Key Insights from the Latest JetBrains Survey. What do folks use, are they using AI in CI/CD (answer: no), and why do folks use more than one tool? Answers here.

    [blog] Now open for building: Introducing Gemini CLI extensions. This is a huge capability that brings all sorts of data and functionality into your agentic CLI.

    [article] To scale agentic AI, Notion tore down its tech stack and started fresh. Bravo. Sometimes you need to reset, not refactor.

    [article] Kubernetes for agentic apps: A platform engineering perspective. While I’m extremely bullish on lightweight platforms for hosting agentic apps, there’s absolutely a place for Kubernetes in the mix. Abdel makes a good case.

    [article] How to write nonfunctional requirements for AI agents. I haven’t thought much about this, but sure, there are going to be some adjustments to what requirements you gather for AI apps.

    [blog] Expanding access to Opal, our no-code AI mini-app builder. When I tried this tool, it wasn’t what I expected. But it’s an interesting way to build a new style of app.

    [blog] Not Another Workflow Builder. LangChain isn’t interested in adding to the pile of visual workflow builders. We’re back to arguing workflows versus agents again too!


  • Daily Reading List – October 7, 2025 (#643)

    Flew to Philadelphia today to do a keynote at a customer’s internal developer conference tomorrow. Should be fun, although I’m definitely getting a little burnt out by all the recent travel!

    [blog] Introducing the Gemini 2.5 Computer Use model. What can you build with agents that can navigate user interfaces? Try out our new Computer Use model.

    [article] From Autocomplete to Agents: Mapping the Design Space of AI Coding Assistants. Great analysis. These ten dimensions, and six personas, are a useful way to understand the landscape.

    [blog] Why Google Cloud Run is the most underrated container platform – Part 1. I’d go so far as to say it’s one of the best (compute) services on the Internet.

    [blog] MCP Development with Rust, Gemini CLI, and Google Cloud Run. Here’s one example why Cloud Run is great. You have virtually no limit on programming language or scenario.

    [article] Beyond the Hype: Architecting Systems with Agentic AI. Long conversation, but some interesting points about some of the bigger lifecycle considerations for AI agents.

    [blog] Databases on K8s — Really? (part 8). My colleague at Google has been sharing a series of thoughts about his journey to appreciating containers and Kubernetes as a viable host for databases.

    [blog] Spec-driven development: Using Markdown as a programming language when building with AI. You can do spec-driven development in many ways. GitHub created a cool framework that they’re demonstrating here.

    [blog] Connect Spark data pipelines to Gemini and other AI models with Dataproc ML library. Integrating apps or data pipelines with AI models is getting more and more seamless. This looks like a very handy way to bring Gemini models to Spark data.

    [blog] Accelerate AI with Agents: Event Series for Developers in EMEA. This roadshow has been overbooked in every city it’s visited. We’re rolling across EMEA, so see if your hometown is in the mix.

    [blog] Ask a Techspert: What is vibe coding? We’re all builders now. What we build, and how production-ready it is, depends on the builder and the circumstances.


  • Daily Reading List – October 6, 2025 (#642)

    Shorter reading list today, but some great depth. I liked the thinking and challenge that many of these pieces offered.

    [blog] Introducing CodeMender: an AI agent for code security. This AI is both proactive and reactive in helping secure your code from vulnerabilities. This looks like the future to me.

    [blog] The RAG Obituary: Killed by Agents, Buried by Context Windows. I’ve seen a few obituaries for RAG, but maybe the agentic workflows really do resolve some of the issues that made RAG necessary in the first place.

    [article] Microsoft retires AutoGen and debuts Agent Framework to unify and govern enterprise AI agents. There are too many frameworks to choose from, so it’s a good thing when a new one explicitly replaces an old one.

    [article] OpenAI launches AgentKit to help developers build and ship AI agents. I’m wrong. Apparently we need more frameworks and agent-building tools. OpenAI announced many interesting things today during their Dev Day. Also updated models.

    [article] Top executives jump on AI upskilling. Great to see. The right type of executive upskilling will shrink the expectation gap between leaders and their employees.

    [blog] vercel vs cloudflare: two philosophies of building for developers. A good rivalry forces participants to keep upping their game. This one is fascinating. I like Guillermo and the Vercel team but either way, developers win.

    [blog] More choice, more control: self-deploy proprietary models in your VPC with Vertex AI. Great. Better control and security when running these leading models in the cloud.

    [article] How to Drive Digital Innovation Without Wasting Resources. Some of this goes against conventional wisdom, which is why I liked it.

    [blog] Scaling Engineering Teams: Lessons from Google, Facebook, and Netflix. You can’t copy culture and practices from one team and assume they work on the next. But we can still learn from what others do.

  • Daily Reading List – October 3, 2025 (#641)

    Heading into what I hope is a lowkey weekend. I’m excited to not do a whole lot. May your weekend be as action-packed or boring as you’re hoping for.

    [blog] In Praise of RSS and Controlled Feeds of Information. At least 80% of my reading list comes from the RSS feeds I follow. The rest is from more real-time sources, newsletters, or random links I come across.

    [blog] A collaborative approach to image generation. Not quite getting the image you’re after? The Google Research team looks at a new reinforcement learning agent that helps refine images by learning user preferences.

    [article] What’s AI’s impact on software delivery performance? (2025 DORA Report). Analysis of our recent report is trickling out, and I thought this piece bubbled up some key findings.

    [blog] This Week in Gemini CLI (vol. 5). Very good updates this week. I really like the potential of a managed todo list of agentic tasks.

    [article] When Managing Your Team Becomes Too Much. There are times that I feel this. I manage a big (awesome) group and it can be overwhelming. Good advice here.

    [blog] AI Won’t Replace Design Engineers, But It Will Change How We Work. Actively define how (or if, or where) AI complements your work before someone else does.

    [blog] The History of Core Web Vitals. I knew basically none of this. I remember AMP and such, but didn’t realize all this other work happened. Impressive!

    [article] Top CIO conferences to attend in 2026. Not a bad list of events to consider if you’re looking to make smarter strategic choices.

    [article] Who stops wasteful cloud spending? “Waste” is inevitable, right? It’s difficult or impossible to be fully utilized. But aim for better! That said, how much on-prem spending is wasted? I’d argue it’s more 🙂

    [article] Generative AI drives cloud spend blitz. You’re going to spend more in the cloud, because it offers the best way to take advantage of AI tech.

    [article] The EM’s guide to AI adoption (without your engineers hating it). Here’s a good playbook for incorporating AI into your engineering team in a way that sticks.

  • Daily Reading List – October 2, 2025 (#640)

    I’m grateful that the most persistent stress in my life comes from following my favorite sports team. There are worse things in life. But one of these years, I really would like my hometown team to, just one time, raise the trophy.

    [blog] Gemini 2.5 Flash Image now ready for production with new aspect ratios. The best image model in the world is now GA. These new aspect ratios are terrific.

    [blog] Building on the bananas momentum of generative media models on Google Cloud. What stood out to me here was a bulleted list at the bottom that clearly explained when to use which model. We’re getting better at doing that.

    [blog] Inside Husky’s query engine: Real-time access to 100 trillion events. That’s a lot of events. The Datadog team explains their event store and how its query engine works.

    [blog] Gemini CLI: Discover, configure, debug, and document a Google Cloud feature (Part 2). These AI tools can help everyone learn new things. One of our technical writers, Moi, explains how she used the Gemini CLI to go deeper on a product area she’s writing about.

    [blog] Two strategies to succeed when AI seems to be eroding jobs around you. This is relevant to the above post. Tom’s a tech writer here at Google, and has great perspective on what skills need further investment.

    [blog] Real AI Agents and Real Work. We’re learning more about human-agent interactions. Agents are capable, and still require our judgement to decide what’s worth doing with them.

    [blog] Examples are the best documentation. Great post. People who deliver projects and products need to consider all the examples the user needs access to.

    [article] OutSystems Launches a Low-Code Workbench for Building Enterprise AI Agents. Of course low-code vendors will get into the vibe and agent game. They’re irrelevant if they don’t!

    [article] Salesforce launches enterprise vibe coding product, Agentforce Vibes. See previous point. This seems natural for Salesforce, as they’ve got piles of data for you to ground on.

    [blog] Introducing Microsoft Agent Framework: The Open-Source Engine for Agentic AI Apps. Do we need more agent frameworks? Sure, why not. This one targets .NET devs.

    [blog] Designing agentic loops. What does a responsible YOLO mode look like when using your coding agent of choice? Simon’s got a helpful post on designing good agentic loops.

    [blog] Meet Jules Tools: A Command Line Companion for Google’s Async Coding Agent. This is cool. Control and interact with this remote agent via the CLI.

    [blog] Angular support for generating apps in Google AI Studio is now available. Excellent! It’s now simple to vibe code web apps that use Angular and then deploy them.

    [blog] Build Your Foundation First: The Hard Truth About Successful AI Deployments. No skipping ahead. There are prerequisites required to get good AI results.

  • Daily Reading List – October 1, 2025 (#639)

    Had a day-trip to Sunnyvale to present to a room full of Google Cloud partners. Wasn’t feeling 100%, but still seemed to go ok. I referenced something from today’s reading list, which proves to me that I should keep doing the work to uncover these pieces 🙂

    [blog] Effective context engineering for AI agents. Exceptionally good writeup that helps us better understand what context engineering is all about, why it matters, and how to approach it.

    [blog] Cloud CISO Perspectives: Boards should be ‘bilingual’ in AI, security to gain advantage. I think a lot of non-engineering teams are thinking about how to become more fluent in AI so that they can make better decisions.

    [article] Boards lack AI savvy as adoption advances: PwC. I can see why the above advice is so important. Most board members are flying blind.

    [paper] On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub. I haven’t seen a study like this one. It explores pull requests created by Claude Code, and how agent-generated PRs differ in size, acceptance rate, and such.

    [blog] 100X Faster: How We Supercharged Netflix Maestro’s Workflow Engine. There’s probably no action item for you after reading this post, but it’s ok to just enjoy learning about what other people built.

    [blog] Responsible Agents: A Phased Approach on Google Cloud. Excellent post. Here’s a well-rounded look at making responsible decisions during design, development, evaluation, deployment, and operations of agent systems.

    [blog] Securing MCP Servers with Spring AI. Learn how to build MCP servers, but pay special attention to how you secure them too.

    [blog] Forecasts and data insights come to BigQuery’s MCP and ADK tools. Or use turnkey MCP servers that already have great tools available in a secure fashion.
