Good day here in Austin. I like talking to folks who have their own perspectives on what AI and the current tech landscape look like. Helps me pop my own ideological bubble!
[blog] Bringing AI to the next generation of fusion energy. AI shops are showing their focus areas right now. While we’re doing fun consumer AI with our models, our most meaningful work is happening in the sciences.
[article] Rethinking operations in an agentic AI world. Fundamental concepts may remain the same, but the implementation is changing. James offers a fresh way to look at ops when dealing with agent workloads.
Flying to Austin right now for my last trip of this wacky run of five weeks or so with travel. I’ll be doing a keynote at Cognizant’s big annual conference and also slipping out to Waco to see my kiddo in college.
[blog] Introducing Beads: A coding agent memory system. Color me interested. Might this be a better way to have some persistent memory between coding sessions, better than dumping sessions to Markdown files?
[article] How to Be a Great Coach—Even When You’re Busy. Nobody is too busy to help others grow. I refuse to believe it. How can you coach well when you’ve got a fairly packed calendar? Here’s some advice.
[article] Zoom dooms the developer’s afternoon. It’s not Zoom’s fault alone; the real culprit is how easy it is to set up virtual meetings that constantly interrupt developers.
Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:
[blog] The new AI-driven SDLC. Some good thoughts here, and I suspect we’ll see more written about this in the coming months.
[blog] 2024 Open Source Contributions: A Year in Review. Yes, these numbers are impressive. But what’s better is the display of the breadth of engagement needed for healthy open source ecosystems.
[blog] Unpacking Cloudflare Workers CPU Performance Benchmarks. This recent independent test had Vercel coming out a clear performance winner. I like that Cloudflare’s response wasn’t particularly defensive, and that this triggered some improvements on their end. Well done.
[blog] DevRel Activity Patterns… published. I had the chance to provide content reviews of this book, and imagine this material will be useful to many teams.
[blog] DevRel is -Unbelievably- Back. More open positions. I hope this isn’t for classic DevRel that had looser metrics and softer connection to company success. I doubt it!
I enjoy building with new frameworks and services. Do you? It’s fun to break new ground. That said, I’m often filled with regret as I navigate incomplete docs, non-existent search results, and a dearth of human experts to bother. Now add LLMs that try to help but accidentally set you back. Good times. But we persevere. My goal? Build an AI agent—it helps you plan a career change—that retains memory through long-running conversations, and is portable enough that it can run on most any host. Easy enough, yes?
My weapons of choice were the Agent Development Kit (Python), the new fully-managed Vertex AI Memory Bank service, and runtime hosts including Google Cloud Run and Vertex AI Agent Engine. Most every sample I found for this tech combination was either PhD-level coding with excessive functionality, a hard-coded “hello world” that didn’t feel realistic, or a notebook-like flow that didn’t translate to an independent agent. I craved a simple, yet complete, example of what a real, hosted, and memory-infused agent looks like. I finally got it all working, it’s very cool, and I wanted to share the steps to reproduce it.
Vertex AI Memory Bank showing memories from my AI agent
Let’s go through this step by step, and I’ll explain the various gotchas and such that weren’t clear from the docs or existing samples. Note that I am NOT a Python developer, but I think I follow some decent practices here.
First, I wanted a new Python virtual environment for the folder containing my app.
python3 -m venv venv
source venv/bin/activate
I installed the latest version of the Google ADK.
pip install google-adk
My source code is here, so you can just download the requirements.txt file and install the local dependencies you need.
pip install -r requirements.txt
I’ve got an __init__.py file that simply contains:
from . import agent
Now the agent.py itself where all the logic lives. Let’s go step by step, but this all is from a single file.
import os
import sys
from google.adk.agents import Agent
from google.adk.tools import agent_tool
from google.adk.tools import google_search
from google import adk
from google.adk.runners import Runner
from google.adk.sessions import VertexAiSessionService
from google.adk.memory import VertexAiMemoryBankService
from google.api_core import exceptions
app_name = 'career_agent'
# Retrieve the agent engine ID needed for the memory service
agent_engine_id = os.environ.get("GOOGLE_CLOUD_AGENT_ENGINE_ID")
Our agent app needs a name for the purpose of storing sessions and memory through ADK. And that agent_engine_id is important for environments where it’s not preloaded (e.g. outside of Vertex AI Agent Engine).
# Create a durable session for our agent
session_service = VertexAiSessionService()
print("Vertex session service created")
# Instantiate the long-term memory service; it needs the agent_engine_id
# from the environment, or it won't work correctly
memory_service = VertexAiMemoryBankService(
    agent_engine_id=agent_engine_id
)
print("Vertex memory service created")
Here I create instances of the VertexAiSessionService and VertexAiMemoryBankService. These refer to fully managed, no ops needed, services that you can use standalone wherever your agent runs.
# Use for callback to save the session info to memory
async def auto_save_session_to_memory_callback(callback_context):
    try:
        await memory_service.add_session_to_memory(
            callback_context._invocation_context.session
        )
        print("\n****Triggered memory generation****\n")
    except exceptions.GoogleAPICallError as e:
        print(f"Error during memory generation: {e}")
Now we’re getting somewhere. This function (thanks to my colleague Megan, who I believe came up with it) will be invoked as a callback during session turns.
# Agent that does Google search
career_search_agent_memory = Agent(
    name="career_search_agent_memory",
    model="gemini-2.5-flash",
    description=(
        "Agent that answers questions about career options for a given city or country"
    ),
    instruction=(
        "You are an agent that helps people figure out what types of jobs they should consider based on where they want to live."
    ),
    tools=[google_search],
)
That’s agent number one. It’s a secondary agent that just does a real-time search to supplement the LLM’s knowledge with real data about a given job in a particular city.
# Root agent that retrieves memories and saves them as part of career plan assistance
root_agent = Agent(
    name="career_advisor_agent_memory",
    model="gemini-2.5-pro",  # Using a more capable model for orchestration
    description=(
        "Agent to help someone come up with a career plan"
    ),
    instruction=(
        """
        **Persona:** You are a helpful and knowledgeable career advisor.

        **Goal:** Your primary goal is to provide personalized career recommendations to users based on their skills, interests, and desired geographical location.

        **Workflow:**
        1. **Information Gathering:** Your first step is to interact with the user to gather essential information. You must ask about:
           * Their skills and areas of expertise.
           * Their interests and passions.
           * The city or country where they want to work.
        2. **Tool Utilization:** Once you have identified a potential career and a specific geographical location from the user, you **must** use the `career_search_agent_memory` tool to find up-to-date information about job prospects.
        3. **Synthesize and Respond:** After obtaining the information from the `career_search_agent_memory` tool, you will combine that with the user's stated skills and interests to provide a comprehensive and helpful career plan.

        **Important:** Do not try to answer questions about career options in a specific city or country from your own knowledge. Always use the `career_search_agent_memory` tool for such queries to ensure the information is current and accurate.
        """
    ),
    tools=[
        adk.tools.preload_memory_tool.PreloadMemoryTool(),
        agent_tool.AgentTool(career_search_agent_memory),
    ],
    after_agent_callback=auto_save_session_to_memory_callback,
)
That’s the root agent. Let’s unpack it. I’ve got some fairly detailed instructions to help it use the tool correctly and give a good response. Also note the tools. I’m preloading memory so that it gets context about existing memories, even if they happened five sessions ago. It’s got a tool reference to that “search” agent I defined above. And then after the agent generates a response, we save the key memories to the Memory Bank.
Finally, I’ve got a Runner. I’m not positive this is even used when the agent runs on Vertex AI Agent Engine, but it plays a part when running elsewhere.
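The Runner wiring looks something like this — a sketch based on the ADK’s `Runner` constructor, tying together the agent, session service, and memory service defined above (treat the exact argument shape as an assumption to verify against your ADK version):

```python
# Wire the agent to the managed session and memory services.
# Agent Engine supplies its own runtime; this matters for self-hosted runs.
runner = Runner(
    agent=root_agent,
    app_name=app_name,
    session_service=session_service,
    memory_service=memory_service,
)
```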
That’s it. 87 lines in one file. Writing the code wasn’t the hard part; knowing what to do and how to shape the agent was where all the work happened.
Let’s deploy, and test it all out with cURL commands. To deploy this to the fully-managed Vertex AI Agent Engine, it’s a single ADK command now. You need to provide it a Cloud Storage bucket name (for storing artifacts), but that’s about it.
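The deploy command looks roughly like this — a sketch, so check `adk deploy agent_engine --help` for the current flags, and swap in your own project, region, and bucket (the values below are placeholders):

```shell
# Deploy the agent folder to the fully-managed Vertex AI Agent Engine
adk deploy agent_engine \
  --project=my-project \
  --region=us-central1 \
  --staging_bucket=gs://my-staging-bucket \
  --display_name="career_agent" \
  ./career_agent
```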
When it finished, I saw a bucket loaded up with code and other artifacts.
Files generated and stored by ADK for my deployed agent
More importantly, I had an agent. Vertex AI Agent Engine has a bunch of pre-built observability dashboards, and an integrated view of sessions and memory.
Vertex AI Agent Engine dashboard in the Google Cloud Console
Let’s use this agent, and see if it does what it’s supposed to. I’m going to use cURL commands, so that it’s super clear as to what’s happening.
This first command creates a new session for our agent chat. The authorization comes from injecting a Google Cloud token into the header. I plugged in the “resource name” of the Agent Engine instance into the URI and set a user ID. I get back something like this:
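That request looked something like this — a sketch, assuming the engine’s generic `:query` REST method with a `create_session` class method (the same resource-name addressing the later commands in this post use); verify the class method name against the current Agent Engine docs:

```shell
# Create a session for user u_123 on the deployed agent
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:query \
  -d '{"class_method": "create_session", "input": {"user_id": "u_123"}}'
```

The response JSON includes an `"id"` field (along with the app name, user ID, and timestamps) — that’s the session ID used in the next commands.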
That “id” value matches the session ID now visible in the Vertex AI Session list. This session is for the given user, u_123.
A session created for the agent running in the Vertex AI Agent Engine
Now I can chat with my career agent. Here’s the cURL request for submitting a query. This will trigger my root agent, call my secondary agent, and store the key memories of the interaction as a callback.
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:streamQuery?alt=sse \
-d '{"class_method": "stream_query","input": {"user_id": "u_123","session_id": "5926526278264946688","message": "I am currently a beekeeper in New Mexico. I have been to college for economics, but that was a long time ago. I am thinking about moving to Los Angeles CA and getting a technology job. What are my job prospects in that region and how should I start?"}}'
Note that the engine ID is still in the URI, and the payload contains the user ID and session ID. What I got back was a giant answer with some usable advice on how I can take my lucrative career as a beekeeper and make my mark on the technology sector.
What got automatically saved as a memory? Switching to the Memories view in Vertex AI, I see that a few key details about my context were durably stored.
Memories automatically parsed and stored in the Vertex AI Memory Bank
Now if I delete my session, come back tomorrow and start a new one, any memories for this user ID (and agent engine instance) will be preloaded into every agent request. Very cool!
Let’s quickly prove it. I can destroy my session with this cURL command.
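A sketch of that delete request, assuming a `delete_session` class method on the same `:query` endpoint (the class method name is my assumption — confirm it against the Agent Engine session docs):

```shell
# Destroy the session for user u_123 (class method name is an assumption)
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:query \
  -d '{"class_method": "delete_session", "input": {"user_id": "u_123", "session_id": "5926526278264946688"}}'
```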
At this point, I could ask something like “what do you already know about me?” in my query to see if it retrieves the memories it stored before.
curl \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines/8479666769873600512:streamQuery?alt=sse \
-d '{"class_method": "stream_query","input": {"user_id": "u_123","session_id": "3132042709481553920","message": "What do you already know about me?"}}'
Here’s what I got back:
{"content": {"parts": [{"thought_signature": "CrgEAR_M...twKw==", "text": "You have an economics degree and are currently a beekeeper in New Mexico. You're considering a move to Los Angeles for a job in the technology sector."}], "role": "model"}, "finish_reason": "STOP", "usage_metadata": {"candidates_token_count": 32, "candidates_tokens_details": [{"modality": "TEXT", "token_count": 32}], "prompt_token_count": 530, "prompt_tokens_details": [{"modality": "TEXT", "token_count": 530}], "thoughts_token_count": 127, "total_token_count": 689, "traffic_type": "ON_DEMAND"}, "avg_logprobs": -0.8719542026519775, "invocation_id": "e-53e94a44-ad6b-4e97-9297-51612f4e77a9", "author": "career_advisor_agent_memory", "actions": {"state_delta": {}, "artifact_delta": {}, "requested_auth_configs": {}, "requested_tool_confirmations": {}}, "id": "c9e484cd-e5f7-4e1e-94d7-7490a006137d", "timestamp": 1760396342.830469}
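Each streamed event is a JSON object like the one above, so pulling the model’s text out takes only the standard library. A minimal sketch (the sample event below is trimmed from the real response; non-text parts like `thought_signature` are simply skipped):

```python
import json

def extract_text(event_json: str) -> str:
    """Concatenate the text parts from one streamed agent event."""
    event = json.loads(event_json)
    parts = event.get("content", {}).get("parts", [])
    # Parts without a "text" key (e.g. thought signatures) contribute nothing
    return "".join(p.get("text", "") for p in parts)

event = '{"content": {"parts": [{"text": "You have an economics degree."}], "role": "model"}}'
print(extract_text(event))  # → You have an economics degree.
```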
Excellent! With this approach, I have zero database management to do, yet my agents can retain context for each turn over an extended period of time.
Vertex AI Agent Engine is cool, but what if you want to serve up your agents on a different runtime? Maybe a VM, Kubernetes, or the best app platform available, Google Cloud Run. We can still take advantage of managed sessions and memory, even if our workload runs elsewhere.
The docs don’t explain how to do this, but I figured out the first step. You need that Agent Engine ID. When deploying to Vertex AI Agent Engine, it happened automatically. But now I need to explicitly submit an HTTP request to get back an ID to use for my agent. Here’s the request:
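The request looked something like this — a sketch, assuming a minimal `reasoningEngines` create call where a display name alone is enough to get back an engine ID usable for sessions and memory (the body fields are my assumption; check the REST reference):

```shell
# Create an (empty) Agent Engine instance just to obtain an engine ID
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://us-central1-aiplatform.googleapis.com/v1/projects/seroter-project-base/locations/us-central1/reasoningEngines \
  -d '{"displayName": "career-agent-memory"}'
```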
I get back an ID value, and I see a new entry show up for me in Vertex AI Agent Engine.
Memory Bank instance for an agent in Cloud Run
The ADK also supports Google Cloud Run as a deployment target, so I’ll deploy this exact agent, no code changes, there too. First, I threw a few values into the shell’s environment variables to use for the CLI command.
Then I issued the single command to deploy the agent to Cloud Run. Notice some differences here. First, no Cloud Storage bucket: Cloud Run builds a container from the source code and uses that. Also, I explicitly set the --memory_service_uri and --session_service_uri parameters to enable the pre-wiring to those services. It didn’t work without them, and the current docs don’t include the proper parameters. I also figured out (undocumented) how to add Cloud Run environment variables, since the Agent Engine ID was needed there as well.
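Those two steps might look like the sketch below. The flag names and the `agentengine://` URI scheme are assumptions to verify against `adk deploy cloud_run --help` for your ADK version, and the project/engine values are placeholders; I won’t guess at the env-var flag here:

```shell
# Values the deploy command needs (placeholders)
export GOOGLE_CLOUD_PROJECT="seroter-project-base"
export GOOGLE_CLOUD_LOCATION="us-central1"
export AGENT_ENGINE_ID="1234567890123456789"   # hypothetical ID from the earlier create call

# Deploy to Cloud Run, pre-wired to the managed session and memory services
adk deploy cloud_run \
  --project=$GOOGLE_CLOUD_PROJECT \
  --region=$GOOGLE_CLOUD_LOCATION \
  --service_name=career-agent \
  --session_service_uri="agentengine://${AGENT_ENGINE_ID}" \
  --memory_service_uri="agentengine://${AGENT_ENGINE_ID}" \
  ./career_agent
```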
In just a couple minutes, I ended up with an agent ready to serve on Cloud Run.
Agent running in Cloud Run
The URLs I use to interact with my agent are now different because we’re not calling the managed service endpoints of Vertex AI to invoke the agent. So if I want a new session to get going, I submit a cURL request like this:
curl -X POST -H "Content-Type: application/json" -d '{}' \
https://career-agent-168267934565.us-central1.run.app/apps/career_agent/users/u_456/sessions
I’ve got no payload for this request, and specified the user name in the URL. I got back a session ID in a JSON payload like above. And I can see that session registered in my Agent Engine console.
Session created based on web request
Submitting queries to this agent is slightly different than when it was hosted in Vertex AI Agent Engine. For Cloud Run agents, the cURL request looks like this:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://career-agent-168267934565.us-central1.run.app/run_sse \
-H "Content-Type: application/json" \
-d '{
"app_name": "career_agent",
"user_id": "u_456",
"session_id": "311768995957047296",
"new_message": {
"role": "user",
"parts": [{
"text": "I am currently a cowboy in Las Vegas. I have been to college for political science, but that was a long time ago. I am thinking about moving to San Francisco CA and getting a technology job. What are my job prospects in that region and how should I start?"
}]
},
"streaming": false
}'
After a moment, not only do I get a valid answer from my agent, but I also see that the callback fired and I’ve got durable memories in Vertex AI Memory Bank.
Memories saved for the Cloud Run agent
Just like before, I could end this session, start a new one, and the memories still apply. Very nice.
Access to sessions and memories that scale as your agent does, or survive compute restarts, seems like a big deal. You can use your own database to store these, but I like having a fully managed option that handles every part of it for me. Once you figure out the correct code and configurations, it’s fairly easy to use. You can try this all yourself in Google Cloud with your existing account, or a new account with a bunch of free credits.
I never seem to get my demo apps working on the first pass, but it always turns out to be a (painful) blessing in disguise. Instead of taking a couple of hours to build an agent demo, it took a couple of weeks. But I was forced to read source code, experiment, and learn so much more than if it worked the first time. I’ll post my experiences tomorrow.
[blog] F*ck it and Let it Rip. Try your hardest and have fun. A performance approach mindset is the way to go.
[blog] I’m in Vibe Coding Hell. I liked the points here. There’s a new challenge for self-learners who used to be dependent on the tutorial to get work done; now they’re dependent on their AI tool.
[blog] Quantum computing 101 for developers. My boss is deep into this, but I’ve only stayed peripherally aware. I thought this was a good article for bringing folks up to speed.
[blog] Agents That Prove, Not Guess: A Multi-Agent Code Review System. It’s tempting to just dump a single prompt or pile of context into an agent and want something good back. But Ayo shows a better approach if you care about repeatability and transparency.
Whew. A frantic day until about 3pm, and then I got a chance to work on an AI agent I’ve been messing with. I’m building it for use in a new blog post, but timeboxing how much more effort I put in. Hopefully posting next week.
[blog] Embracing the parallel coding agent lifestyle. Is every developer a type of “manager” now? There’s a new workstyle that involves coordinating a series of agents doing various bits of work for you.
[blog] Vibe engineering. Also from Simon, this builds on the previous post. Now we’re talking broader engineering practices, not just using AI for a snippet of code.
[article] Control Codegen Spend. Unless you have an all-you-can-eat license (which seems rare), there are cost and consumption considerations with AI-assisted coding tools.
Another day, another chance to learn something new. Today’s reading list had some useful data points, fresh ideas, and new products.
[blog] Introducing Gemini Enterprise. The AI platform era is here. It’s not just about a collection of random products. It’s about intentionally connecting people, systems, and knowledge bases so that we can get better work done. If you’re a Google shop, Microsoft shop, IBM shop or whatever, Gemini Enterprise is a major upgrade. More, from Sundar.
[blog] Platform Shifts Redefine Apps. Important concepts here. What an “app” is changes with each tech platform evolution. Are you working with a modern definition?
[blog] Give me AI slop over human sludge any day. He’s not wrong. Why are we automatically assuming AI-created stuff is worse or less useful than human created stuff?
Flying home after a great day with friends at Comcast and talking about AI assisted engineering. Everyone is looking for the playbook for effectively landing AI in their org. Maybe I’ll write a book about it 🙂
[blog] Five Best Practices for Using AI Coding Assistants. A few months ago, our CEO asked me to run a few long-running engineering experiments with my team. Here’s what we learned from using the full spectrum of today’s AI coding tools to get work done.
[blog] Stitching with the new Jules API. I don’t think we’re just going to improve the existing toolchain. Instead, a new toolchain is emerging where you simply work differently.
[blog] Not Another Workflow Builder. LangChain isn’t interested in adding to the pile of visual workflow builders. We’re back to arguing workflows versus agents again too!
Flew to Philadelphia today to do a keynote at a customer’s internal developer conference tomorrow. Should be fun, although I’m definitely getting a little burnt out by all the recent travel!
[blog] Databases on K8s — Really? (part 8). My colleague at Google has been sharing a series of thoughts about his journey to appreciating containers and Kubernetes as a viable host for databases.
[blog] Ask a Techspert: What is vibe coding? We’re all builders now. What we build, and how production-ready it is, depends on the builder and the circumstances.
[article] Top executives jump on AI upskilling. Great to see. The right type of executive upskilling will shrink the expectation gap between leaders and their employees.