Author: Richard Seroter

  • Daily Reading List – July 25, 2025 (#595)

Landed late last night and had a two-hour talk + demo this morning at 7am. I’m really good at scheduling. Next week I’m on vacation (staycation with the kids), but I’ll probably still do reading lists each day. Enjoy your weekend!

    [blog] Google Gemini CLI Cheatsheet. Many, many of you have downloaded and tried this agentic CLI. But what can you do with it? I like Philipp’s cheatsheet.

    [blog] Model Context Protocol (MCP) explained: An FAQ. Some good Q&A from our friends over at Vercel. This is mostly the positive perspective, but still quite useful.

    [blog] Exploring the Evolution of Google’s Community Forum: A Comparison of Old and New features. We’ve got a revamped community environment for everyone who builds with Google technologies. Looks great.

    [article] Stop Prompting, Start Designing: 5 Agentic AI Patterns That Actually Work. Some good visualizations here that might help you understand some agent patterns you’ve heard of.

    [blog] Mastering agentic workflows with ADK: Sequential agents. Here’s an agent pattern in practice. Learn how to build an agent that runs in a step-by-step sequence.

[blog] Zombie Burnout: A New Way to Think About the Restless Exhaustion of Modern Life. Burnout doesn’t just come from overwork; it comes from work that doesn’t inspire you any longer. Good advice here.

    [blog] Aeneas transforms how historians connect the past. Wow, this is rad. AI trained for Latin (but possible to do for other languages) to help historians understand ancient inscriptions.

    [article] How Rust-Based Zed Built World’s Fastest AI Code Editor. Many (most?) of the AI editors and tools out there are based on Visual Studio Code, or a fork. Zed is something different.

    [blog] New Cluster Director features: Simplified GUI, managed Slurm, advanced observability. Standing up and managing AI infrastructure can be a lot of work. I’ve seen folks struggle with it. This updated experience in Google Cloud looks like a big step forward.

[blog] A Practical Guide to Multimodal Data Analytics. Jeff goes deep into this unified analysis of multimodal content. It’s pretty cool that you can use SQL and Python here.

    [article] How Elicitation in MCP Brings Human-in-the-Loop to AI Tools. Here’s a new feature of MCP that you should take a look at.

    [article] Is Google Cloud winning the AI era? Winning takes many forms, but we’re doing some things right at the moment.

    [blog] Amazon Q: Now with Helpful AI-Powered Self-Destruct Capabilities. Oof. Rough story, but it can happen. Community input is great, but you have to be careful on those PRs.

    [blog] Practical Gemini CLI: Bring your own system instruction. Learn all about overriding the system prompt of the Gemini CLI. The extensibility here is powerful.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – July 24, 2025 (#594)

    In the air flying home after a solid week in Manhattan. Doing some research into MCP during the flight, and glad to be able to search my own reading list for past links to review!

    [blog] Managing Links with Gemini: Validating Links for Semantic Correctness. This is super cool work from our documentation team. Make sure your links go where you think they do.

    [blog] NotebookLM: One-Click Accident Report Comprehension. This tool is so useful for understanding complex topics. The ability to generate videos and mind maps ensures a few ways you can digest the content.

    [paper] A Survey of Context Engineering for Large Language Models. This paper looks at the emerging discipline of context engineering, some patterns, and where gaps exist.

    [blog] AI agents vs. predefined workflows: practical decision guide. Everything doesn’t need to be an agent. This post from Capital One points out when to use an agent, and when an AI workflow will do.

    [blog] Mastering agentic workflows with ADK for Java: Sub-Agents. Excellent post that will help you better understand the ideas behind agents, subagents, agents-as-tools, and coordination of it all.

    [blog] 25+ top gen AI how-to guides for enterprise. We’re awash in tutorials and how-to material. Here’s an attempt to pull together a handful of representative guides.

    [blog] My AI Code Reviewer Needed a Project, So I Vibe-Coded One for It. I had a lot of code review content yesterday, and here’s another. This offers some prompts you might use to review a codebase.

    [blog] A2A Server with an MCP backend in Cloud Run. Expose an API using MCP, and have agents that can talk to other agents using the A2A protocol.

[blog] To MCP or not to MCP? Maybe chill out on all the MCP stuff, as there are cost and control implications.


  • Daily Reading List – July 23, 2025 (#593)

    Today was fun and tiring with a mix of leadership offsite meetings and stepping in for a sick colleague to present at a large group meeting. It was nice to end that meeting and see Alphabet’s great quarterly earnings.

    [article] Alphabet’s Q2 revenue beats estimates as cloud computing surges. Heckuva quarter. The business is solid, and our cloud organization is accelerating. Here are the CEO remarks, including excitement about serving OpenAI as a customer.

    [article] How CEOs Hone and Harness Their Intuition. I’ve seen some people and teams hide behind data because they didn’t trust their intuition. Here’s how to build it up.

    [blog] Should You Market Your API with an MCP Server? If your API is your product, why do you need an MCP server? Adam does a good job explaining when it makes sense.

    [blog] Goals Take Practice. If you’re naturally good at setting goals, congrats. John looks at the many types of goals, and why practice is needed to get good at them.

    [blog] Building an AI-Powered Podcast Generator with Gemini and Cloud Run. Neat example that others can follow to get customized and generated podcasts.

    [blog] How I Prioritize OSS Bugs. How an OSS maintainer prioritizes bugs could be different than how you do for a commercial or internal product, but maybe not.

[blog] Context Engineering for AI Agents: Lessons from Building Manus. It’s an emerging discipline, but advice like this will help you advance your thinking on context engineering.

[blog] Debugging the One-in-a-Million Failure: Migrating Pinterest’s Search Infrastructure to Kubernetes. These “we had a problem and here’s how we investigated it” posts are always a fun read to me. The Pinterest team chased down an issue that affected a small subset of users, but posed a risk.

    [blog] Why I’m Betting Against AI Agents in 2025 (Despite Building Them). This is a pragmatic look at what can go wrong with agents (correctness at scale, cost, etc) and worth a read.

    [blog] Stop Leaked Credentials in Their Tracks with Veles, Our New Open-Source Secret Scanner. Very valuable capability. Keep yourself safe, even as we ship more software.

    [blog] Code review in the AI age. Are we shipping faster with AI, or shifting our time from coding to reviewing? Whichever it is, the practice of code review is as important as ever.

    [blog] The evolution of code review practices in the world of AI. This post looks at how AI augments, instead of replaces, human code reviews.

    [article] How Does AI Disrupt Accountability in Code Reviews? Another post on code reviews that highlights the human factor and the team accountability that code reviews encourage.

    [blog] The Dataproc advantage: Advanced Spark features that will transform your analytics and AI. DIY Spark versus a managed service is a choice. We’re trying to make it harder for you 🙂

    [article] Spring AI 1.0 Delivers Easy AI Systems and Services. I liked this intro from Josh into the world of Spring AI for Java app builders.


  • Daily Reading List – July 22, 2025 (#592)

Definitely seeing more “how this tech role is changed by AI” pieces lately. I’ve got two in today’s edition that look at architects and product managers. And some non-AI content for those of you burned out on the topic du jour.

    [blog] Gemini 2.5 Flash-Lite is now stable and generally available. Fast and low cost. What’s not to like? We’re also getting good at making these available across our surfaces at the same time.

    [article] Architecting the MVP in the Age of AI. Where does AI help architects? This article calls out a few areas where it can make a difference.

    [blog] My Product Management Toolkit (67): Using AI to write a PRD. Yes, you can use AI as a product manager, too. Here, creating a product requirements doc.

    [article] Moving from an orchestration-heavy to leadership-heavy management role. Are you supervising, managing, or leading? There’s a difference. Will talks about a move to leadership.

    [blog] Chainguard builds a market, everyone else wants in. Kudos to Chainguard for basically creating a new market for secure container images. Lots of other folks now playing in the space.

    [article] Employee curiosity fuels Shadow AI adoption faster than IT can keep up. You can’t stop AI usage, you can only hope to contain it.

    [article] Hackers exploiting SharePoint zero-day seen targeting government agencies. I admittedly haven’t thought about SharePoint in a long time, but I know many folks still depend on it. Get patched!

    [article] 6 Design Principles for Edge Computing Systems. Good lessons from some folks who really understand edge systems.

    [blog] Introducing OSS Rebuild: Open Source, Rebuilt to Last. This new project looks solid, and already has a way to use it to get insights into your supply chain.

    [blog] Coding with LLMs in the summer of 2025 (an update). Salvatore provides a current perspective on using AI coding assistants and agents. He also offers up some good advice for maximizing their value.


  • Daily Reading List – July 21, 2025 (#591)

    I’m in New York and successfully navigated planes, trains, and walking 15 blocks. Lots of great AI content in my feed today, and I now gift that to you.

    [blog] Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad. That’s pretty darn remarkable. And soon, it’ll be in the hands of Google customers.

    [article] The Big LLM Architecture Comparison. Wow, fantastic post from Sebastian that shows us what’s new, relevant, and interesting with modern LLM architectures.

    [blog] Adding Tools to Your AI Agent — The Scalable Way. In my own blog post from last week, I showed how to use the “API tool” in an agent. It’s a great approach.

    [article] 5 key questions your developers should be asking about MCP. These are good questions to ask. Why use it? How do I secure it? What’s the long-term viability?

    [blog] Exploring the context of online images with Backstory. Trustworthiness is hard to come by given all this AI-generated content. Maybe this will help.

    [blog] Welcoming The Next Generation of Programmers. Great post. Don’t look down on vibe coders or those getting in the field. We’ve all started somewhere. Help them find a community so that they keep developing their engineering skills.

    [docs] Generate videos with Veo 3. What a great example of useful docs. Samples are tight, parameters are defined, and examples abound. If you’re doing text (or image) to video prompting, you can learn from this.

    [blog] Is AI Coming For Your Team Next? How AI Hype Is Becoming The Hot Layoff Excuse. For some teams and roles, I can understand pausing any new hiring until you see what the current team can do with AI assistance. I don’t yet see many cases where replacement makes sense.

    [blog] Application monitoring in Google Cloud: Bridging manual and AI-assisted troubleshooting. New paradigms are on the way. It’ll be interesting to see how folks mix manual setup and intervention with AI-driven issue resolution.

    [blog] Nobody Knows How To Build With AI Yet. Amusing and thought-provoking. I tell folks that the two main traits you need to exhibit nowadays are curiosity and humility. Don’t be over-confident, and just keep learning.

    [blog] Mastering Agentic Development with Gemini and Roo Code. We’re all just trying new tools, and picking up new techniques constantly. Roo Code looks cool.

    [article] Planning an Offsite for Your Leadership Team? Ask These 5 Questions. Are your leadership offsites kinda lame? It happens. I found these prompting questions interesting, and likely to stimulate the right type of chatter.

    [article] AI Agents Are Creating a New Security Nightmare for Enterprises and Startups. New tech and patterns are often disruptive. Just be smart about your day-2 considerations.

    [blog] Unlock Gemini’s reasoning: A step-by-step guide to logprobs on Vertex AI. Now you can retrieve some low-level details about probability choice of tokens returned by Gemini models.


  • Daily Reading List – July 18, 2025 (#590)

    I had a good day. Next week I’m in New York City for some customer meetings and offsites. Expect the same rhythm of reading lists because this is now a habit!

    [blog] Vibe coding and the silent AI war inside tech companies. Good point here that the AI acceleration isn’t across all tasks. We’ll get some things done much faster, others, not so much.

    [blog] BigQuery’s New Cost Controls Are Here to Help. “Unlimited” is a bad default quota amount for a billable service. JK points out some new BigQuery default quotas that’ll help you avoid runaway bills.

    [blog] Giants awaken. Google Cloud GeminiCLI, AWS Kiro, developer experience and the need to ship and keep shipping. James at Redmonk looks at a couple of hyperscalers flexing their developer chops.

    [blog] Back to The Future: Evaluating AI Agents on Predicting Future Events. There’s no existing data to train on for future events, so can you use AI to predict what’s going to happen?

    [blog] MCP: Bringing mashups back! Love this. Agents and MCP make it simple to combine disparate APIs in new and funky ways.

[article] Open Source Is Too Important To Dilute. Dan says that true open source is worth defending, and we can’t water down the definition with not-really-open alternatives.

    [blog] AI Agents, the New Frontier for LLMs. Folks on my team have been producing such exceptionally good content about AI. Not just the what, but the why and how.

    [blog] Building an automated GitLab Merge Request Review Agent with Gemini CLI. Good example of a background task (CI pipeline) that can make use of an agentic tool.

    [article] Anthropic tightens usage limits for Claude Code – without telling users. I feel for them. It’s hard to predict usage, it’s not cheap to serve, and competition is fierce.

    [blog] Cloud CISO Perspectives: Our Big Sleep agent makes a big leap, and other AI news. AI is foiling attempts to exploit vulnerabilities. That’s wild. Also get a roundup of security links.


  • Daily Reading List – July 17, 2025 (#589)

    I bumped into some engineering folks at the office who are changing how they work with agentic CLIs. It was inspiring to see a real story about how our work is changing very dramatically.

    [article] The Founder’s Guide to Building a V1 of Customer Success. Good advice here, whether you’re setting up such a team for the first time, or rebooting a stagnant team.

    [blog] Proactiveness considered harmful? A guide to customise the Gemini CLI to suit your coding style. Fantastically good post, and an example of where open source is so powerful. Daniela wants to make the Gemini CLI less proactive, and explains how to steer and customize the model’s behavior.

    [article] Leading After Your Predecessor Fails. Did you take a job where you replaced someone else? Did they bomb out? Here’s guidance for how to repair the damage.

    [blog] Simplify your Agent “vibe building” flow with ADK and Gemini CLI. We’ll see more frameworks and products doing this, I’m certain. The ADK offers an llms-full.txt file that you can give your AI tool as context. This gives you the most relevant responses back.

    [blog] Where Technology Executives Will Be Investing In 2026. Maybe no surprises here, but a reminder that the APAC market has the highest rate of IT spending growth coming up.

    [blog] Five Big Improvements to Gradio MCP Servers. It seems that many people are using Gradio to expose MCP servers, and there are new improvements to auth along with other areas.

    [blog] How Renault Group is using Google’s software-defined vehicle industry solution. Cars nowadays are basically computers with wheels. Software matters a lot, and here’s a story of how one giant manufacturer is building for the future.

    [article] What can we learn from Meta’s code improvement practices? Short, but interesting look at research into how Meta scopes, prioritizes, and executes on code improvement projects.

    [blog] Why the analyst advisor industry is getting obliterated by AI… and how to save it. Shots fired! Feels spot on, and frankly applies to anyone in a role of “thought leadership.” Step up your game.

    [blog] Build with more flexibility: New open models arrive in the Vertex AI Model Garden. DeepSeek as a service joins models like Llama in our pay-as-you-go offering. Convenient when you don’t want to manage infra, or guess about capacity.

    [blog] Vibe Coding Is the Future of Programming. Here’s How Your Company Can Get on Board. Bold title. I don’t think vibe coding as currently defined is the future. But, orchestrating AI tools is.


  • Code was the least interesting part of my multi-agent app, and here’s what that means to me


At least 80% of the code I’ve ever written could have been written by AI, probably at higher quality. I’ve been “in tech” for twenty-seven years and spent seven of those as a software developer. Even when I stopped getting paid for it, I never stopped coding. But little of it’s been truly novel; most of my code has been straightforward database access code, web APIs, presentation logic, and a handful of reasonably complex systems. No doubt, many of you have done truly sophisticated things in code—compilers, performance-tuned algorithms, language frameworks—and AI isn’t replacing that any time soon. But I’d bet that much of the interesting tech work is moving away from raw code, and towards higher-order architecture.

I wanted to build out an agentic solution, and I used AI to generate 90% of the code. That code isn’t where the unique value was; none of it was particularly noteworthy. You can find the whole app here. The most interesting work related to the architectural decisions. Here are eight choices I had to make, and I suspect you’ll have fun wrestling with the same ones.

    Choice #1 – What am I trying to accomplish and do agents make sense?

    My goal was to build an app that could take in a customer’s roofing needs, create a service appointment, and generate a personalized invoice for the work. I’m cheating here, since this exercise started as “Richard wants to learn some agent tech.” So I did start with the end in mind. Judge me accordingly.

    But in every legit situation, we start by evaluating the user need. What functional requirements do I need to satisfy? What performance or quality attributes are necessary? Can I solve this with a simple service, or modular monolith? Is the user flow deterministic or variable?

    This scenario could certainly be solved by a simple data collection form and PDF generator. What requirements might make an agentic architecture the truly correct choice?

    • Data collection from the user requires image, video, and audio input to best scope the services and pricing we should offer.
    • The scheduling or invoicing process requires a dynamic workflow based on a variety of factors, and hard-coding all the conditions would be tricky.

    Either way, this is always a critical choice before you write a single line of code.

    Choice #2 – What data or services are available to work with?

    Before we build anything new, what do we already have at our disposal?

    In my case, let’s assume I already have an appointments web API for retrieving available appointment times and making new appointments. I’ve also got an existing database that stores promotional offers that I want to conditionally add to my customer invoice. And I’ve got an existing Cloud Storage bucket where I store customer invoice PDFs.

    It’s easy to just jump into the application build, but pause for a few moments and take stock of your existing inventory and what you can build around.
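If it helps to make that inventory concrete, here’s how I might capture it as configuration the agents can build around. Every name and URL below is made up for illustration; the point is writing the inventory down before coding.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExistingInventory:
    """What already exists before any agent code is written (illustrative names)."""
    appointments_api_base_url: str  # existing web API for slot lookup and booking
    promotions_db_instance: str     # existing database of promotional offers
    invoice_bucket: str             # existing Cloud Storage bucket for invoice PDFs

inventory = ExistingInventory(
    appointments_api_base_url="https://appointments.example.com/api",
    promotions_db_instance="promo-offers-db",
    invoice_bucket="gs://customer-invoices",
)
```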

    Choice #3 – What (agent) framework should I use and why?

    So. Many. Choices.

There are AI app frameworks like Genkit, LlamaIndex, and Spring AI. There are agent frameworks like LangChain, LangGraph, Autogen, CrewAI, and more. Google recently shipped the Agent Development Kit (ADK), available for Python and Java developers. An agent built with something like ADK is basically made up of three things: a model, instructions, and tools. On top of that, ADK adds sweeteners that give you a lot of flexibility, and there’s plenty I like about it.

    And look, I like it because my employer invests in it. So, that’s a big factor. I also wanted to build agents in both Python and Java, and this made ADK a great choice.

    Don’t get married to any framework, but learn the fundamentals of tool use, memory management, and agent patterns.
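Those fundamentals transfer across frameworks. As a framework-agnostic sketch (this is not ADK code), tool use at its core is just a registry of plain functions that your agent loop can describe to the model and dispatch to by name:

```python
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a plain function as a tool, keyed by its name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_forecast(city: str) -> dict:
    """Stub tool: a real one would call a weather API or search."""
    return {"city": city, "rain": False}

def call_tool(name: str, **kwargs):
    """What an agent loop does when the model asks for a tool by name."""
    return TOOLS[name](**kwargs)
```

Frameworks add schema generation, retries, and model plumbing on top, but the shape is the same everywhere.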

    Choice #4 – How should I use tools in the appointment agent?

    I suspect that tool selection will be a fascinating area for many builders in the years ahead. In this scenario, I had some decisions to make.

    I don’t want to book any roof repairs on rainy days. But where can I get the weather forecast from? I chose the built-in Google Search tool instead of trying to find some weather API on the internet.

    weather_agent = Agent(
        name="weather_agent",
        model="gemini-2.0-flash",
        description=(
            "Agent answers questions about the current and future weather in any city"
        ),
        instruction=(
            "You are an agent for Seroter Roofing. You can answer user questions about the weather in their city right now or in the near future"
        ),
        tools=[google_search],
    )
    

    For interacting with my existing appointments API, what’s the right tool choice? Using the OpenAPI tool baked into the ADK, I can just hand the agent an OpenAPI spec and it’ll figure out the right functions to call. For retrieving open appointment times, that’s a straightforward choice.

    openapi_spec = openapi_spec_template.replace("{API_BASE_URL}", config.API_BASE_URL)
    
    toolset = OpenAPIToolset(spec_str=openapi_spec, spec_str_type="json")
    api_tool_get_appointments = toolset.get_tool("get_available_appointments")
    

But what about booking appointments? While that’s also an API operation, I want to piggyback a successful booking with a message to Google Cloud Pub/Sub that downstream subscribers can read from. That’s not part of the appointments API (nor should it be). Instead, I think a function tool makes sense here, where I manually invoke the appointments API, and then make a subsequent call to Pub/Sub.

    def add_appointment(customer: str, slotid: str, address: str, services: List[str], tool_context: ToolContext) -> dict:
        """Adds a roofing appointment by calling the booking API and logs the conversation history.
    
        This function serves as a tool for the agent. It orchestrates the booking process by:
        1. Calling the internal `_book_appointment_api_call` function to make the actual API request.
        2. If the booking is successful, it retrieves the conversation history from the
           `tool_context` and logs it to a Pub/Sub topic via `_log_history_to_pubsub`.
    
        Args:
            customer: The name of the customer.
            slotid: The ID of the appointment slot to book.
            address: The full address for the appointment.
            services: A list of services to be booked for the appointment.
            tool_context: The context provided by the ADK, containing session information.
    
        Returns:
            A dictionary containing the booking confirmation details from the API,
            or an error dictionary if the booking failed.
        """
        booking_response = _book_appointment_api_call(customer, slotid, address, services)
    
        if "error" not in booking_response:
            history_list: List[Event] = tool_context._invocation_context.session.events # type: ignore
            _log_history_to_pubsub(history_list)
        
        return booking_response
    

    Choice #5 – When/how do I separate agent boundaries?

    There’s a good chance that an agentic app has more than one agent. Stuffing everything into a single agent with a complex prompt and a dozen tools seems … suboptimal.

But multi-agent doesn’t have to mean you’re sliding into a distributed system. You can include multiple agents in the same process space and deployment artifact. The Sequential Agent pattern in the ADK makes it simple to define distinct agents that run one at a time. So it seems wise to think of service boundaries for your agents, and only make a hard split when the context changes.
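Stripped of the framework, the Sequential Agent pattern is just a pipeline over shared state, where each step reads what earlier steps wrote (the ADK does this with output keys). A toy sketch, with stub steps standing in for real LLM-backed agents:

```python
from typing import Callable, Dict, List

def run_sequential(agents: List[Callable], state: Dict) -> Dict:
    """Run agents one at a time; each reads and returns the shared state dict."""
    for agent in agents:
        state = agent(state)
    return state

# Toy steps mirroring the invoice flow (real versions would call a model):
def html_generator(state):
    return {**state, "invoicehtml": "<html>invoice</html>"}

def best_offer(state):
    return {**state, "bestinvoicehtml": state["invoicehtml"] + "<p>10% off</p>"}

def pdf_writer(state):
    return {**state, "pdf_uri": "gs://bucket/invoice.pdf"}
```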

    For me, that meant one set of agents handling all the appointment stuff, and another distinct set of agents that worked on invoices. These don’t depend on each other, and should run separately. Both sets of agents use the Sequential Agent pattern.

    The appointment agent has sub-agents to look up the weather, and uses that agent as a tool within the primary root agent.

    The invoicing agent is more complex with sub-agents to build up HTML out of the chat history, another agent that looks up the best promotional offers to attach to the invoice, and a final agent that generates a PDF.

    private SequentialAgent createInvoiceAgent(
            PdfTool pdfTool,
            String mcpServerUrl,
            Resource htmlGeneratorPrompt,
            Resource bestOfferPrompt,
            Resource pdfWriterPrompt
    ) {
        String modelName = properties.getAgent().getModelName();

        LlmAgent htmlGeneratorAgent = LlmAgent.builder()
                .model(modelName)
                .name("htmlGeneratorAgent")
                .description("Generates an HTML invoice from conversation data.")
                .instruction(resourceToString(htmlGeneratorPrompt))
                .outputKey("invoicehtml")
                .build();

        List<BaseTool> mcpTools = loadMcpTools(mcpServerUrl);

        LlmAgent bestOfferAgent = LlmAgent.builder()
                .model(modelName)
                .name("bestOfferAgent")
                .description("Applies the best offers available to the invoice")
                .instruction(resourceToString(bestOfferPrompt))
                .tools(mcpTools)
                .outputKey("bestinvoicehtml")
                .build();

        FunctionTool generatePdfTool = FunctionTool.create(PdfTool.class, "generatePdfFromHtml");

        LlmAgent pdfWriterAgent = LlmAgent.builder()
                .model(modelName)
                .name("pdfWriterAgent")
                .description("Creates a PDF from HTML and saves it to cloud storage.")
                .instruction(resourceToString(pdfWriterPrompt))
                .tools(List.of(generatePdfTool))
                .build();

        return SequentialAgent.builder()
                .name(properties.getAgent().getAppName())
                .description("Execute the complete sequence to generate, improve, and publish a PDF invoice to Google Cloud Storage.")
                .subAgents(htmlGeneratorAgent, bestOfferAgent, pdfWriterAgent)
                .build();
    }
    

    How should I connect these agents? I didn’t want hard-coded links between the services, as they can operate async and independently. You could imagine other services being interested in a booking too. So I put Google Cloud Pub/Sub in the middle. I used a push notification (to the invoice agent’s HTTP endpoint), but I’ll probably refactor it and make it a pull subscription that listens for work.
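That pull-subscription refactor could look something like this. The callback is plain Python (so it’s easy to test), and the Pub/Sub wiring is shown in a comment; `start_invoice_run` is a hypothetical entry point into the invoice agents, not something from the real app.

```python
import json

def handle_booking_message(message, start_invoice_run):
    """Pull-subscription callback: parse the booking event, kick off the
    invoice agents, and ack only on success so failures get redelivered."""
    try:
        booking = json.loads(message.data.decode("utf-8"))
        start_invoice_run(booking)
        message.ack()
    except Exception:
        message.nack()  # let Pub/Sub redeliver the message

# Wiring (illustrative, using google-cloud-pubsub):
# from google.cloud import pubsub_v1
# subscriber = pubsub_v1.SubscriberClient()
# sub_path = subscriber.subscription_path("my-project", "booking-events")
# subscriber.subscribe(sub_path, callback=lambda m: handle_booking_message(m, start_invoice_run))
```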

    Choice #6 – What’s needed in my agent instructions?

    I’m getting better at this. Still not great. But I’m using AI to help me, and learning more about what constraints and direction make the biggest impact.

    For the booking agent, my goal was to collect all the data needed, while factoring in constraints such as weather. My agent instructions here included core principles, operational steps, the must-have data to collect, which decisions to make, and how to use the available tools.

    root_agent = Agent(
        name="root_agent",
        model="gemini-2.5-flash",
        description="This is the starting agent for Seroter Roofing and customers who want to book a roofing appointment",
        instruction=(
            """
    You are an AI agent specialized in booking roofing appointments. Your primary goal is to find available appointments for roofing services, and preferably on days where the weather forecast predicts dry weather.
    
    ## Core Principles:
    
        *   **Information First:** You must gather the necessary information from the user *before* attempting to use any tools.
        *   **Logical Flow:** Follow the steps outlined below strictly.
        *   **Professional & Helpful:** Maintain a polite, professional, and helpful tone throughout the interaction.
    
    ## Operational Steps:
    
    1.  **Greeting:**
        *   Start by politely greeting the user and stating your purpose (booking roofing appointments).
        *   *Example:* "Hello! I can help you book a roofing appointment. What kind of service are you looking for today?"
    
    2.  **Information Gathering:**
        *   You need two key pieces of information from the user:
            *   **Type of Service:** What kind of roofing service is needed? (e.g., repair, replacement, inspection, estimate)
            *   **Service Location:** What city is the service required in?
        *   Ask for this information clearly if the user doesn't provide it upfront. You *cannot* proceed to tool usage until you have both the service type and the city.
        *   *Example follow-up:* "Great, and in which city is the property located?"
    
    3.  **Tool Usage - Step 1: Check Appointment Availability (Filtered):**
        *   Get information about available appointment times:
        *   **[Use Tool: Appointment availability]** for the specified city.
        *   **Crucially:** When processing the results from the appointment tool, **filter** the available appointments to show *only* those that fall on the specific dates without rain in the forecast. You should also consider the service type if the booking tool supports filtering by type.
    
    4.  **Tool Usage - Step 2: Check Weather Forecast:**
        *   Once you have the service type and city, your next action is to check the weather.
        *   **[Use Tool: 7-day weather forecast]** for the specified city.
        *   Analyze the forecast data returned by the tool. Identify which days within the next 7 days are predicted to be 'sunny' or at least dry. Be specific about what constitutes 'dry' based on the tool's output.
    
    5.  **Decision Point 1: Are there Appointments on Dry Days?**
        *   If the appointment availability tool returns available slots *specifically* on the identified dry days:
            *   Present these available options clearly to the user, including the date, time, and potentially the service type (if applicable).
            *   Explain that these options meet the dry weather preference.
            *   Prompt the user to choose an option to book.
            *   *Example:* "Great news! The forecast for [City] shows dry weather on [Date 1], [Date 2], etc. I've checked our schedule and found these available appointments on those days: [List appointments]."
    
        *   If the appointment availability tool returns slots, but *none* of them fall on the identified dry days (or if the tool returns no slots at all):
            *   Inform the user that while there are dry days coming up, there are currently no appointments available on those specific dry dates within the next 7 days.
            *   Explain that your search was limited to the dry days based on the forecast.
            *   Suggest they might want to try a different service type (if relevant) or check back later as availability changes.
            *   *Example:* "While the forecast for [City] does show some dry days coming up, I wasn't able to find any available appointments specifically on those dates within the next week. Our schedule on sunny days is quite popular. Please try again in a few days, as availability changes, or let me know if you need a different type of service."
    
    6.  **Confirmation/Booking (If Applicable):**
        *   Be sure to get the full name and full address of the location for the appointment.
             
    **Tools**
        You have access to the following tools to assist you:
        `weather_agent`: use this tool to find the upcoming weather forecast and identify rainy days
        `api_tool_get_appointments -> json`: use this OpenAPI tool to answer any questions about available appointments
        `add_appointment(customer: str, slotid: str, address: str, services: List[str]) -> dict`: use this tool to add a new appointment
    """
        ),
        tools=[agent_tool.AgentTool(weather_agent), api_tool_get_appointments, tools.add_appointment],
    )
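
    The `add_appointment` entry in that tool list is just a plain Python function that ADK wraps as a tool. Here's a minimal sketch of its shape — the body is simplified and the Pub/Sub publish is stubbed out in a comment, so treat the details as illustrative rather than the repo's actual implementation:

```python
from typing import List

def add_appointment(customer: str, slotid: str, address: str, services: List[str]) -> dict:
    """Book the chosen slot and hand the booking off for downstream processing."""
    # The agent's instructions require a full name and address before booking.
    if not customer or not address:
        return {"status": "error", "message": "Customer name and address are required."}

    booking = {
        "customer": customer,
        "slotid": slotid,
        "address": address,
        "services": services,
    }
    # The real version also publishes the booking to a Pub/Sub topic, e.g.:
    # publisher.publish(topic_path, json.dumps(booking).encode("utf-8"))
    return {"status": "success", "booking": booking}
```

    Returning a dict (rather than raising an exception) gives the model a structured result it can react to, which is why even the error case comes back as a status field.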
    

    The invoicing agent had a more complex prompt as I wanted to shape the blob of chat history into structured JSON and then into valid HTML. Of course, I could have (should have?) structured the raw data before it left the original agent, but I wanted to try it this way. My agent instructions show an example of the preferred JSON, and also the valid HTML structure.

    **Role:** You are a specialized agent designed to generate an HTML invoice from a successful appointment booking history.
    
    **Task:** Process the entire user prompt, which contains conversation history in a JSON format. Your goal is to create a complete HTML invoice based on the details found in that JSON.
    
    [...]
    
    4.  **Invoice JSON Structure:** The JSON invoice you internally generate **must** strictly adhere to the format provided in the example below. Do not add extra fields or change field names. Ensure numbers are formatted correctly (e.g., 100.00, 0.00).
        ```json
        {
        "invoiceNumber": "INV-BOOKING-[Current Date YYYYMMDD]", // Generate based on date
        "issueDate": [YYYY, M, D], // Current Date
        "dueDate": [YYYY, M, D], // Current Date + 30 days
        "customerName": "[Extracted Customer Name]",
        "customerAddress": "[Extracted Customer Address]",
        "items": [
            {
            "description": "[Description of Booked Service]",
            "quantity": 1,
            "unitPrice": [Price of Service],
            "lineTotal": [Price of Service]
            }
        ],
        "subtotal": [Price of Service],
        "taxAmount": 0.00,
        "summary": "Invoice for booked [Service Name]",
        "totalAmount": [Price of Service]
        }
        ```
    
    [...]
    
    7.  **Create an HTML string based on the example structure here:**
    ```html
    <!DOCTYPE html>
    <html>
    <head>
    	<meta charset="UTF-8" />
    	<title>Seroter Roofing Invoice</title>
    	<style type="text/css">
    		body { font-family: sans-serif; margin: 20px; }
    		h1 { color: navy; }
    		.header, .customer-info, .summary-block, .footer { margin-bottom: 20px; }
    		.invoice-details { margin-top: 20px; padding: 10px; border: 1px solid #ccc; }
    		.invoice-details p { margin: 5px 0; }
    		table { width: 100%; border-collapse: collapse; margin-top: 20px; }
    		.summary-block { padding: 10px; border: 1px dashed #eee; background-color: #f9f9f9; }
    		th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
    		th { background-color: #f2f2f2; }
    		.text-right { text-align: right; }
    	</style>
    </head>
    <body>
    	<h1>Invoice</h1>
    
    	<div class="header">
    		<p><strong>Invoice Number:</strong> INV-001</p>
    		<p><strong>Date Issued:</strong> January 01, 2024</p>
    		<p><strong>Date Due:</strong> January 15, 2024</p>
    	</div>
    
    	<div class="customer-info">
    		<h2>Bill To:</h2>
    		<p>Customer Name</p>
    		<p>123 Customer Street, Denver, CO 80012</p>
    	</div>
    
    	<div class="summary-block">
    		<h2>Summary</h2>
    		<p>Details about the appointment and order...</p>
    	</div>
    
    	<table>
    		<thead>
    			<tr>
    				<th>Description</th>
    				<th>Quantity</th>
    				<th>Unit Price</th>
    				<th>Line Total</th>
    			</tr>
    		</thead>
    		<tbody>
    			<tr>
    				<td>Sample Item</td>
    				<td class="text-right">1</td>
    				<td class="text-right">10.00</td>
    				<td class="text-right">10.00</td>
    			</tr>
    		</tbody>
    	</table>
    
    	<div class="invoice-details">
    		<p class="text-right"><strong>Subtotal:</strong> 0.00</p>
    		<p class="text-right"><strong>Tax:</strong> 0.00</p>
    		<p class="text-right"><strong>Total Amount:</strong> <strong>$123.45</strong></p>
    	</div>
    	<div class="footer">
    		<p>Thank you for your business!</p>
    	</div>
    </body>
    </html>
    ```
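
    Worth noting: the invoice number and due date in that JSON are both derived from the current date, and date arithmetic is exactly the kind of thing LLMs fumble (one of my runs produced a wrong invoice date). If you'd rather not trust the model with it, the derivation is trivial to do deterministically — a quick Python sketch, with field names matching the JSON example:

```python
from datetime import date, timedelta

def invoice_date_fields(issue: date) -> dict:
    """Compute the date-derived invoice fields instead of asking the model to."""
    due = issue + timedelta(days=30)  # prompt specifies due date = issue date + 30 days
    return {
        "invoiceNumber": f"INV-BOOKING-{issue.strftime('%Y%m%d')}",
        "issueDate": [issue.year, issue.month, issue.day],
        "dueDate": [due.year, due.month, due.day],
    }
```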
    

    Doing this “context engineering” well is important. Think through the instructions, data, and tools that you’re giving an agent to work with.
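
    For that matter, the JSON-to-HTML step doesn't strictly need an LLM at all; once the model has produced structured JSON, the markup can be rendered deterministically. A sketch of the line-item table rows using Python's built-in `string.Template`, keyed to the invoice fields above (illustrative only — my agent does this step with instructions instead):

```python
from string import Template

# Mirrors one <tr> of the HTML invoice template's table body.
ROW = Template("""<tr>
  <td>$description</td>
  <td class="text-right">$quantity</td>
  <td class="text-right">$unitPrice</td>
  <td class="text-right">$lineTotal</td>
</tr>""")

def render_rows(invoice: dict) -> str:
    """Render the invoice's line items into table rows for the HTML template."""
    return "\n".join(
        ROW.substitute(
            description=item["description"],
            quantity=item["quantity"],
            unitPrice=f"{item['unitPrice']:.2f}",
            lineTotal=f"{item['lineTotal']:.2f}",
        )
        for item in invoice["items"]
    )
```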

    Choice #7 – What’s the right approach to accessing Cloud services?

    My agent solution sent data to Pub/Sub (addressed above), but also relied on data sitting in a PostgreSQL database. And PDF blobs sitting in Cloud Storage.

    I had at least three implementation options here for PostgreSQL and Cloud Storage:

    • Function calling. Use functions that call the Cloud APIs directly, and leverage those functions as tools.
    • Model Context Protocol (MCP). Use MCP servers that act as API proxies for the LLM to use
    • YOLO mode. Ask the LLM to figure out the right API call to make for the given service.

    The last option works (mostly), but would be an absurd choice to make in 99.98% of situations.

    The appointment agent calls the Pub/Sub API directly by using that encompassing function as a tool. For the database access, I chose MCP. The MCP Toolbox for Databases is open source and fairly simple to use. It saves me from a lot of boilerplate database access code.
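
    For context, the Toolbox exposes parameterized SQL queries as MCP tools through a YAML config file. The sketch below shows the rough shape of that file — the source and tool names are invented, and the field names are from memory, so check the Toolbox docs for the exact schema:

```yaml
sources:
  offers-db:
    kind: postgres
    host: 127.0.0.1
    port: 5432
    database: roofing
    user: toolbox-user
    password: ${DB_PASSWORD}
tools:
  get-offers-by-city:
    kind: postgres-sql
    source: offers-db
    description: Retrieve current service offers for a given city.
    parameters:
      - name: city
        type: string
        description: City to look up offers for.
    statement: SELECT * FROM offers WHERE city = $1;
```

    On the agent side, the Java snippet below then simply loads whatever tools that server exposes over SSE.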

    private List<BaseTool> loadMcpTools(String mcpServerUrl) {
            try {
                SseServerParameters params = SseServerParameters.builder().url(mcpServerUrl).build();
                logger.info("Initializing MCP toolset with params: {}", params);
                McpToolset.McpToolsAndToolsetResult result = McpToolset.fromServer(params, new ObjectMapper()).get();
                if (result.getTools() != null && !result.getTools().isEmpty()) {
                    logger.info("MCP tools loaded: {}", result.getTools().size());
                    return result.getTools().stream().map(mcpTool -> (BaseTool) mcpTool).collect(Collectors.toList());
                }
            } catch (Exception e) {
                logger.error("Error initializing MCP toolset", e);
            }
            return new ArrayList<>();
        }
    

    When creating the PDF and adding it to Cloud Storage, I decided to use a robust function that I passed to the agent as a tool.

    private Map<String, Object> generatePdfFromHtmlInternal(String htmlContent) throws IOException {
            if (htmlContent == null || htmlContent.trim().isEmpty()) {
                throw new IllegalArgumentException("HTML content cannot be null or empty.");
            }
    
            try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
                ITextRenderer renderer = new ITextRenderer();
                renderer.setDocumentFromString(htmlContent);
                renderer.layout();
                renderer.createPDF(baos);
    
                String timestamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS"));
                String uniquePdfFilename = OUTPUT_PDF_FILENAME.replace(".pdf", "_" + timestamp + ".pdf");
                String bucketName = properties.getGcs().getBucketName();
    
                BlobId blobId = BlobId.of(bucketName, uniquePdfFilename);
                BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("application/pdf").build();
    
                storage.create(blobInfo, baos.toByteArray());
    
                String gcsPath = "gs://" + bucketName + "/" + uniquePdfFilename;
                logger.info("Successfully generated PDF and uploaded to GCS: {}", gcsPath);
                return Map.of("status", "success", "file_path", gcsPath);
    
            } catch (DocumentException e) {
                logger.error("Error during PDF document generation", e);
                throw new IOException("Error during PDF document generation: " + e.getMessage(), e);
            } catch (Exception e) {
                logger.error("Error during PDF generation or GCS upload", e);
                throw new IOException("Error during PDF generation or GCS upload: " + e.getMessage(), e);
            }
        }
    

    Choice #8 – How do I package up and run the agents?

    This choice may depend on who the agent is for (internal or external audiences), who has to support the agent, and how often you expect to update the agent.

    I chose to containerize the components so that I had maximum flexibility. I could have easily used the ADK CLI to deploy directly to Vertex AI Agent Engine—which comes with convenient features like memory management—but wanted more control than that. So I have Dockerfiles for each agent, and deploy them to Google Cloud Run. Here I get easy scale, tons of optional configurations, and I don’t pay for anything when the agent is dormant.

    In this case, I’m just treating the agent like any other type of code. You might make a different choice based on your use case.

    The final solution in action

    Let’s run this thing through. All the source code is sitting in my GitHub repo.

    I start by opening the appointment agent hosted in Cloud Run. I’m using the built-in ADK web UI to have a conversational chat with the initial agent. I mention that I might have a leaky roof and want an inspection or repair. The agent then follows its instructions. After checking the weather in the city I’m in, it retrieves appointments via the API. On the left, there’s a handy set of tools to trace events, do evals, and more.

    At this point, I chose an available appointment, and the agent followed its next set of instructions. The appointment required two pieces of info (my name and address), and wouldn’t proceed until I provided both. Once it had the data, it called the right function to make an appointment and publish a message to Pub/Sub.

    That data flowed through Google Cloud Pub/Sub, and got pushed to another agent hosted in Cloud Run.

    That agent immediately loaded up its MCP tools by calling the MCP server also hosted in Cloud Run. That server retrieved the list of offers for the city in question.

    This agent runs unattended in the background, so there’s no chat interface or interactivity. Instead, I can track progress by reading the log stream.

    Once this agent finished converting the chat blob to JSON, rendering the HTML, and calling the MCP tools to attach offers, it wrote the final PDF to Cloud Storage.

    There you go. It’s not perfect and I have improvements I want to make. Heck, the example here has the wrong date in the invoice, which didn’t happen before. So I need better instructions there. I’d like to switch the second agent from a push to a pull. It’d be fun to add some video or audio intake to the initial agent.

    Nobody knows the future, but it looks like we’ll be building more agents, and fewer standalone apps. APIs matter more than ever, as do architectural decisions. Make good ones!

  • Daily Reading List – July 16, 2025 (#588)

    I planned my morning poorly with back-to-back intense presentations. I followed that up by staring into the middle distance for ten minutes to allow my brain to reboot. Do you give yourself a breather after some intense work, or do you push through?

    [blog] Engineering Deutsche Telekom’s sovereign data platform. Cool story of getting the power of cloud, without compromising on compliance requirements.

    [blog] Delegation is the AI Metric that Matters. Very interesting way to measure our acceptance of AI in our work and life.

    [blog] How to Use Open Source Without Losing Your Code, Users – or Sanity. It’s easy to remain ignorant of the right ways to open source, and use open source. Let’s all get smarter so we don’t get burned.

    [blog] Building production-ready generative AI: How Temporal supercharges Google’s Gemini and Veo. Long-running tasks are no joke. I like the way Temporal described the problem and solution here.

    [blog] Proven Practices for Succeeding with a Multicloud Strategy. I’ll take “blog titles you’d never see from AWS in 2021” for $1000, Alex. But here we are. Solid advice.

    [blog] 25 of my favorite ROI+ customer stories. I enjoy the quick demos I see from users on X, but these stories of actual generative AI success have more weight.

    [article] 5 ways generative AI projects fail. It’s only fair to also share stories where the AI projects fail to land. Here are some situations to avoid.

    [article] Bash 5.3 Has Some Big Improvements — Here’s How You Can Test It. If you could quickly recall the current version of Bash you’re using, then you’re a wizard. Bravo. I barely knew there was a current version. This ubiquitous Linux shell has some new features though.

    [blog] Implementing High-Performance LLM Serving on GKE: An Inference Gateway Walkthrough. Lots of details in this post. Folks who use Kubernetes for LLM serving are going to like this.

    [article] Why LLMs demand a new approach to authorization. Indeed. I need to learn more about what’s needed, and possible, in this space.

    [article] AWS unveils Bedrock AgentCore, a new platform for building enterprise AI agents with open source frameworks and tools. You’ll see a lot of companies doing their best to own the agent control plane.

    [blog] Why development leaders are investing in design. Good engineering only gets you so far. If you don’t have the right product sense and design focus, your products rarely expand beyond hardcore users.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – July 15, 2025 (#587)

    A lot of AI stuff today. But also lots of insight about it, versus cheerleading. Dig in!

    [blog] Context Engineering: Bringing Engineering Discipline to Prompts. This post gave me greater appreciation for what “context engineering” is about. Don’t assume it’s just a silly new term for something we were already doing.

    [blog] AI from Google Cloud steps up to the plate at the MLB All-Star Game. The game’s tonight, and we’re adding a few cool experiences to the game.

    [blog] Starter GEMINI.md file within a project directory for Gemini CLI. Pay attention to these posts that help us write effective agent instructions that improve the output of our tools.

    [blog] The three great virtues of an AI-assisted programmer. What are you doing while waiting for your LLM to answer? How are you keeping control of the exercise? I liked Sean’s take here.

    [article] The Pragmatic Engineer 2025 Survey: What’s in your tech stack? Surveys are surveys, but there’s data here that might validate (or invalidate) some of your assumptions about today’s developers.

    [article] AI coding tools are shifting to a surprising place: the terminal. The survey above happened during April and May. I suspect the data would already look markedly different thanks to the rapid (and sustainable) rise in agentic terminal tools.

    [blog] The Internal Platform Scorecard: Speed, Safety, Efficiency, and Scalability. I like the attributes along with corresponding KPIs and metrics that matter. Know how you’re measuring the success of your platform!

    [blog] Gemini CLI Tutorial Series — Part 6 : More MCP Servers. Even more ways to use MCP servers in the Gemini CLI, including accessing Google Workspace and media services.

    [blog] Next-Level Data Automation: Gemini CLI, Google Sheets, and MCP. Interacting with Google Sheets, Docs, and Slides from an agentic CLI opens up a ton of possibilities.

    [blog] Tools: Code Is All You Need. A contrary opinion to the MCP hype. Maybe just use code instead?

    [blog] The AWS Survival Guide for 2025: A Field Manual for the Brave and the Bankrupt. Amusing. You don’t have to put up with all this if you want a better cloud alternative.

    [blog] AI Engineers and the Hot Vibe Code Summer. When I hear “AI engineer” I think of someone building AI models or doing ML work. Not a developer using AI tools. Kate looks at that term, and vibe coding to figure out what it all means.

    [article] Global IT spend keeps growing despite trade war concerns. Spending still up for the year, but trimmed expectations. El Reg has a more pessimistic take on the same data.

    [blog] The Fastest Way to Run the Gemini CLI: A Deep Dive into Package Managers. Super interesting content, especially if you care about performance.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below: