Author: Richard Seroter

  • Daily Reading List – July 22, 2025 (#592)

Definitely seeing more “how this tech role is changed by AI” pieces lately. I’ve got two in today’s edition that look at architects and product managers. And some non-AI content for those of you burned out on the topic du jour.

    [blog] Gemini 2.5 Flash-Lite is now stable and generally available. Fast and low cost. What’s not to like? We’re also getting good at making these available across our surfaces at the same time.

    [article] Architecting the MVP in the Age of AI. Where does AI help architects? This article calls out a few areas where it can make a difference.

    [blog] My Product Management Toolkit (67): Using AI to write a PRD. Yes, you can use AI as a product manager, too. Here, creating a product requirements doc.

    [article] Moving from an orchestration-heavy to leadership-heavy management role. Are you supervising, managing, or leading? There’s a difference. Will talks about a move to leadership.

    [blog] Chainguard builds a market, everyone else wants in. Kudos to Chainguard for basically creating a new market for secure container images. Lots of other folks now playing in the space.

    [article] Employee curiosity fuels Shadow AI adoption faster than IT can keep up. You can’t stop AI usage, you can only hope to contain it.

    [article] Hackers exploiting SharePoint zero-day seen targeting government agencies. I admittedly haven’t thought about SharePoint in a long time, but I know many folks still depend on it. Get patched!

    [article] 6 Design Principles for Edge Computing Systems. Good lessons from some folks who really understand edge systems.

    [blog] Introducing OSS Rebuild: Open Source, Rebuilt to Last. This new project looks solid, and already has a way to use it to get insights into your supply chain.

    [blog] Coding with LLMs in the summer of 2025 (an update). Salvatore provides a current perspective on using AI coding assistants and agents. He also offers up some good advice for maximizing their value.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – July 21, 2025 (#591)

    I’m in New York and successfully navigated planes, trains, and walking 15 blocks. Lots of great AI content in my feed today, and I now gift that to you.

    [blog] Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad. That’s pretty darn remarkable. And soon, it’ll be in the hands of Google customers.

    [article] The Big LLM Architecture Comparison. Wow, fantastic post from Sebastian that shows us what’s new, relevant, and interesting with modern LLM architectures.

    [blog] Adding Tools to Your AI Agent — The Scalable Way. In my own blog post from last week, I showed how to use the “API tool” in an agent. It’s a great approach.

    [article] 5 key questions your developers should be asking about MCP. These are good questions to ask. Why use it? How do I secure it? What’s the long-term viability?

    [blog] Exploring the context of online images with Backstory. Trustworthiness is hard to come by given all this AI-generated content. Maybe this will help.

    [blog] Welcoming The Next Generation of Programmers. Great post. Don’t look down on vibe coders or those getting in the field. We’ve all started somewhere. Help them find a community so that they keep developing their engineering skills.

    [docs] Generate videos with Veo 3. What a great example of useful docs. Samples are tight, parameters are defined, and examples abound. If you’re doing text (or image) to video prompting, you can learn from this.

    [blog] Is AI Coming For Your Team Next? How AI Hype Is Becoming The Hot Layoff Excuse. For some teams and roles, I can understand pausing any new hiring until you see what the current team can do with AI assistance. I don’t yet see many cases where replacement makes sense.

    [blog] Application monitoring in Google Cloud: Bridging manual and AI-assisted troubleshooting. New paradigms are on the way. It’ll be interesting to see how folks mix manual setup and intervention with AI-driven issue resolution.

    [blog] Nobody Knows How To Build With AI Yet. Amusing and thought-provoking. I tell folks that the two main traits you need to exhibit nowadays are curiosity and humility. Don’t be over-confident, and just keep learning.

    [blog] Mastering Agentic Development with Gemini and Roo Code. We’re all just trying new tools, and picking up new techniques constantly. Roo Code looks cool.

    [article] Planning an Offsite for Your Leadership Team? Ask These 5 Questions. Are your leadership offsites kinda lame? It happens. I found these prompting questions interesting, and likely to stimulate the right type of chatter.

    [article] AI Agents Are Creating a New Security Nightmare for Enterprises and Startups. New tech and patterns are often disruptive. Just be smart about your day-2 considerations.

[blog] Unlock Gemini’s reasoning: A step-by-step guide to logprobs on Vertex AI. Now you can retrieve low-level details about the token probabilities returned by Gemini models.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – July 18, 2025 (#590)

    I had a good day. Next week I’m in New York City for some customer meetings and offsites. Expect the same rhythm of reading lists because this is now a habit!

    [blog] Vibe coding and the silent AI war inside tech companies. Good point here that the AI acceleration isn’t across all tasks. We’ll get some things done much faster, others, not so much.

    [blog] BigQuery’s New Cost Controls Are Here to Help. “Unlimited” is a bad default quota amount for a billable service. JK points out some new BigQuery default quotas that’ll help you avoid runaway bills.

    [blog] Giants awaken. Google Cloud GeminiCLI, AWS Kiro, developer experience and the need to ship and keep shipping. James at Redmonk looks at a couple of hyperscalers flexing their developer chops.

    [blog] Back to The Future: Evaluating AI Agents on Predicting Future Events. There’s no existing data to train on for future events, so can you use AI to predict what’s going to happen?

    [blog] MCP: Bringing mashups back! Love this. Agents and MCP make it simple to combine disparate APIs in new and funky ways.

    [article] Open Source Is Too Important To Dilute. Dan says that true open source is worth defending, and we can’t water down the definition with these not-really-open definitions.

    [blog] AI Agents, the New Frontier for LLMs. Folks on my team have been producing such exceptionally good content about AI. Not just the what, but the why and how.

    [blog] Building an automated GitLab Merge Request Review Agent with Gemini CLI. Good example of a background task (CI pipeline) that can make use of an agentic tool.

    [article] Anthropic tightens usage limits for Claude Code – without telling users. I feel for them. It’s hard to predict usage, it’s not cheap to serve, and competition is fierce.

    [blog] Cloud CISO Perspectives: Our Big Sleep agent makes a big leap, and other AI news. AI is foiling attempts to exploit vulnerabilities. That’s wild. Also get a roundup of security links.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – July 17, 2025 (#589)

    I bumped into some engineering folks at the office who are changing how they work with agentic CLIs. It was inspiring to see a real story about how our work is changing very dramatically.

    [article] The Founder’s Guide to Building a V1 of Customer Success. Good advice here, whether you’re setting up such a team for the first time, or rebooting a stagnant team.

    [blog] Proactiveness considered harmful? A guide to customise the Gemini CLI to suit your coding style. Fantastically good post, and an example of where open source is so powerful. Daniela wants to make the Gemini CLI less proactive, and explains how to steer and customize the model’s behavior.

    [article] Leading After Your Predecessor Fails. Did you take a job where you replaced someone else? Did they bomb out? Here’s guidance for how to repair the damage.

    [blog] Simplify your Agent “vibe building” flow with ADK and Gemini CLI. We’ll see more frameworks and products doing this, I’m certain. The ADK offers an llms-full.txt file that you can give your AI tool as context. This gives you the most relevant responses back.

    [blog] Where Technology Executives Will Be Investing In 2026. Maybe no surprises here, but a reminder that the APAC market has the highest rate of IT spending growth coming up.

    [blog] Five Big Improvements to Gradio MCP Servers. It seems that many people are using Gradio to expose MCP servers, and there are new improvements to auth along with other areas.

    [blog] How Renault Group is using Google’s software-defined vehicle industry solution. Cars nowadays are basically computers with wheels. Software matters a lot, and here’s a story of how one giant manufacturer is building for the future.

    [article] What can we learn from Meta’s code improvement practices? Short, but interesting look at research into how Meta scopes, prioritizes, and executes on code improvement projects.

    [blog] Why the analyst advisor industry is getting obliterated by AI… and how to save it. Shots fired! Feels spot on, and frankly applies to anyone in a role of “thought leadership.” Step up your game.

    [blog] Build with more flexibility: New open models arrive in the Vertex AI Model Garden. DeepSeek as a service joins models like Llama in our pay-as-you-go offering. Convenient when you don’t want to manage infra, or guess about capacity.

    [blog] Vibe Coding Is the Future of Programming. Here’s How Your Company Can Get on Board. Bold title. I don’t think vibe coding as currently defined is the future. But, orchestrating AI tools is.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Code was the least interesting part of my multi-agent app, and here’s what that means to me

At least 80% of the code I’ve ever written could have been written by AI, probably at higher quality. I’ve been “in tech” for twenty-seven years and spent seven of those as a software developer. Even when I stopped getting paid for it, I never stopped coding. But little of it has been truly novel; most of my code has been straightforward database access code, web APIs, presentation logic, and a handful of reasonably complex systems. No doubt, many of you have done truly sophisticated things in code—compilers, performance-tuned algorithms, language frameworks—and AI isn’t replacing that any time soon. But I’d bet that much of the interesting tech work is moving away from raw code, and towards higher-order architecture.

I wanted to build out an agentic solution, and I used AI to generate 90% of the code. That code isn’t where the unique value was; none of it was particularly noteworthy. You can find the whole app here. The most interesting work related to the architectural decisions. Here are eight choices I had to make, and I suspect you’ll have fun wrestling with the same ones.

    Choice #1 – What am I trying to accomplish and do agents make sense?

    My goal was to build an app that could take in a customer’s roofing needs, create a service appointment, and generate a personalized invoice for the work. I’m cheating here, since this exercise started as “Richard wants to learn some agent tech.” So I did start with the end in mind. Judge me accordingly.

    But in every legit situation, we start by evaluating the user need. What functional requirements do I need to satisfy? What performance or quality attributes are necessary? Can I solve this with a simple service, or modular monolith? Is the user flow deterministic or variable?

    This scenario could certainly be solved by a simple data collection form and PDF generator. What requirements might make an agentic architecture the truly correct choice?

    • Data collection from the user requires image, video, and audio input to best scope the services and pricing we should offer.
    • The scheduling or invoicing process requires a dynamic workflow based on a variety of factors, and hard-coding all the conditions would be tricky.

    Either way, this is always a critical choice before you write a single line of code.

    Choice #2 – What data or services are available to work with?

    Before we build anything new, what do we already have at our disposal?

    In my case, let’s assume I already have an appointments web API for retrieving available appointment times and making new appointments. I’ve also got an existing database that stores promotional offers that I want to conditionally add to my customer invoice. And I’ve got an existing Cloud Storage bucket where I store customer invoice PDFs.

    It’s easy to just jump into the application build, but pause for a few moments and take stock of your existing inventory and what you can build around.

    Choice #3 – What (agent) framework should I use and why?

    So. Many. Choices.

There are AI app frameworks like Genkit, LlamaIndex, and Spring AI. There are agent frameworks like LangChain, LangGraph, Autogen, CrewAI, and more. Google recently shipped the Agent Development Kit (ADK), available for Python and Java developers. An agent built with something like ADK is basically made up of three things: a model, instructions, and tools. ADK then adds sweeteners on top of those basics that give you a lot of flexibility.

    And look, I like it because my employer invests in it. So, that’s a big factor. I also wanted to build agents in both Python and Java, and this made ADK a great choice.

    Don’t get married to any framework, but learn the fundamentals of tool use, memory management, and agent patterns.
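One way to internalize those fundamentals is to strip the framework away entirely. Here’s a minimal, framework-agnostic sketch of the tool-use loop that sits underneath all of these libraries: the model either requests a tool call or produces an answer, and tool results get appended back into the context. The `model_step` callable and the message shapes are illustrative assumptions, not any framework’s actual API.

```python
import json
from typing import Callable, Dict, List


def run_tool_loop(model_step: Callable[[List[dict]], dict],
                  tools: Dict[str, Callable],
                  history: List[dict]) -> str:
    """Framework-agnostic agent loop sketch.

    The model either answers ({"answer": ...}) or requests a tool call
    ({"tool": name, "args": {...}}). Tool results are appended to the
    history so the next model step can see them.
    """
    while True:
        action = model_step(history)
        if "answer" in action:
            return action["answer"]
        result = tools[action["tool"]](**action["args"])
        history.append({
            "role": "tool",
            "name": action["tool"],
            "content": json.dumps(result),
        })
```

Every framework in the list above is, at its core, a more robust version of this loop, plus memory, guardrails, and multi-agent composition.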

    Choice #4 – How should I use tools in the appointment agent?

    I suspect that tool selection will be a fascinating area for many builders in the years ahead. In this scenario, I had some decisions to make.

    I don’t want to book any roof repairs on rainy days. But where can I get the weather forecast from? I chose the built-in Google Search tool instead of trying to find some weather API on the internet.

    weather_agent = Agent(
        name="weather_agent",
        model="gemini-2.0-flash",
        description=(
            "Agent answers questions about the current and future weather in any city"
        ),
        instruction=(
            "You are an agent for Seroter Roofing. You can answer user questions about the weather in their city right now or in the near future"
        ),
        tools=[google_search],
    )
    

    For interacting with my existing appointments API, what’s the right tool choice? Using the OpenAPI tool baked into the ADK, I can just hand the agent an OpenAPI spec and it’ll figure out the right functions to call. For retrieving open appointment times, that’s a straightforward choice.

    openapi_spec = openapi_spec_template.replace("{API_BASE_URL}", config.API_BASE_URL)
    
    toolset = OpenAPIToolset(spec_str=openapi_spec, spec_str_type="json")
    api_tool_get_appointments = toolset.get_tool("get_available_appointments")
    

But what about booking appointments? While that’s also an API operation, I want to piggyback a successful booking with a message to Google Cloud Pub/Sub that downstream subscribers can read from. That’s not part of the appointments API (nor should it be). Instead, I think a function tool makes sense here, where I manually invoke the appointments API and then make a subsequent call to Pub/Sub.

    def add_appointment(customer: str, slotid: str, address: str, services: List[str], tool_context: ToolContext) -> dict:
        """Adds a roofing appointment by calling the booking API and logs the conversation history.
    
        This function serves as a tool for the agent. It orchestrates the booking process by:
        1. Calling the internal `_book_appointment_api_call` function to make the actual API request.
        2. If the booking is successful, it retrieves the conversation history from the
           `tool_context` and logs it to a Pub/Sub topic via `_log_history_to_pubsub`.
    
        Args:
            customer: The name of the customer.
            slotid: The ID of the appointment slot to book.
            address: The full address for the appointment.
            services: A list of services to be booked for the appointment.
            tool_context: The context provided by the ADK, containing session information.
    
        Returns:
            A dictionary containing the booking confirmation details from the API,
            or an error dictionary if the booking failed.
        """
        booking_response = _book_appointment_api_call(customer, slotid, address, services)
    
        if "error" not in booking_response:
            history_list: List[Event] = tool_context._invocation_context.session.events # type: ignore
            _log_history_to_pubsub(history_list)
        
        return booking_response
    

    Choice #5 – When/how do I separate agent boundaries?

    There’s a good chance that an agentic app has more than one agent. Stuffing everything into a single agent with a complex prompt and a dozen tools seems … suboptimal.

But multi-agent doesn’t have to mean you’re sliding into a distributed system. You can include multiple agents in the same process space and deployment artifact. The Sequential Agent pattern in the ADK makes it simple to define distinct agents that run one at a time. So it seems wise to think of service boundaries for your agents, and only make a hard split when the context changes.

    For me, that meant one set of agents handling all the appointment stuff, and another distinct set of agents that worked on invoices. These don’t depend on each other, and should run separately. Both sets of agents use the Sequential Agent pattern.

    The appointment agent has sub-agents to look up the weather, and uses that agent as a tool within the primary root agent.

    The invoicing agent is more complex with sub-agents to build up HTML out of the chat history, another agent that looks up the best promotional offers to attach to the invoice, and a final agent that generates a PDF.

    private SequentialAgent createInvoiceAgent(
                PdfTool pdfTool,
                String mcpServerUrl,
                Resource htmlGeneratorPrompt,
                Resource bestOfferPrompt,
                Resource pdfWriterPrompt
        ) {
            String modelName = properties.getAgent().getModelName();
    
        LlmAgent htmlGeneratorAgent = LlmAgent.builder()
                .model(modelName)
                .name("htmlGeneratorAgent")
                .description("Generates an HTML invoice from conversation data.")
                .instruction(resourceToString(htmlGeneratorPrompt))
                .outputKey("invoicehtml")
                .build();

        List<BaseTool> mcpTools = loadMcpTools(mcpServerUrl);

        LlmAgent bestOfferAgent = LlmAgent.builder()
                .model(modelName)
                .name("bestOfferAgent")
                .description("Applies the best offers available to the invoice")
                .instruction(resourceToString(bestOfferPrompt))
                .tools(mcpTools)
                .outputKey("bestinvoicehtml")
                .build();

        FunctionTool generatePdfTool = FunctionTool.create(PdfTool.class, "generatePdfFromHtml");

        LlmAgent pdfWriterAgent = LlmAgent.builder()
                .model(modelName)
                .name("pdfWriterAgent")
                .description("Creates a PDF from HTML and saves it to cloud storage.")
                .instruction(resourceToString(pdfWriterPrompt))
                .tools(List.of(generatePdfTool))
                .build();

        return SequentialAgent.builder()
                .name(properties.getAgent().getAppName())
                .description("Execute the complete sequence to generate, improve, and publish a PDF invoice to Google Cloud Storage.")
                .subAgents(htmlGeneratorAgent, bestOfferAgent, pdfWriterAgent)
                .build();
        }
    

    How should I connect these agents? I didn’t want hard-coded links between the services, as they can operate async and independently. You could imagine other services being interested in a booking too. So I put Google Cloud Pub/Sub in the middle. I used a push notification (to the invoice agent’s HTTP endpoint), but I’ll probably refactor it and make it a pull subscription that listens for work.
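A pull version might look something like the sketch below. The subscription ID and the payload shape are illustrative assumptions, and the Pub/Sub subscriber import is deferred so the sketch stays importable without the library.

```python
import json


def handle_booking(event: dict) -> str:
    """Placeholder for the invoice agent's work; returns a confirmation string."""
    return f"invoice requested for {event['customer']}"


def listen_for_bookings(project_id: str, subscription_id: str) -> None:
    """Pull bookings off a Pub/Sub subscription and hand them to the invoice flow."""
    from google.cloud import pubsub_v1  # requires google-cloud-pubsub

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message):
        handle_booking(json.loads(message.data.decode("utf-8")))
        message.ack()  # only ack once the invoice work has succeeded

    streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
    streaming_pull.result()  # block and process messages as they arrive
```

The appeal of the pull model here is that the invoice agent controls its own pace, instead of absorbing whatever request rate the push subscription sends at its HTTP endpoint.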

    Choice #6 – What’s needed in my agent instructions?

    I’m getting better at this. Still not great. But I’m using AI to help me, and learning more about what constraints and direction make the biggest impact.

    For the booking agent, my goal was to collect all the data needed, while factoring in constraints such as weather. My agent instructions here included core principles, operational steps, the must-have data to collect, which decisions to make, and how to use the available tools.

    root_agent = Agent(
        name="root_agent",
        model="gemini-2.5-flash",
        description="This is the starting agent for Seroter Roofing and customers who want to book a roofing appointment",
        instruction=(
            """
    You are an AI agent specialized in booking roofing appointments. Your primary goal is to find available appointments for roofing services, and preferably on days where the weather forecast predicts dry weather.
    
    ## Core Principles:
    
        *   **Information First:** You must gather the necessary information from the user *before* attempting to use any tools.
        *   **Logical Flow:** Follow the steps outlined below strictly.
        *   **Professional & Helpful:** Maintain a polite, professional, and helpful tone throughout the interaction.
    
    ## Operational Steps:
    
    1.  **Greeting:**
        *   Start by politely greeting the user and stating your purpose (booking roofing appointments).
        *   *Example:* "Hello! I can help you book a roofing appointment. What kind of service are you looking for today?"
    
    2.  **Information Gathering:**
        *   You need two key pieces of information from the user:
            *   **Type of Service:** What kind of roofing service is needed? (e.g., repair, replacement, inspection, estimate)
            *   **Service Location:** What city is the service required in?
        *   Ask for this information clearly if the user doesn't provide it upfront. You *cannot* proceed to tool usage until you have both the service type and the city.
        *   *Example follow-up:* "Great, and in which city is the property located?"
    
    3.  **Tool Usage - Step 1: Check Appointment Availability (Filtered):**
        *   Get information about available appointment times:
        *   **[Use Tool: Appointment availability]** for the specified city.
        *   **Crucially:** When processing the results from the appointment tool, **filter** the available appointments to show *only* those that fall on the specific dates without rain in the forecast. You should also consider the service type if the booking tool supports filtering by type.
    
    4.  **Tool Usage - Step 2: Check Weather Forecast:**
        *   Once you have the service type and city, your next action is to check the weather.
        *   **[Use Tool: 7-day weather forecast]** for the specified city.
        *   Analyze the forecast data returned by the tool. Identify which days within the next 7 days are predicted to be 'sunny' or at least dry. Be specific about what constitutes 'dry' based on the tool's output.
    
    5.  **Decision Point 1: Are there Appointments on Dry Days?**
        *   If the appointment availability tool returns available slots *specifically* on the identified dry days:
            *   Present these available options clearly to the user, including the date, time, and potentially the service type (if applicable).
            *   Explain that these options meet the dry weather preference.
            *   Prompt the user to choose an option to book.
            *   *Example:* "Great news! The forecast for [City] shows dry weather on [Date 1], [Date 2], etc. I've checked our schedule and found these available appointments on those days: [List appointments]."
    
        *   If the appointment availability tool returns slots, but *none* of them fall on the identified sunny days (or if the tool returns no slots at all):
            *   Inform the user that while there are dry days coming up, there are currently no appointments available on those specific dry dates within the next 7 days.
            *   Explain that your search was limited to the dry days based on the forecast.
            *   Suggest they might want to try a different service type (if relevant) or check back later as availability changes.
            *   *Example:* "While the forecast for [City] does show some dry days coming up, I wasn't able to find any available appointments specifically on those dates within the next week. Our schedule on sunny days is quite popular. Please try again in a few days, as availability changes, or let me know if you need a different type of service."
    
    6.  **Confirmation/Booking (If Applicable):**
        *   Be sure to get the full name and full address of the location for the appointment.
             
    **Tools**
        You have access to the following tools to assist you:
        `weather_agent`: use this tool to find the upcoming weather forecast and identify rainy days
        `api_tool_get_appointments -> json`: use this OpenAPI tool to answer any questions about available appointments
        `add_appointment(customer: str, slotid: str, address: str, services: List[str]) -> dict`: use this tool to add a new appointment
    """
        ),
        tools=[agent_tool.AgentTool(weather_agent), api_tool_get_appointments, tools.add_appointment],
    )
    

The invoicing agent had a more complex prompt, as I wanted to shape the blob of chat history into structured JSON and then into valid HTML. Of course, I could have (should have?) structured the raw data before it left the original agent, but I wanted to try it this way. My agent instructions show an example of the preferred JSON, and also the valid HTML structure.

    **Role:** You are a specialized agent designed to generate an HTML invoice from a successful appointment booking history.
    
    **Task:** Process the entire user prompt, which contains conversation history in a JSON format. Your goal is to create a complete HTML invoice based on the details found in that JSON.
    
    [...]
    
    4.  **Invoice JSON Structure:** The JSON invoice you internally generate **must** strictly adhere to the format provided in the example below. Do not add extra fields or change field names. Ensure numbers are formatted correctly (e.g., 100.00, 0.00).
        ```json
        {
        "invoiceNumber": "INV-BOOKING-[Current Date YYYYMMDD]", // Generate based on date
        "issueDate": [YYYY, M, D], // Current Date
        "dueDate": [YYYY, M, D], // Current Date + 30 days
        "customerName": "[Extracted Customer Name]",
        "customerAddress": "[Extracted Customer Address]",
        "items": [
            {
            "description": "[Description of Booked Service]",
            "quantity": 1,
            "unitPrice": [Price of Service],
            "lineTotal": [Price of Service]
            }
        ],
        "subtotal": [Price of Service],
        "taxAmount": 0.00,
        "summary": "Invoice for booked [Service Name]",
        "totalAmount": [Price of Service]
        }
        ```
    
    [...]
    
7.  **Create an HTML string based on the example structure here**
    ```html
    <!DOCTYPE html>
    <html>
    <head>
    	<meta charset="UTF-8" />
    	<title>Seroter Roofing Invoice</title>
    	<style type="text/css">
    		body { font-family: sans-serif; margin: 20px; }
    		h1 { color: navy; }
    		.header, .customer-info, .summary-block, .footer { margin-bottom: 20px; }
    		.invoice-details { margin-top: 20px; padding: 10px; border: 1px solid #ccc; }
    		.invoice-details p { margin: 5px 0; }
    		table { width: 100%; border-collapse: collapse; margin-top: 20px; }
    		.summary-block { padding: 10px; border: 1px dashed #eee; background-color: #f9f9f9; }
    		th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
    		th { background-color: #f2f2f2; }
    		.text-right { text-align: right; }
    	</style>
    </head>
    <body>
    	<h1>Invoice</h1>
    
    	<div class="header">
    		<p><strong>Invoice Number:</strong>INV-001</p>
    		<p><strong>Date Issued:</strong>January 01, 2024</p>
    		<p><strong>Date Due:</strong>January 15, 2024</p>
    	</div>
    
    	<div class="customer-info">
    		<h2>Bill To:</h2>
    		<p>Customer Name</p>
    		<p>123 Customer Street, Denver, CO 80012</p>
    	</div>
    
    	<div class="summary-block">
    		<h2>Summary</h2>
    		<p>Details about the appointment and order...</p>
    	</div>
    
    	<table>
    		<thead>
    			<tr>
    				<th>Description</th>
    				<th>Quantity</th>
    				<th>Unit Price</th>
    				<th>Line Total</th>
    			</tr>
    		</thead>
    		<tbody>
    			<tr >
    				<td>Sample Item</td>
    				<td class="text-right">1</td>
    				<td class="text-right">10.00</td>
    				<td class="text-right">10.00</td>
    			</tr>
    		</tbody>
    	</table>
    
    	<div class="invoice-details">
	<p class="text-right"><strong>Subtotal:</strong> 0.00</p>
    		<p class="text-right"><strong>Tax:</strong>0.00</p>
    		<p class="text-right"><strong>Total Amount:</strong> <strong>$123.45</strong></p>
    	</div>
    	<div class="footer">
    		<p>Thank you for your business!</p>
    	</div>
    </body>
    </html>
    ```
    

    Doing this “context engineering” well is important. Think through the instructions, data, and tools that you’re giving an agent to work with.

    Choice #7 – What’s the right approach to accessing Cloud services?

    My agent solution sent data to Pub/Sub (addressed above), but also relied on data sitting in a PostgreSQL database. And PDF blobs sitting in Cloud Storage.

    I had at least three implementation options here for PostgreSQL and Cloud Storage:

    • Function calling. Use functions that call the Cloud APIs directly, and leverage those functions as tools.
    • Model Context Protocol (MCP). Use MCP servers that act as API proxies for the LLM to use
    • YOLO mode. Ask the LLM to figure out the right API call to make for the given service.

    The last option works (mostly), but would be an absurd choice to make in 99.98% of situations.

    The appointment agent calls the Pub/Sub API directly by using that encompassing function as a tool. For the database access, I chose MCP. The MCP Toolbox for Databases is open source and fairly simple to use. It saves me from a lot of boilerplate database access code.

    ```java
    private List<BaseTool> loadMcpTools(String mcpServerUrl) {
        try {
            SseServerParameters params = SseServerParameters.builder().url(mcpServerUrl).build();
            logger.info("Initializing MCP toolset with params: {}", params);
            McpToolset.McpToolsAndToolsetResult result = McpToolset.fromServer(params, new ObjectMapper()).get();
            if (result.getTools() != null && !result.getTools().isEmpty()) {
                logger.info("MCP tools loaded: {}", result.getTools().size());
                return result.getTools().stream().map(mcpTool -> (BaseTool) mcpTool).collect(Collectors.toList());
            }
        } catch (Exception e) {
            logger.error("Error initializing MCP toolset", e);
        }
        return new ArrayList<>();
    }
    ```
    

    When creating the PDF and adding it to Cloud Storage, I decided to use a robust function that I passed to the agent as a tool.

    ```java
    private Map<String, Object> generatePdfFromHtmlInternal(String htmlContent) throws IOException {
        if (htmlContent == null || htmlContent.trim().isEmpty()) {
            throw new IllegalArgumentException("HTML content cannot be null or empty.");
        }

        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocumentFromString(htmlContent);
            renderer.layout();
            renderer.createPDF(baos);

            String timestamp = LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS"));
            String uniquePdfFilename = OUTPUT_PDF_FILENAME.replace(".pdf", "_" + timestamp + ".pdf");
            String bucketName = properties.getGcs().getBucketName();

            BlobId blobId = BlobId.of(bucketName, uniquePdfFilename);
            BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("application/pdf").build();

            storage.create(blobInfo, baos.toByteArray());

            String gcsPath = "gs://" + bucketName + "/" + uniquePdfFilename;
            logger.info("Successfully generated PDF and uploaded to GCS: {}", gcsPath);
            return Map.of("status", "success", "file_path", gcsPath);

        } catch (DocumentException e) {
            logger.error("Error during PDF document generation", e);
            throw new IOException("Error during PDF document generation: " + e.getMessage(), e);
        } catch (Exception e) {
            logger.error("Error during PDF generation or GCS upload", e);
            throw new IOException("Error during PDF generation or GCS upload: " + e.getMessage(), e);
        }
    }
    ```
    
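The object-naming step in that function is plain string work: append a millisecond-precision timestamp so every upload gets a unique name, then build the gs:// path that's handed back to the agent. A small standalone sketch of just that step (the base filename and bucket name here are made-up examples):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PdfNaming {
    // Mirror the naming logic in generatePdfFromHtmlInternal: insert a
    // millisecond timestamp before the extension so uploads never collide.
    static String uniqueName(String baseFilename, LocalDateTime now) {
        String timestamp = now.format(DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS"));
        return baseFilename.replace(".pdf", "_" + timestamp + ".pdf");
    }

    // Build the gs:// path the function returns to the agent as "file_path".
    static String gcsPath(String bucketName, String objectName) {
        return "gs://" + bucketName + "/" + objectName;
    }

    public static void main(String[] args) {
        LocalDateTime fixed = LocalDateTime.of(2025, 7, 22, 9, 30, 0, 123_000_000);
        String name = uniqueName("invoice.pdf", fixed);
        System.out.println(name);                       // → invoice_20250722093000123.pdf
        System.out.println(gcsPath("my-bucket", name)); // → gs://my-bucket/invoice_20250722093000123.pdf
    }
}
```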

    Choice #8 – How do I package up and run the agents?

    This choice may depend on who the agent is for (internal or external audiences), who has to support the agent, and how often you expect to update the agent.

    I chose to containerize the components so that I had maximum flexibility. I could have easily used the ADK CLI to deploy directly to Vertex AI Agent Engine—which comes with convenient features like memory management—but wanted more control than that. So I have Dockerfiles for each agent, and deploy them to Google Cloud Run. Here I get easy scale, tons of optional configurations, and I don’t pay for anything when the agent is dormant.

    In this case, I’m just treating the agent like any other type of code. You might make a different choice based on your use case.

    The final solution in action

    Let’s run this thing through. All the source code is sitting in my GitHub repo.

    I start by opening the appointment agent hosted in Cloud Run. I’m using the built-in ADK web UI to have a conversational chat with the initial agent. I mention that I might have a leaky roof and want an inspection or repair. The agent then follows its instructions. After checking the weather in the city I’m in, it retrieves appointments via the API. On the left, there’s a handy set of tools to trace events, do evals, and more.

    At this point, I chose an available appointment, and the agent followed its next set of instructions. The appointment required two pieces of info (my name and address), and wouldn’t proceed until I provided both. Once it had the data, it called the right function to make an appointment and publish a message to Pub/Sub.

    That data flowed through Google Cloud Pub/Sub, and got pushed to another agent hosted in Cloud Run.
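A Pub/Sub push subscription delivers the message to the receiving service as an HTTP POST with a JSON envelope, and the actual payload is the base64-encoded "data" field inside it, so the second agent has to unwrap that before doing anything. Here's a simplified sketch of the unwrapping — a real handler would use a proper JSON parser rather than a regex, and the payload shown is invented:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PushEnvelope {
    // Pub/Sub push bodies look like {"message":{"data":"<base64>","messageId":...},...};
    // pull out the base64 "data" field. (Regex for brevity; use a JSON parser in real code.)
    private static final Pattern DATA_FIELD = Pattern.compile("\"data\"\\s*:\\s*\"([^\"]+)\"");

    // Extract and decode the message body from a push envelope.
    public static String decodeData(String envelopeJson) {
        Matcher m = DATA_FIELD.matcher(envelopeJson);
        if (!m.find()) {
            throw new IllegalArgumentException("No data field in push envelope");
        }
        byte[] raw = Base64.getDecoder().decode(m.group(1));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = Base64.getEncoder()
                .encodeToString("{\"city\":\"Seattle\"}".getBytes(StandardCharsets.UTF_8));
        String envelope = "{\"message\":{\"data\":\"" + body + "\",\"messageId\":\"1\"}}";
        System.out.println(decodeData(envelope)); // → {"city":"Seattle"}
    }
}
```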

    That agent immediately loaded up its MCP tools by calling the MCP server also hosted in Cloud Run. That server retrieved the list of offers for the city in question.

    This agent runs unattended in the background, so there’s no chat interface or interactivity. Instead, I can track progress by reading the log stream.

    Once this agent finished converting the chat blob to JSON, creating an HTML template, and calling the MCP tools to attach offers, it wrote the final PDF to Cloud Storage.

    There you go. It’s not perfect and I have improvements I want to make. Heck, the example here has the wrong date in the invoice, which didn’t happen before. So I need better instructions there. I’d like to switch the second agent from a push to a pull. It’d be fun to add some video or audio intake to the initial agent.

    Nobody knows the future, but it looks like we’ll be building more agents and fewer standalone apps. APIs matter more than ever, as do architectural decisions. Make good ones!

  • Daily Reading List – July 16, 2025 (#588)

    I planned my morning poorly with back-to-back intense presentations. I followed that up by staring into the middle distance for ten minutes to allow my brain to reboot. Do you give yourself a breather after some intense work, or do you push through?

    [blog] Engineering Deutsche Telekom’s sovereign data platform. Cool story of getting the power of cloud, without compromising on compliance requirements.

    [blog] Delegation is the AI Metric that Matters. Very interesting way to measure our acceptance of AI in our work and life.

    [blog] How to Use Open Source Without Losing Your Code, Users – or Sanity. It’s easy to remain ignorant of the right ways to open source, and use open source. Let’s all get smarter so we don’t get burned.

    [blog] Building production-ready generative AI: How Temporal supercharges Google’s Gemini and Veo. Long-running tasks are no joke. I like the way Temporal described the problem and solution here.

    [blog] Proven Practices for Succeeding with a Multicloud Strategy. I’ll take “blog titles you’d never see from AWS in 2021” for $1000, Alex. But here we are. Solid advice.

    [blog] 25 of my favorite ROI+ customer stories. I enjoy the quick demos I see from users on X, but these stories of actual generative AI success have more weight.

    [article] 5 ways generative AI projects fail. It’s only fair to also share stories where the AI projects fail to land. Here are some situations to avoid.

    [article] Bash 5.3 Has Some Big Improvements — Here’s How You Can Test It. If you could quickly recall the current version of Bash you’re using, then you’re a wizard. Bravo. I barely knew there was a current version. This ubiquitous Linux shell has some new features though.

    [blog] Implementing High-Performance LLM Serving on GKE: An Inference Gateway Walkthrough. Lots of details in this post. Folks who use Kubernetes for LLM serving are going to like this.

    [article] Why LLMs demand a new approach to authorization. Indeed. I need to learn more about what’s needed, and what’s possible, in this space.

    [article] AWS unveils Bedrock AgentCore, a new platform for building enterprise AI agents with open source frameworks and tools. You’ll see a lot of companies doing their best to own the agent control plane.

    [blog] Why development leaders are investing in design. Good engineering only gets you so far. If you don’t have the right product sense and design focus, your products rarely expand beyond hardcore users.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – July 15, 2025 (#587)

    A lot of AI stuff today. But also lots of insight about it, versus cheerleading. Dig in!

    [blog] Context Engineering: Bringing Engineering Discipline to Prompts. This post gave me greater appreciation for what “context engineering” is about. Don’t assume it’s just a silly new term for something we were already doing.

    [blog] AI from Google Cloud steps up to the plate at the MLB All-Star Game. The game’s tonight, and we’re adding a few cool experiences to the game.

    [blog] Starter GEMINI.md file within a project directory for Gemini CLI. Pay attention to these posts that help us write effective agent instructions that improve the output of our tools.

    [blog] The three great virtues of an AI-assisted programmer. What are you doing while waiting for your LLM to answer? How are you keeping control of the exercise? I liked Sean’s take here.

    [article] The Pragmatic Engineer 2025 Survey: What’s in your tech stack? Surveys are surveys, but there’s data here that might validate (or invalidate) some of your assumptions about today’s developers.

    [article] AI coding tools are shifting to a surprising place: the terminal. The survey above happened during April and May. I suspect the data would already look markedly different thanks to the rapid (and sustainable) rise in agentic terminal tools.

    [blog] The Internal Platform Scorecard: Speed, Safety, Efficiency, and Scalability. I like the attributes along with corresponding KPIs and metrics that matter. Know how you’re measuring the success of your platform!

    [blog] Gemini CLI Tutorial Series — Part 6 : More MCP Servers. Even more ways to use MCP servers in the Gemini CLI, including accessing Google Workspace and media services.

    [blog] Next-Level Data Automation: Gemini CLI, Google Sheets, and MCP. Interacting with Google Sheets, Docs, and Slides from an agentic CLI opens up a ton of possibilities.

    [blog] Tools: Code Is All You Need. A contrary opinion to the MCP hype. Maybe just use code instead?

    [blog] The AWS Survival Guide for 2025: A Field Manual for the Brave and the Bankrupt. Amusing. You don’t have to put up with all this if you want a better cloud alternative.

    [blog] AI Engineers and the Hot Vibe Code Summer. When I hear “AI engineer” I think of someone building AI models or doing ML work. Not a developer using AI tools. Kate looks at that term, and vibe coding to figure out what it all means.

    [article] Global IT spend keeps growing despite trade war concerns. Spending still up for the year, but trimmed expectations. El Reg has a more pessimistic take on the same data.

    [blog] The Fastest Way to Run the Gemini CLI: A Deep Dive into Package Managers. Super interesting content, especially if you care about performance.


  • Daily Reading List – July 14, 2025 (#586)

    I liked some of the advice and data in today’s reading list. Learn about taking care of yourself as a leader, generating media from agentic CLIs, embracing personal growth, and building better distributed systems.

    [blog] The 2025 Docker State of Application Development Report. Tons of interesting data points in this survey. See what devs think of AI, remote dev environments, containers, and places to learn.

    [article] Leading Is Emotionally Draining. Here’s How to Recover. I get that most individual contributors won’t spend a lot of time sympathizing with the challenges of leadership. But it’s no joke, and a heavy investment in leadership is more work than it appears.

    [blog] Gemini CLI Tutorial Series — Part 5 : Github MCP Server. I go hot and cold on MCP. However, this is a good example of where it’s handy for agentic CLI work.

    [blog] Give Gemini CLI the Ability to Generate Images and Video, Work with GitHub Repos, and Use Other Tools. Dammit, now I’m hot again on MCP. Here’s another scenario where an MCP brings functionality to the Gemini CLI that wouldn’t have otherwise been there.

    [blog] How to Shift What You Think Is Possible. Growth looks different to beginners than experts. But both need a way to rethink what’s possible.

    [blog] Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. On Friday, I linked to a paper that showed a slowdown when devs used AI tools. Simon explores this further.

    [blog] The GitOps Repository Structure: Monorepo vs. Polyrepo and Best Practices. Are you a monorepo person who likes everything in one place? Or is your source code and config data distributed across many repos? Here is advice for both.

    [blog] Gemini Embedding now generally available in the Gemini API. Generate embeddings with this high performing and competitively priced model.

    [blog] Distributed Systems Mistakes Nobody Warns You About: Consistency. Yes. It’s easy to assume that all the related operations succeeded, but you get into a funky state when you have a bunch of async updates and one fails.

    [blog] What You Actually Need to Monitor AI Systems in Production. Good topic, and not one I’ve seen widely discussed. What should you be keeping track of at each app stage?

    [article] Building Autonomous Systems: A Guide to Agentic AI Workflows. This DigitalOcean article has a fairly rich look at the use cases, patterns, and technologies around agents.

    [blog] How to Teach Gemini CLI to Write Python Scripts with Inline Dependencies. I think we’re selling ourselves short on our role with these AI tools. There are so many cases where our creativity is applied to “teach” these tools to do cool things.


  • Daily Reading List – July 11, 2025 (#585)

    Good end to the week. I appreciate having co-workers that I can just call up and strategize with. Or complain to. Or celebrate with. All at the same time.

    [youtube-video] The Agent Factory – Episode 1: Agents, their frameworks and when to use them. This new podcast/video series has promise. It features some colleagues who did a good job looking at the overall landscape, and zeroing in on some Google innovations.

    [article] Stop forcing AI tools on your engineers. Very good advice here. Definitely give teams time to explore, give them space, and know what matters. Also, don’t wait forever.

    [blog] Beyond GROUP BY: Introducing advanced aggregation functions in BigQuery. This seems like a good deal for data folks. The post calls out some big performance and efficiency benefits of these new aggregation functions.

    [blog] Our new approach to enterprise CI/CD: Free tier, source available, and guaranteed savings. There’s still interesting work happening in the CI/CD space. The Semaphore folks are making some moves.

    [blog] User Count: One. Bespoke Software with Gemini CLI. You can just build software for yourself. Billy wanted help focusing, so built a Chrome extension for an audience of one.

    [article] AI coding tools may not speed up every developer, study shows. I’m not just going to read (and share) posts that say everything is sunshine. We’re still all figuring out where these AI tools add value, and when other fundamentals need to be in place for them to be useful.

    [article] Ollama or vLLM? How to choose the right LLM serving tool for your use case. “When to use what” is such an important question in so many areas nowadays. Here’s help with LLM serving tools.

    [blog] Google Brings the Lustre Parallel File System to Its Cloud. Teams are trying to squeeze out every bit of performance on their ML jobs, and this should help them do that, with less management.

    [blog] SQL reimagined: How pipe syntax is powering real-world use cases. You should give this a whirl. In reading the example queries, it does seem to have some real benefits over standard SQL.

    [blog] Graph foundation models for relational data. This seems like some creative thinking around ML algorithms and training data.

    [blog] How to use GenAI as an Executive Assistant. Lak wrote a series of seven posts (so far) about different roles that you can apply generative AI to. Check out his other posts about engineering, research, analyst, and more.


  • Daily Reading List – July 10, 2025 (#584)

    I’m officially having a hard time keeping up with this industry. My feed reader is constantly overflowing, the social feeds never stop, and vendors keep shipping interesting things. I’ll likely compensate by narrowing my attention a bit, but wow. So much happening.

    [blog] Docker Brings Compose to the Agent Era: Building AI Agents is Now Easy. Very cool integrations for AI frameworks, and publishing Compose specs to Google Cloud Run in one command.

    [blog] From localhost to launch: Simplify AI app deployment with Cloud Run and Docker Compose. Our deeper dive into Docker’s announcement.

    [blog] How I use LLMs to learn new subjects. This seems like a reasonable way of looking at it. I liked the points about hallucinations.

    [article] Survey Surfaces Significant Lack of Visibility Into Software Supply Chain Risks. It seems that a lot of folks don’t think they can see supply chain problems. Can you?

    [docs] GraphRAG infrastructure for generative AI using Vertex AI and Spanner Graph. Here’s a terrific new guide for those doing a graph-based approach to retrieval augmented generation.

    [article] How to Communicate with Your Team When Business Is Bad. We all react differently when a crisis hits. This post encourages a wise approach to handling this within your team.

    [article] Shadow AI emerges in the enterprise. This is the least surprising headline of the year. 80% say employee AI tool adoption is outpacing the capacity for IT teams to vet the apps. And the other 20% are lying about it.

    [article] How to Measure the ROI of AI Coding Assistants. Great asset here for team leaders that want to figure out what really matters in their dev experience.

    [article] Introducing the AI Measurement Framework. Here’s more detail on the framework called out in the preceding article. Looks useful!

    [blog] Cloud Storage bucket relocation: An industry first for non-disruptive bucket migrations. It’s not easy to move an object storage bucket to a different region. At least if you’re in any cloud besides Google’s.

    [blog] Navigating the Mythical Sea of Sameness. I thought this was educational for those of us who communicate to others. Talking about differentiation can sometimes feel like bragging, but it’s important to help people understand what your company/product is uniquely doing.

    [blog] What can agents actually do? Practical look at what AI agents are about, how to think about them, and what they’re capable of.

    [blog] BigQuery meets ADK: 10 tips to safeguard your data (and wallet) from agents. We need more content about how to protect key assets in an environment with agents running loose.

    [blog] A new era of Stack Overflow. I seem to recall a few of these resets from Stack Overflow lately, but I appreciate that they’re actively thinking about how to stay relevant to devs.

    [article] Elon Musk’s xAI launches Grok 4 alongside a $300 monthly subscription. The bar keeps going up, and Grok 4 has some impressive performance. Some testing already from Simon.
