Author: Richard Seroter

  • Daily Reading List – March 18, 2026 (#744)

    Today’s list definitely has some assertive opinions. What’s the new baseline for performance? How are you thinking about MCP wrong? What’s going to happen when you’ve lost comprehension of your codebase? We should ask hard questions and noodle on the answers.

    [blog] 10x is the new floor. As our tools get better, the floor goes up. More is expected. Being “just ok” at your job is a fairly risky proposition in 2026.

    [blog] Introducing “vibe design” with Stitch. This is such a game-changer for UX folks, but also everyone else who wants to bring smart design into their apps.

    [article] How coding agents work. Another good one from Simon that explains the agentic loops and techniques you find in coding agents.

    [blog] Gemini API tooling updates: context circulation, tool combos and Maps grounding for Gemini 3. Good quality-of-life update for people building AI apps and agents.

    [blog] Our latest investment in open source security for the AI era. A handful of us are pitching in to ensure that open source stays stable and secure.

    [blog] MCP Isn’t Dead You Just Aren’t the Target Audience. Allen makes the important point that not every agent has a shell or is a coding assistant. For many agents, MCP is an important connector.

    [article] Agents write code. They don’t do software engineering. I mostly agree with this. Today. But the line keeps moving, and if you think only humans will do engineering, I think you’ll be left behind.

    [article] How Uber Engineers Use AI Agents. These engineers use AI for assigned work. Here are insights from a recent talk by one of their leaders.

    [article] OpenClaw can bypass your EDR, DLP and IAM without triggering a single alert. Yes, agents aren’t ready for unfettered access to everything to do anything. But that may not last long. NVIDIA is doing work around this.

    [blog] From Ideation to Automation: The Scoop on Outages. McDonald's gets grief for offline ice cream machines, but apparently there's more going on than I thought. And there are better solutions for getting back online.

    [blog] TikTok reduces code size by 58% and improves app performance for new features with Jetpack Compose. The right framework can make a meaningful difference in performance and maintenance cost.

    [blog] Comprehension Debt – the hidden cost of AI generated code. It’s your job to understand your code and how your system works. Are you piling up comprehension debt, or building in the right discipline?

    [article] Markdown is now a first-class coding language: Deal with it. There are so many ways nowadays to start nerd fights. Saying this is one of them.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 17, 2026 (#743)

    I need the right mix of meetings in my workday for it to be a good day. Some quality 1:1 chats, a few decision-focused meetings are fine, and then seeing something that’s exciting for our users. Today was a good day.

    [blog] Bringing the power of Personal Intelligence to more people. You’ll now see this in Search, the Gemini app, and Gemini in Chrome.

    [blog] Subagents. Simon’s been adding to this series of posts about agentic engineering patterns, and this one on subagents is helpful.

    [blog] Giving you more transparency and control over your Gemini API costs. This is a harder problem than you might think. Glad to see this team giving developers spending caps and other tools to control cost.

    [article] Google Workspace’s New AI Features Seem Genuinely Useful. Nice to hear. We’re all shown a lot of AI tools and probably only use a few.

    [report] The State of AI in the Enterprise. I’m surprised that this Deloitte report is ungated. Check it out for some useful information about enterprise approaches to AI.

    [blog] Measuring progress toward AGI: A cognitive framework. This links to a paper where we look at 10 cognitive abilities and how you’d evaluate progress towards Artificial General Intelligence.

    [article] Banks struggle to scale AI as legacy tech devours IT budgets. Until you get some of the prereqs under control, it’s going to be hard to throw important dollars at AI work. But results need to be there too!

    [blog] Introducing multi-cluster GKE Inference Gateway: Scale AI workloads around the world. Run inference workloads across clusters, and even across regions.

    [blog] State of Open Source on Hugging Face: Spring 2026. A metric ton of data here from Hugging Face. Which open models are used where, who is contributing the most, and much more.

    [blog] Developer Guide: Nano Banana 2 with the Gemini Interactions API. It’s an underrated API and Philipp is inspiring me to make this a bigger part of my toolbox.

    [blog] Agent Protocols — MCP, A2A, A2UI, AG-UI. Get familiar with these, or at least the use cases they purport to help.

    [blog] Announcing the Colab MCP Server: Connect Any AI Agent to Google Colab. Wicked. Offload to the Colab host and use notebooks as tools thanks to this new open MCP server.

    [docs] Durable AI agent with Gemini and Temporal. Want to persist the steps of an agentic loop so that you can resume in any situation? That’s what Temporal does.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 16, 2026 (#742)

    I waded a bit into the “MCP or not” debate by running some experiments to see how much MCP costs my custom-built agent. If you complement MCP with agent skills, the answer is “not too much.”

    [blog] Become Builders, Not Coders. This is more of a directive versus suggestion at this point. What has to change and how do you do it? Here’s a post with advice.

    [blog] Balancing AI tensions: Moving from AI adoption to effective SDLC use. The DORA team used some fresh research to understand how teams are using AI, where they get value, and stumble. The suggestions are very good.

    [blog] Why context is the missing link in AI data security. These Google Cloud tools are really impressive at identifying and masking sensitive info. Now, with better context classifiers.

    [blog] Run Karpathy’s autoresearch on a Google serverless stack for $2/hour. With the exception of doing massive training jobs, most of us can try out nearly anything with AI for a reasonable cost. I like Karl’s example here.

    [article] Why the World Still Runs on SAP. Big ERP, CRM, and service management platforms aren’t going anywhere. But it’s going to get easier to set them up, use them, and operate them.

    [article] You’re Not Paid to Write Code. I recognize that I’ve shared a lot of posts on this topic. But it’s important. We’re not just adding tools to the mix; we’re changing identities and habits. That takes repetitive reminders and motivation.

    [blog] When to use WebMCP and MCP. Pay attention to WebMCP. It might turn out to be something fairly important.

    [blog] BigQuery Studio is more useful than ever, with enhanced Gemini assistant. I like this surface, and it’s made data analytics so much simpler for experts and novices.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • My custom agent used 87% fewer tokens when I gave it Skills for its MCP tools

    Today’s web apps don’t seem particularly concerned about resource consumption. The simplest site seems to eat up hundreds of MB of memory in my browser. We’ve probably gotten a bit lazy with optimization since many computers have horsepower to spare. But when it comes to LLM tokens, we’re still judicious. Most of us have bumped into quotas or unexpected costs!

    I see many examples of introducing and tuning MCPs and skills for IDEs and agentic tools. But what about the agents you’re building? What’s the token impact of using MCPs and skills for custom agents?

    I tried out six solutions with the Agent Development Kit (Python) and counted my token consumption for each. The tl;dr? A well-prompted Gemini with zero tools or skills succeeded with the fewest tokens consumed, with the second-best option being MCP + skills. Third-best in token consumption was a single MCP with no skill.

    I trust that you can find a thousand ways to do this better than me, but here’s a table with the best results from multiple runs of each of my experiments. The title of the post refers to the difference between scenarios 2 and 3.

    | Scenario | Agent Description | Turns | Tokens |
    |---|---|---|---|
    | 0 | Instructions only, built-in code execution tool | 7 | 1,286 |
    | 1 | Uses BigQuery MCP | 9 | 13,763 |
    | 2 | Uses BigQuery, AlloyDB, Cloud SQL MCPs | 29 | 328,083 |
    | 3 | Uses BigQuery, AlloyDB, Cloud SQL MCPs with skill | 5 | 39,622 |
    | 4 | Uses BigQuery MCP and a skill | 5 | 6,653 |
    | 5 | Instruction, skill, and built-in code execution tool | 27 | 64,444 |
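    The headline number comes straight from that table. As a quick sanity check of the arithmetic behind the title (scenario 2 versus scenario 3):

```python
# Session token totals from the results table (scenarios 2 and 3)
scenario_2_tokens = 328_083  # three MCPs, no skill
scenario_3_tokens = 39_622   # same three MCPs, plus the billing-audit skill

reduction = (1 - scenario_3_tokens / scenario_2_tokens) * 100
print(f"Token reduction: {reduction:.1f}%")  # just under 88%, i.e. the ~87% in the title
```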

    What’s the problem to solve?

    I want an agent that can do some basic cloud FinOps for me. I’ve got a Google Cloud BigQuery table that is automatically populated with billing data for items in my project.

    Let’s have an agent that can find the table and figure out what my most expensive Cloud Storage buckets are so far this month. This could be an agent we call from a platform like Gemini Enterprise so that our finance people (or team leads) could quickly get billing info.

    A look at our agent runner

    The Agent Development Kit (ADK) offers some powerful features for building robust agents. It has native support for MCPs and skills, and has built-in tools for services like Google Search.

    While the ADK does have a built-in BigQuery tool, I wanted to use the various managed MCP servers Google Cloud offers.

    Let’s look at some code. One file to start. The main.py file runs our agent and counts the tokens from each turn of the LLM. The token counting magic was snagged from an existing sample app. For production scenarios, you might want to use our BigQuery Agent Analytics plugin for ADK, which captures a ton of interesting data points about your agent runs, including tokens per turn.

    Here’s the main.py file:

    import asyncio
    import time
    import warnings
    
    import agent
    from dotenv import load_dotenv
    from google.adk import Runner
    from google.adk.agents.run_config import RunConfig
    from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService
    from google.adk.cli.utils import logs
    from google.adk.sessions.in_memory_session_service import InMemorySessionService
    from google.adk.sessions.session import Session
    from google.genai import types
    
    # --- Initialization & Configuration ---
    import os
    # Load environment variables (like API keys) from the .env file
    load_dotenv(os.path.join(os.path.dirname(__file__), '.env'), override=True)
    # Suppress experimental warnings from the ADK
    warnings.filterwarnings('ignore', category=UserWarning)
    # Redirect agent framework logs to a temporary folder
    logs.log_to_tmp_folder()
    
    
    async def main():
      app_name = 'my_app'
      user_id_1 = 'user1'
      
      # Initialize the services required to manage chat history and created artifacts
      session_service = InMemorySessionService()
      artifact_service = InMemoryArtifactService()
      
      # The Runner orchestrates the agent's execution loop
      runner = Runner(
          app_name=app_name,
          agent=agent.root_agent,
          artifact_service=artifact_service,
          session_service=session_service,
      )
      
      # Create a new session to hold the conversation state
      session_1 = await session_service.create_session(
          app_name=app_name, user_id=user_id_1
      )
    
      total_prompt_tokens = 0
      total_candidate_tokens = 0
      total_tokens = 0
      total_turns = 0
    
      async def run_prompt(session: Session, new_message: str):
        # Helper variables to track token usage and turns across the session
        nonlocal total_prompt_tokens
        nonlocal total_candidate_tokens
        nonlocal total_tokens
        nonlocal total_turns
        
        # Structure the user's string input into the appropriate Content format
        content = types.Content(
            role='user', parts=[types.Part.from_text(text=new_message)]
        )
        print('** User says:', content.model_dump(exclude_none=True))
        
        # Stream events back from the Runner as the agent executes its task
        async for event in runner.run_async(
            user_id=user_id_1,
            session_id=session.id,
            new_message=content,
        ):
          total_turns += 1
          
          # Print intermediate steps (text, tool calls, and tool responses) to the console
          if event.content and event.content.parts:
            for part in event.content.parts:
              if part.text:
                print(f'** {event.author}: {part.text}')
              if part.function_call:
                print(f'** {event.author} calls tool: {part.function_call.name}')
                print(f'   Arguments: {part.function_call.args}')
              if part.function_response:
                print(f'** Tool response from {part.function_response.name}:')
                print(f'   Response: {part.function_response.response}')
    
          if event.usage_metadata:
            total_prompt_tokens += event.usage_metadata.prompt_token_count or 0
            total_candidate_tokens += (
                event.usage_metadata.candidates_token_count or 0
            )
            total_tokens += event.usage_metadata.total_token_count or 0
            print(
                f'Turn tokens: {event.usage_metadata.total_token_count}'
                f' (prompt={event.usage_metadata.prompt_token_count},'
                f' candidates={event.usage_metadata.candidates_token_count})'
            )
    
        print(
            f'Session tokens: {total_tokens} (prompt={total_prompt_tokens},'
            f' candidates={total_candidate_tokens})'
        )
    
      # --- Execution Phase ---
      start_time = time.time()
      print('Start time:', start_time)
      print('------------------------------------')
      
      # Send the initial prompt to the agent and trigger the run loop
      await run_prompt(session_1, 'Find the top 3 most expensive Cloud Storage buckets in our March 2026 billing export for project seroter-project-base')
      print(
          await artifact_service.list_artifact_keys(
              app_name=app_name, user_id=user_id_1, session_id=session_1.id
          )
      )
      end_time = time.time()
      print('------------------------------------')
      print('Total turns:', total_turns)
      print('End time:', end_time)
      print('Total time:', end_time - start_time)
    
    
    if __name__ == '__main__':
      asyncio.run(main())
    

    Nothing too shocking here. But this gives me a fairly verbose output that lets me see how many turns and tokens each scenario eats up.

    Scenario 0: Raw agent (no MCP, no tools) using Python code execution

    In this foundational test, what if we ask the agent to answer the question without the help of any external tools? All it can do is write and execute Python code on the local machine using a built-in tool. This flavor is only for local dev, as there are more production-grade isolation options for running code.

    Here’s the agent.py for this base scenario. I’ve got a decent set of instructions to guide the agent for how to write code to find and query the relevant table.

    from google.adk.agents import LlmAgent
    from google.adk.skills import load_skill_from_dir
    from google.adk.tools import skill_toolset
    from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
    from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, ServiceAccount
    from fastapi.openapi.models import OAuth2, OAuthFlows, OAuthFlowClientCredentials
    from google.adk.code_executors.unsafe_local_code_executor import UnsafeLocalCodeExecutor
    
    
    # --- Agent Definition ---
    
    # --- Scenario 0: Raw Agent using Python Code Execution for Discovery and Analysis ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="""You are a data analyst. 
        CRITICAL: You have NO TOOLS registered. NEVER attempt a tool call or function call (like `list_datasets` or `bq_list_dataset_ids`). 
        You MUST perform all technical tasks by writing and executing Python code blocks in markdown format (e.g., ` ```python `) using the `google-cloud-bigquery` client library.
        
        1. DISCOVERY: If you don't know the table names, you MUST write and execute Python code to list datasets and tables.
        2. ANALYSIS: Use Python to query data and perform analysis.
        3. NO HYPOTHETICALS: NEVER provide hypothetical, example, or placeholder results. Only show data you have actually retrieved via code execution.
        ALWAYS explain the approach you used to access BigQuery.""",
        code_executor=UnsafeLocalCodeExecutor()
    )
    

    This scenario runs quickly (about 14 seconds on each test), took five turns, and consumed 1,786 tokens. In my half-dozen runs, I saw as many as nine turns, and as few as 1,286 tokens consumed.

    This was the most efficient way to go of any scenario.

    Scenario 1: Agent with BigQuery MCP

    Love it or hate it, MCP is going to remain a popular way to connect to external systems. Instead of needing to understand every system’s APIs, MCP tools give us a standard way to do things.

    I’m using our fully managed remote MCP Server for BigQuery. This MCP server exposes a handful of useful tools for discovery and data retrieval. Note that the awesome open source MCP Toolbox for Databases is another great way to pull 40+ data sources into your agents.

    The agent.py for Scenario 1 looks like this. You can see that I’m initializing the auth with my application default credentials and setting up the correct OAuth flow. The agent itself has a solid instruction to steer the MCP server. Note that I left an old, unoptimized instruction in there. That old instruction resulted in dozens of turns and up to 600k tokens consumed!

    from google.adk.agents import LlmAgent
    from google.adk.skills import load_skill_from_dir
    from google.adk.tools import skill_toolset
    from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
    from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, ServiceAccount
    from fastapi.openapi.models import OAuth2, OAuthFlows, OAuthFlowClientCredentials
    from google.adk.code_executors.unsafe_local_code_executor import UnsafeLocalCodeExecutor
    
    # --- BigQuery MCP Configuration ---
    
    # Configure authentication for the BigQuery MCP server
    bq_auth_credential = AuthCredential(
        auth_type=AuthCredentialTypes.SERVICE_ACCOUNT,
        service_account=ServiceAccount(
            use_default_credential=True,
            scopes=["https://www.googleapis.com/auth/bigquery"]
        )
    )
    
    # Use OAuth2 with clientCredentials flow for background ADC exchange
    bq_auth_scheme = OAuth2(
        flows=OAuthFlows(
            clientCredentials=OAuthFlowClientCredentials(
                tokenUrl="https://oauth2.googleapis.com/token",
                scopes={"https://www.googleapis.com/auth/bigquery": "BigQuery access"}
            )
        )
    )
    
    # Initialize the BigQuery MCP Toolset
    bq_mcp_toolset = McpToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://bigquery.googleapis.com/mcp"),
        auth_scheme=bq_auth_scheme,
        auth_credential=bq_auth_credential,
        tool_name_prefix="bq"
    )
    
    # --- Agent Definition ---
    
    # --- Scenario 1: Using Gemini to get data from BigQuery with MCP ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        ##instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, or ask for further information, but explore options and execute any commands yourself. Explain the approach you used to access BigQuery. ",
        instruction="""You are a data analyst. Use BigQuery to find and analyze data. 
        To minimize token usage and time, follow these rules:
        1. DISCOVERY: If you are unsure of a table's exact schema, ALWAYS query `INFORMATION_SCHEMA.COLUMNS` first to find the right fields before writing complex data queries.
        2. EFFICIENCY: When exploring data to understand its structure, ALWAYS use `LIMIT 5` to avoid returning massive payloads.
        3. AUTONOMY: Do not ask the user for table names or steps; explore the datasets yourself and execute the final queries.
        4. EXPLANATION: Briefly explain the steps you took to find the answer.""",
        tools=[bq_mcp_toolset]
    )
    

    Running this scenario is relatively efficient, but it uses roughly 8x the tokens of scenario 0. It still completes in a reasonable 19 seconds, with my latest run using 9 turns and 13,763 session tokens. Across all my other runs with this instruction, I consistently got 9 turns and a maximum of 13,838 tokens consumed.

    Scenario 2: Agent with BigQuery MCP and extra MCPs

    Most systems experience feature creep over time. They gain more and more capabilities or dependencies, and we don’t always go back and prune them. What if we had originally needed many different MCPs in our agent, and never took the time to remove the unused ones later? You may start feeling it in your input context. All those tool descriptions are scanned and held during each turn.
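    To build intuition for why that hurts, here's a back-of-the-envelope sketch. All the schema sizes below are made-up placeholders (not measured from these MCP servers), and 4 characters per token is just a rough heuristic for English text:

```python
# Back-of-the-envelope sketch: every registered tool's name, description, and
# JSON schema travel with the prompt on every turn. All numbers below are
# hypothetical placeholders, not measured values.
APPROX_CHARS_PER_TOKEN = 4  # rough rule of thumb for English text

# Hypothetical schema sizes (in characters) for each registered toolset
toolset_schema_chars = {
    "bq": 6_000,     # BigQuery MCP tools (actually needed)
    "sql": 5_000,    # Cloud SQL MCP tools (unused by this task)
    "alloy": 5_000,  # AlloyDB MCP tools (unused by this task)
}

per_turn_overhead = sum(toolset_schema_chars.values()) // APPROX_CHARS_PER_TOKEN
print(f"~{per_turn_overhead} extra prompt tokens per turn")
print(f"~{per_turn_overhead * 29} extra prompt tokens across a 29-turn session")
```

    Even modest per-toolset overhead compounds quickly, because it's paid again on every turn of the agentic loop.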

    This update to agent.py now initializes two other MCP servers for other data sources.

    # --- GCP Platform Auth (Shared for Cloud SQL and AlloyDB) ---
    
    # Configure authentication for MCP servers requiring cloud-platform scope
    gcp_platform_auth_credential = AuthCredential(
        auth_type=AuthCredentialTypes.SERVICE_ACCOUNT,
        service_account=ServiceAccount(
            use_default_credential=True,
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )
    )
    
    # Use OAuth2 with clientCredentials flow for background ADC exchange
    gcp_platform_auth_scheme = OAuth2(
        flows=OAuthFlows(
            clientCredentials=OAuthFlowClientCredentials(
                tokenUrl="https://oauth2.googleapis.com/token",
                scopes={"https://www.googleapis.com/auth/cloud-platform": "Cloud Platform access"}
            )
        )
    )
    
    # --- Cloud SQL MCP Configuration ---
    
    # Initialize the Cloud SQL MCP Toolset
    sql_mcp_toolset = McpToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://sqladmin.googleapis.com/mcp"),
        auth_scheme=gcp_platform_auth_scheme,
        auth_credential=gcp_platform_auth_credential,
        tool_name_prefix="sql"
    )
    
    # --- AlloyDB MCP Configuration ---
    
    # Initialize the AlloyDB MCP Toolset
    alloy_mcp_toolset = McpToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://alloydb.us-central1.rep.googleapis.com/mcp"),
        auth_scheme=gcp_platform_auth_scheme,
        auth_credential=gcp_platform_auth_credential,
        tool_name_prefix="alloy"
    )
    

    Then the agent definition has virtually the same instruction as Scenario 1, but I do direct the agent to pick whichever MCP the user’s prompt implies before calling any tools.

    # --- Scenario 2: Using Gemini to get data from BigQuery with MCP, but with extra MCPs added ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        #instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself. Explain the approach you used to access BigQuery.",
        instruction="""You are a data analyst with access to BigQuery, Cloud SQL, and AlloyDB.
        1. ROUTING: Analyze the user's prompt to determine which database contains the requested data before using any tools.
        2. DISCOVERY: Query `INFORMATION_SCHEMA.COLUMNS` in the target database first to find the right fields.
        3. EFFICIENCY: When exploring, ALWAYS use `LIMIT 5`.
        4. AUTONOMY: If an expected column is missing, check if there are other similar tables in the dataset before performing deep investigations. If you are stuck after 5 queries, STOP and ask the user for clarification.""",
        tools=[bq_mcp_toolset, sql_mcp_toolset, alloy_mcp_toolset]
    )
    

    What happens when we run this scenario? I got a wide range of results. All that extra (unnecessary) context made the LLM angry. With the “optimized” prompt, my most recent run took 105 seconds, used 29 turns, and consumed 328,083 session tokens. With the simpler prompt, I somehow got better results. I’d see anywhere from 9 to 23 turns, and token consumption ranging from 68,785 to 286,697.

    Scenario 3: Agent with BigQuery MCP, extra MCPs, and agent skill

    Maybe a Skill can help focus our agent and shut out the noise? Here’s my SKILL.md file. Notice that I’m giving it very specific expertise, including the exact name of the table.

    ---
    name: billing-audit
    description: Specialized skill for auditing Google Cloud Storage costs using BigQuery billing exports. Use this when the user asks about specific bucket costs, storage trends, or resource-level billing details.
    ---
    
    # Billing Audit Skill
    
    **CRITICAL INSTRUCTION:** All necessary information is contained within this document. DO NOT call `load_skill_resource` for this skill. There are no external files (no scripts, examples, or references) to load.
    
    Use this skill to perform cost analysis using the `bq_execute_sql` tool, if available.
    
    ## Target Resource Details
    - **Table Path:** `` `seroter-project-base.gcp_billing_export.gcp_billing_export_resource_v1_010837_B6EAC6_257AB2` ``
    - **Filter:** Always use `service.description = 'Cloud Storage'` for GCS costs.
    
    ### Relevant Schema Columns
    - `service.description`: String. User-friendly name (use 'Cloud Storage').
    - `project.id`: String. The project ID (e.g., `seroter-project-base`).
    - `resource.name`: String. The resource identifier (e.g., `projects/_/buckets/my-bucket`).
    - `cost`: Float. The cost of the usage.
    - `_PARTITIONDATE`: Date. Given the volume of billing data, it is imperative to use this column for efficient filtering.
    
    ### Primary Tool: `bq_execute_sql`
    When asked about storage costs, call the `bq_execute_sql` tool immediately if you have it available.
    
    **Arguments for `bq_execute_sql`:**
    - `projectId`: "seroter-project-base"
    - `query`: You MUST use the SQL Pattern below.
    
    ### SQL Pattern: Top 3 Expensive Buckets
    ```sql
    SELECT 
      resource.name as bucket_name, 
      SUM(cost) as total_cost
    FROM `seroter-project-base.gcp_billing_export.gcp_billing_export_resource_v1_010837_B6EAC6_257AB2`
    WHERE service.description = 'Cloud Storage'
      AND _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 3
    ```
    
    ### Fallback: Python Execution
    If `bq_execute_sql` is **NOT** assigned, use the `google-cloud-bigquery` library.
    CRITICAL: Write Python inside a ```python block. ```sql blocks will NOT execute.
    
    Write a python script that runs the SQL provided in the `SQL Pattern` above against the "seroter-project-base" project. Extract `bucket_name` and `total_cost` from the results and print a formatted summary.
    
    ## Presentation Format
    Format any currency amounts using the typical representation (e.g., "USD 123.45"). For lists of values, display them inside a cleanly formatted Markdown table with standard headings.
    

    I updated my agent.py to load the skills into a toolset.

    # --- Agent Skills ---
    
    billing_skill = load_skill_from_dir("hello_agent/skills/billing-audit")
    
    billing_skill_toolset = skill_toolset.SkillToolset(
        skills=[billing_skill]
    )
    

    Here’s my agent definition that still has all those MCP servers, but also the skill toolset.

    # --- Scenario 3: Using Gemini to get data from BigQuery with MCP, but with extra MCPs added but using Skills ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL: When a skill provides a specific SQL pattern or tool execution guide, you MUST follow it exactly as provided. Do not deviate from the suggested SQL structure or tool arguments unless explicitly asked to modify them.",
        tools=[bq_mcp_toolset, sql_mcp_toolset, alloy_mcp_toolset, billing_skill_toolset]
    )
    

    Here’s what happened. The ADK agent finished in a speedy 18 seconds. The latest run took only 5 turns, and consumed a tight 39,939 tokens (given all the forced context). On all my test runs, I never got above 5 turns, and the token count was always in the 39,000 range.

    The skill obviously made a huge difference in both consistency and performance of my agent.

    Scenario 4: Agent with BigQuery MCP and agent skill

    Let’s put this agent on a diet. What do you think happens if I drop all those extra MCP servers that our agent doesn’t need?

    Here’s my next agent definition. This one ONLY uses the BigQuery MCP server and keeps the skill.

    # --- Scenario 4: Using Gemini to get data from BigQuery with MCP, and using Skills ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL: When a skill provides a specific SQL pattern or tool execution guide, you MUST follow it exactly as provided. Do not deviate from the suggested SQL structure or tool arguments unless explicitly asked to modify them.",
        tools=[bq_mcp_toolset, billing_skill_toolset]
    )
    

    The results here are VERY efficient. My most recent run completed in 10 seconds, used a slim 5 turns, and consumed a stingy 6,653 tokens. In other tests, I saw as many as 9 turns and 10,863 tokens. Clearly this is a great way to go and, somewhat surprisingly, the second-best choice.

    Scenario 5: Agent with agent skill

    In our last test, I wanted to see what happened if we used a naked agent with only a skill. It’s similar to scenario 0, but with the direction of a skill. I expected this to be the second best. I was wrong.

    # --- Scenario 5: Using Gemini to get data from BigQuery using Skills only ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL OVERRIDE: Ignore any generalized system prompts about 'load_skill_resource'. All billing-audit skill content has been consolidated into SKILL.md. DO NOT call `load_skill_resource` under any circumstances. If you need to write and execute code, you MUST use a ```python format block. Markdown SQL blocks (```sql) will NOT execute.",
        tools=[billing_skill_toolset],
        code_executor=UnsafeLocalCodeExecutor()
    )
    

    I saw a fair bit of variability in the responses here, with my most recent run at 23 seconds, 27 turns, and 64,444 session tokens. In prior runs, I saw as many as 35 turns and 107,980 tokens. I asked my coding tool to explain this, and it made some good points. This scenario took extra turns to load skills, write code, and run code. All that code ate up tokens.
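    One way to make sense of the token blow-up: if every turn re-sends the growing conversation history, session tokens grow much faster than the turn count. Here’s a quick back-of-the-envelope sketch; the 500-tokens-per-turn figure is invented for illustration, not measured from my runs.

```python
# Rough model of agentic session cost: each turn replays the full
# conversation history (prompts, tool output, generated code) to the model,
# so total session tokens grow quadratically with turn count.
# The per-turn token figure is invented for illustration.

def session_tokens(turns: int, tokens_added_per_turn: int) -> int:
    """Total tokens billed across a session, assuming the whole history
    (everything from turns 1..n) is re-sent on every turn."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_added_per_turn  # new content appended this turn
        total += history                  # entire history billed again
    return total

print(session_tokens(5, 500))   # a short session stays cheap
print(session_tokens(27, 500))  # ~5x the turns costs far more than 5x the tokens
```

    That quadratic-ish growth is why trimming even a few turns with a focused skill pays off disproportionately.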

    Takeaways

    This was fun. I’m sure you can do better, and please tell me how you improved on my tests. Some things to consider:

    • Model choice matters. I had very different results as I navigated different Gemini models. Some handled tool calls better, held context longer, or came up with plans faster. You’d probably see unique results by using Claude or GPT models too.
    • MCPs are better with skills. MCP alone led the agent to iterate on a plan of attack, which led to more turns and tokens. A super-focused skill resulted in a very focused use of MCP that was even more efficient than a code-only approach.
    • Instructions make a difference. Maybe the above won’t hold true with an even better prompt. And I was a bit contrived with a few examples by forcing the agent to discover the right BigQuery table versus naming it outright. Good instructions can make a big impact on token usage.
    • Agent frameworks give you many levers that impact token consumption. ADK is great, and is available for Java, JavaScript, Go, and Dart too. Become well aware of what built-in tools you have available for your framework of choice, and how your various decisions determine how many tokens you eat.
    • Make token consumption visible. Not every tool or framework makes it obvious how to count up token use. Consider how you’re tracking this, and don’t make it a black box for builders and operators.
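    On that last point, even a tiny accumulator makes consumption visible. The sketch below is framework-agnostic; the idea is to record whatever per-turn usage counts your framework exposes (ADK surfaces usage metadata on events, but field names vary, so treat the wiring as an assumption) and print a summary at the end of a run.

```python
# A minimal token-usage tally. The record() calls would be fed from your
# framework's per-turn usage metadata; the sample numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class TokenTally:
    prompt: int = 0
    output: int = 0
    turns: int = 0

    def record(self, prompt_tokens: int, output_tokens: int) -> None:
        """Accumulate one turn's usage counts."""
        self.prompt += prompt_tokens
        self.output += output_tokens
        self.turns += 1

    @property
    def total(self) -> int:
        return self.prompt + self.output

    def report(self) -> str:
        return (f"{self.turns} turns, {self.total} tokens "
                f"({self.prompt} in / {self.output} out)")

tally = TokenTally()
tally.record(1200, 300)  # e.g. pulled from each event's usage metadata
tally.record(2400, 450)
print(tally.report())
```

    Logging that one line per run is often enough to spot when a prompt or tool change quietly doubled your costs.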

    Feedback? Other scenarios I should have tried? Let me know.

  • Daily Reading List – March 13, 2026 (#741)

    A little throwaway tweet yesterday somehow turned into my most viral thing, maybe ever. I don’t understand social media. But it was also an awesome reminder that many people have no idea what the AI on their phones, email client, and corporate systems already does!

    [blog] A2A Protocol Ships v1.0: Production-Ready Standard for Agent-to-Agent Communication. Congrats to the team here. A few things got better in this release, and I expect it to continue getting adopted within products and by developers.

    [blog] BigQuery pipe syntax by example. Lots of examples here, and you can try out this SQL alternative in our free BigQuery sandbox.

    [blog] How to Do Code Reviews in the Agentic Era. I liked this take. If you’re in OSS, you already have a zero-trust approach to contributions. Who cares where the code comes from? This is what Daniela is looking for.

    [article] WTF does a product manager do? (and why engineers should care). Good post. What a PM does hasn’t changed a ton, but the way they do it has. Or at least should!

    [article] Preparing your team for the agentic software development life cycle. In my little bubble (regarding what customers constantly ask me about), this is the #1 topic.

    [blog] Right-Sizing Engineering Teams for AI. Some quick thoughts that are worth checking out. What’s the ideal makeup for an engineering team in 2026 and beyond?

    [article] How is AI already reshaping the software engineering labor market? Let’s stay on this topic, I guess? More advice for tech team leaders.

    [article] What Authentic Leadership Looks Like Under Pressure. This feels related to the previous three pieces. This is likely a stressful time for many of us. How are you leading in this moment?

    [blog] MCP vs. CLI for AI-native development. The “MCP or not” debate hit a fever pitch this week. Something’s in the water. It’s an “and” conversation to me; MCP makes sense in many situations, not in all.

    [article] The case for running AI agents on Markdown files instead of MCP servers. Now we’re talking about skills versus MCP. Again to me, the answer will be “both” for a lot of cases. I’ve been testing this out myself.

    [blog] Twenty years of Amazon S3 and building what’s next. Feels like this is what started the mainstream cloud story. Congrats to the Amazon team on 20 great years.

    [article] What OpenClaw Reveals About the Next Phase of AI Agents. We see time and time again that you shouldn’t dismiss an early, rough introduction of a new technology. It often signals that there’s fresh appetite for an unmet need.

    [article] NanoClaw and Docker partner to make sandboxes the safest way for enterprises to deploy AI agents. Safety features always follow a buzzy new idea. Just wait a bit and things like this pop up. More here.

    [blog] Simplify your Cloud Run security with Identity Aware Proxy (IAP). Fantastic feature for people who want authenticated web apps with as little work as possible.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 12, 2026 (#740)

    It was a day. But we had a fun read-through of our Google Cloud Next developer keynote. I’m excited to see many of you in person soon!

    [article] AI productivity gains are 10%, not 10x. We’ve said the same thing publicly. There are tasks that have 3x or 10x productivity gains, but it’s not uniform across the whole day or entire value stream.

    [article] CEOs think AI use is mandatory — but employees don’t agree, survey says. This story reports on the disconnect between execs and employees, but notice the blurb about middle managers. If you don’t win that tier over, every initiative tends to die.

    [article] Pity the developers who resist agentic coding. I wouldn’t use the word pity at this point. This article points out that devs are missing the thrill of really building at the speed of thought.

    [blog] Cloud CISO Perspectives: New Threat Horizons report highlights current cloud threats. Even if the threats themselves don’t change (they do), notice how bad actors seamlessly switch to the ones getting less attention.

    [blog] Introducing Replit Agent 4: Built for Creativity. How we work changed. Stop fighting it. Tools like Replit do a great job of showing what the future looks like.

    [blog] What you need to know about the Gemini Embedding 2 model. This is new, but not getting the attention it deserves. This new embeddings model makes life much easier for those with a mix of data.

    [blog] Human Insight, Amplified: How Forrester Is Reinventing Research For The AI Era. This seems like a good idea. Analyst firms need to rethink their research approach, and distribution. This addresses the latter.

    [blog] Protecting cities with AI-driven flash flood forecasting. Great work from Microsoft Research to make this capability available to local communities.

    [blog] 5 design skills to sharpen in the AI era. From Figma, this seems like a useful list of areas to focus on.

    [blog] Inference on GKE Private Clusters. Good use case! Can a Kubernetes cluster with no internet access still do AI inference? Yes, yes it can.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 11, 2026 (#739)

    Sometimes a different model really does help. I was building an agent last night for fun, and couldn’t get it to do what I wanted. I upgraded to the latest Gemini model and now it works like a charm. That was the only change!

    [blog] Welcoming Wiz to Google Cloud: Redefining security for the AI era. I’m excited that we closed this acquisition and now have this talented team with their differentiated security platform. I’m not sure anyone has a security stack quite like ours.

    [article] Andrej Karpathy’s new open source ‘autoresearch’ lets you run hundreds of AI experiments a night — with revolutionary implications. Besides coding (which I don’t get to do every day), my main use of AI is for research. Autoresearch could absolutely be a transformative thing.

    [blog] The 8 Levels of Agentic Engineering. Fantastic post that articulates each progressive stage of using AI in engineering, and what you gain from each.

    [blog] Plan mode is now available in Gemini CLI. Great functionality added to the Gemini CLI. Do safe, read-only mode work before jumping into action.

    [blog] Cost-Effective AI with Ollama, GKE GPU Sharing, and vCluster. As I see people struggle with token costs or model availability, I’m definitely of the mind that there are clear cases where you want to run models yourself.

    [blog] When Developer Workflow Discipline Isn’t Enough. It’s about platforms, and how you’re serving AI functionality at scale in large companies.

    [article] 10 Hacks Every NotebookLM User Should Know. Great list of ways to personalize the experience and learn on your terms.

    [blog] From games to biology and beyond: 10 years of AlphaGo’s impact. This was a turning point for AI, and we may look back at this as a pivotal moment.

    [article] How to Quash Your Fear of Messing Up. What causes you to hesitate? How can we think about risk differently? Read this for advice.

    [blog] Best Practices for Secure Error Handling in Go. Even if you don’t write code in Go, you’ll learn a few useful things about avoiding security issues with your error handling.

    [article] Layoffs, cost-cutting shatters IT worker confidence. Understandable! Tech workers face a lot of simultaneous stresses.

    [blog] Bring Your Database Tools to the Agent Skill Ecosystem. Very cool new capability to turn MCP toolsets into Agent Skills. Worth trying this out.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 10, 2026 (#738)

    We shipped some sweet AI updates today—AI app framework, unique embedding model, smarter AI in Workspace—but I read through more than just that 🙂

    [blog] Gemini Embedding 2: Our first natively multimodal embedding model. Wow. Map text, images, video, audio, and documents into a single embedding space. Unique, and powerful.

    [blog] Gemini in Google Sheets just achieved state-of-the-art performance. We can all become spreadsheet masters now. Check out what’s now in your broader Google Workspace toolbox. I should use more of this!

    [article] Dynamic UI for dynamic AI: Inside the emerging A2UI model. This is a good dive into the new frontend paradigms to think about now, and how A2UI works.

    [blog] Your Data Is Made Powerful By Context (so stop destroying it already). You’re screwing up key data needed by your AI systems by separating observability signals into different “pillars”, says Charity.

    [blog] Extend your coding agent with .NET Skills. Cool. This should become standard soon as languages provide out-of-the-box skills to use them correctly.

    [blog] Announcing Genkit Dart: Build full-stack AI apps with Dart and Flutter. I find this VERY intriguing. Model agnostic, run anywhere, and built-in AI flows for your apps.

    [blog] Cracking the code on corporate visibility. You’re doing some great work. How come you’re not getting the right credit for it? Consider being more visible by creating and sharing content.

    [article] A2A vs MCP – How These AI Agent Protocols Actually Differ. Impressive level of detail in this new tutorial from DigitalOcean.

    [blog] Fixing AI Slop with a Skill in Gemini CLI. I don’t get super riled up if I know I’m reading AI generated text. What I want is text that sounds like a human. This skill fixes the default mode of text generation.

    [article] AI coding assistants may influence which languages developers use. I could see that. I’m functional in four programming languages, but I tend to generate AI code in only one or two of them.

    [article] The “Last Mile” Problem Slowing AI Transformation. Even enthusiastic adopters of AI tools hit issues. Where do they struggle to get through the last mile, and how can we all learn from them? Some tips here.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 9, 2026 (#737)

    I had a fun weekend, with no option to do anything work related. Baseball, boating, and shenanigans with family. I probably get more inspiration for things to do at work because I’m living an enjoyable life outside of it. At least that’s what I tell myself.

    [blog] Designing MCP tools for agents: Lessons from building Datadog’s MCP server. Some hard-earned lessons here! There are at least 3-4 strong pieces of advice in this Datadog post.

    [article] The revenge of SQL: How a 50-year-old language reinvents itself. Is SQL hot again? Is it about relational databases solving most use cases nowadays? There’s also SQL on the frontend, better SQL clients, and more.

    [article] EY hit 4x coding productivity by connecting AI agents to engineering standards. Better models matter. But applying a smart context is a difference maker regardless of model.

    [blog] Game on with Spanner: How Playstation achieves global scale with 91% less storage, 50% lower costs. Cool story. A high performing database engine can end up saving you a ton of money and complexity.

    [blog] How Do Large Companies Manage CI/CD at Scale? Me building and running a simple deployment pipeline is not “scalable CI/CD.” What do teams do when they have lots of apps, pipelines, and targets? Some insight here.

    [blog] Go for Backend Development — Why We Bet on It. Very strong, defensible case for using Go.

    [article] When Using AI Leads to “Brain Fry.” Across roles, people are using AI past the point their brains can handle. What leads to brain fry, and how to prevent it?

    [blog] Hardware-Enabled Software and the Next Generation of Vertical AI. I don’t pay a lot of attention to this space, so this was educational.

    [blog] Firebase A/B Testing is now available for the web. The functionality was available to mobile devs for a while, and now web users can take advantage of this powerful system for running experiments.

    [blog] Terminals Are Cool Again. Maybe we should be building more terminal apps? I’m not sold that they’re more accessible than a web or desktop app. But there’s no doubt they’re lighter weight and can be more efficient to use.

    [blog] gRPC on GKE for Fun & Profit Part 1 — An Overview. gRPC is a key technology within Google, and also many other companies that care about performance between services. See part 2 of this series as well.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 6, 2026 (#736)

    I’m off to Arizona for a couple of days to watch Spring Training baseball with my son. And to hang out with my brother and friends. Back here on Monday!

    [blog] You can’t stream the energy: A developer’s guide to Google Cloud Next ’26 in Vegas. If you’re procrastinating, stop it. I saw the numbers this week and the event is close to selling out. Get yourself to the premium dev and AI event of the year.

    [blog] Vibe Coding to Production. Even now, you’re probably not pushing production apps from your IDE. Ankur connects his AI-built app to GitHub, Cloud Build, and Cloud Run.

    [blog] Does AI Make Us Smarter or Dumber? Yes? We’re losing some “primitive” abilities but unlocking new superpowers.

    [article] OpenAI launches GPT-5.4 with Pro and Thinking versions. Plenty of new models this week, including fresh ones from OpenAI.

    [blog] Look What You Made Us Patch: 2025 Zero-Days in Review. The zero-day landscape was different last year. Even more enterprise tech attacks, with browser-based exploitation dropping.

    [blog] How Google Does It: Applying SRE to cybersecurity. SRE applies to security too, of course. I like these details of how we think about it.

    [article] The Pulse: Cloudflare rewrites Next.js as AI rewrites commercial open source. I guess we can just rewrite stuff now? It’s not difficult to regenerate entire projects, build compatibility layers, etc.

    [blog] Can coding agents relicense open source through a “clean room” implementation of code? Continuing that thought, how legal is it? That question is being tested here as debate arises over a rebuild and relicense.

    [blog] GitOps architecture, patterns and anti-patterns. Are you following good practices, or anti-patterns? A lot of specifics here.

    [article] Cursor is rolling out a new kind of agentic coding tool. Looks cool. Always-on fleets of agents will be a commonplace thing in twelve months. Maybe six.

    [blog] Practical Guide to Evaluating and Testing Agent Skills. You know how to build the skill, but can you test the skill? It’s really the most important part, the testing. Anybody can just build them. Yes, I’m ripping off a Seinfeld bit.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below: