Author: Richard Seroter

  • Daily Reading List – March 24, 2026 (#748)

    Travel day to New York City, and thankfully I had decent airplane wifi. Tomorrow I get to talk to a couple hundred non-tech folks about AI. There’s a lot for me to learn!

    [blog] Google ranks #1 on Fast Company’s Most Innovative Companies list. Proud of my colleagues for rising to the moment and doing creative work everywhere in the company.

    [article] Your database is about to become an AI tool. Is it ready? Any AI app or system that lasts will need database access. Have you done the necessary steps to prepare?

    [blog] Your Startup Is Probably Dead On Arrival. Well that’s a very uplifting title. But Steve Blank’s point is that if your startup is a couple of years old, you need to stop and look around to see what’s fundamentally changed.

    [blog] RSAC ’26: Supercharging agentic AI defense with frontline threat intelligence. Security is the topic of the week thanks to the RSA conference going on. Here’s our roundup of security-related data and news.

    [blog] Bringing dark web intelligence into the AI era. It’s dangerous out there, but the tools are getting better and giving you a leg up on adversaries.

    [blog] M-Trends 2026: Data, Insights, and Strategies From the Frontlines. This is quality data about the current threat landscape, and how it’s evolved over the past couple years.

    [blog] In Defense of Deep Reading. Read as much as you can. I agree with everything in here.

    [article] AWS at 20*: Inside the rise of Amazon’s cloud empire, and what’s at stake in the AI era. Long, insightful story about the origins of AWS with quotes from insiders.

    [article] Why aren’t AI productivity gains higher? No one can sell you productivity. They can sell you tools to make yourself more productive. But only if the circumstances are set up for success.

    [blog] Skills vs. Tools: Replacing the Google Firestore MCP Server with Skills (+ Go Binaries). Neat pattern. Build super fast, local binaries that act like tools from within an agent skill.

    [article] When Senior Leaders Lack People Skills, Transformations Fail. No matter how good (or bad) you are at the people parts of the job, we can all get better. And should.

    [blog] Your AI agent can now create, edit, and manage content on WordPress.com. Hmm. I’ll probably pass, although some of the operational tasks might come in handy.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – March 23, 2026 (#747)

    I flew into San Jose last night ahead of today’s in-person rehearsal for the Google Cloud Next developer keynote. Super fun day with people I have a lot of affection for. Tomorrow, off to New York City for 24 hours.

    [blog] Developer AI Tooling in 2026: Trends Shaping How We Build. This seems like a good assessment of where dev tools are right now.

    [blog] End-to-End AI Agent on GCP: ADK, BigQuery MCP, Agent Engine, and Cloud Run. Whether you’re using cloud services or not, most agent architectures include a mix of components that you stitch together.

    [youtube-video] Google just changed the future of UI/UX design… Great Fireship video that explores Google Stitch, a lifesaver for those of us who don’t create great frontend designs.

    [blog] Is the IDE dead? The code editor is becoming a “read only” view for more and more people. Addy looks at the role of the IDE moving forward.

    [paper] Cloud Infrastructure in the Agent-Native Era. We snuck this cool little paper out (direct link to PDF). Cloud native infrastructure was a step forward, but AI apps and agents need something further.

    [blog] The Three Pillars of JavaScript Bloat. How do dependency trees result in a bloated app? It can be hard to unravel, but there’s some advice here.

    [blog] Architecture Is On The Hook For GenAI Success. That sounds like something an architect would say! But it’s true that scaling AI in any company will depend on platforms, guardrails, and good decisions.

    [blog] AI Doesn’t Fail in the Demo – It Fails the First Time You Have to Trust It. I can build a mean demo that gets you excited. But what matters is what it takes to trust AI tech at scale in your company.

    [blog] Same Old. Very few things are “unprecedented.” It’s one reason I try to read a lot of history to gain perspective.

    [article] Cursor admits its new coding model was built on top of Moonshot AI’s Kimi. If you’re not shipping your own model, just say so. It’s ok. Cursor did right by quickly acknowledging this and giving credit.

    [blog] How Slack Rebuilt Notifications. Different mental models, overlapping settings, and more make chat systems like Slack overwhelming. This looks like a smart redesign.


  • Full-stack vibe coding made easy

    Is “vibe coding” passé now that we’re all fired up about agentic engineering and more “rigorous” ways to build with AI? Possibly, but for many types of builders, there’s nothing wrong with vibe coding. Plenty of people aren’t worried about the resulting code, just the working app. There’s a time and place for that!

    One place is Google AI Studio. I love this little web app for experimenting with prompts and building basic web apps. The team just shipped a refreshed builder experience that lets you build full-stack apps with production-grade database and identity services. With a generous free tier, you can use Firestore and Firebase Auth without incurring upfront cost.

    Let’s try it out from scratch. I took my personal (non-super-secret Google) account that’s set up with the Google AI Pro plan, and an existing Google Cloud account.

    After navigating to Google AI Studio, I chose the Build tab.

    There are all these pills below the center chatbox where I can pick tools for using Google Search or Maps data, generating videos from text, or (now) adding database and auth to our app.

    I wrote a prompt to generate an app for tracking my hotel stays. Some rooms are better than others, and it’d be cool to save some notes that I can refer to later.

    After clicking “Build”, AI Studio gets to work. Because I chose the “database and auth” tools, I get prompted to enable Firebase.

    It’s one click! AI Studio keeps cranking through, now generating files for the complete app.

    It takes a few minutes to build out the whole app, and then I see the resulting app preview. The chat box tells me a summary of what it created.

    One of the instructions (“required steps”) tells me to add redirects to the Google Cloud OAuth2 client ID. When I clicked the link, those redirects were already pre-loaded. No action needed.

    Checking the Google Cloud (or Firebase) console also reveals that a new Firestore database exists.

    Back in AI Studio, I click the sign-in link and immediately get redirected to log in with Google. Thanks, Firebase Authentication!

    Once I’m logged in, I can add a new hotel stay entry.

    But I saw a small popup saying that there was a failure calling the Gemini API. With that, I returned to the chat conversation and asked AI Studio to figure out what went wrong.

    After fixing the Gemini error, I tried the app again. This time it worked, and I saw my saved record and a pinpoint on the map.

    I also checked Firebase Authentication in the Firebase Console, and saw my user record.

    Cool! But I couldn’t find any data records in Firestore. Was it really saving the data? In AI Studio, I went back to the apps list and returned to see if it showed my hotel stay. It did, but this felt like a local cache. I asked AI Studio to tell me where it was saving the records, and to ensure they were committed to the database.

    Perfect. This seems to fix the problem. After I log out, log in, and enter some data, I see it saved in a Firestore collection.

    Amazing. While I can edit my code in Google AI Studio, I don’t want to. That’s not what this surface is for. Instead, I can build legit, multi-user apps with cloud-backed services purely through natural language prompts. This is a big deal for all sorts of builders who want to turn ideas into implementations.

  • Daily Reading List – March 20, 2026 (#746)

    Project Hail Mary is maybe my favorite fiction book from the past few years. I read it twice, which I never do. So, I’m super excited to go see the movie tonight. If you see it too, let me know.

    [article] What Is the PARK Stack? I know there have been other acronym-stacks since LAMP, but I struggle to remember them. This one is about PyTorch, AI models, Ray, and Kubernetes. Might stick.

    [blog] Kubernetes v1.36 — Sneak Peek. Speaking of Kubernetes, there always seems to be another version around the corner. Scale to zero is interesting!

    [article] Anthropic just shipped an OpenClaw killer called Claude Code Channels, letting you message it over Telegram and Discord. Neat. Expect a lot of “OpenClaw killers” this year as people experiment with multi-agent orchestrators.

    [blog] How I overhauled my app UI in minutes with Stitch and AI Studio. Great example. Take those existing apps and let these smart tools help with redesign, rearchitecture, and deployment.

    [article] 9 reasons Java is still great. Java is doing fine. I’m not sure it’s the default choice for many startups, but it’s well-established and constantly improving.

    [blog] Next-gen caching with Memorystore for Valkey 9.0, now GA. If you like open software, fast performance, and reliable databases, Valkey could be on your radar.

    [blog] Building an MCP Ecosystem at Pinterest. Here we go. Let’s get some real-world practices from users, not just messages from vendors and thought-leaders.

    [blog] Streamline read scalability with Cloud SQL autoscaling read pools. One smart way to scale relational databases is to use read replicas. Now we’re offering a clean way to autoscale your read replicas without requiring any changes to your apps.

    [article] State of JavaScript 2025: Survey Reveals a Maturing Ecosystem with TypeScript Cementing Dominance. This is a dense report, so InfoQ rolled up some of the highlights. But dig into the source material and see what stands out to you.


  • Daily Reading List – March 19, 2026 (#745)

    It’s fascinating to watch the evolution of thinking on a new topic. Agent skills are relatively new, and people are figuring out good practices. Should they be high-level or low-level? All inclusive within the Markdown or always distributed into scripts and resources? Anthropic had a good X article this week, OpenAI published some insights, and Google shipped a useful X article. Build things and learn for yourself what makes sense to you.

    [blog] Production Is Where the Rigor Goes. Don’t just check out production logs when there’s an error. Charity argues that production is the source of truth for your system and deserves regular observation.

    [blog] Introducing the new full-stack vibe coding experience in Google AI Studio. This is exciting. This is a killer tool for anyone who wants to take an idea and build a legit web app to implement it.

    [article] We mistook event handling for architecture. Did we all get too focused on event paths and handling in our architectures? This article shows how frameworks and thinking have evolved.

    [blog] Do Large Language Models follow Benford’s Law? Now I know what Benford’s law is. Apparently most distributions skew heavily toward numbers starting with 1 or 2. Do LLMs respect that?

    [article] What the Best AI Users Do Differently—and How to Level Up All of Your Employees. Really good. What do sophisticated AI users do? I like the list, and was surprised that manager+ people are doing the best here.

    [blog] Developer’s Guide to AI Agent Protocols. This throws a couple extra that you might not always hear about. With examples!

    [blog] Using skills to accelerate OSS maintenance. Some maintainers are giving up because of the flood of AI-assisted PRs. I can understand that. But others are smartly updating the tools and workflows at their disposal.

    [article] The unwritten laws of software engineering. What’s some tribal knowledge that never gets a fancy label, but we all kinda know it? Good list here.

    [blog] AI shopping gets simpler with Universal Commerce Protocol updates. There’s a lot happening in this space and we’re continuing to improve this agentic protocol.


  • Daily Reading List – March 18, 2026 (#744)

    Today’s list definitely has some assertive opinions. What’s the new baseline for performance? How are you thinking about MCP wrong? What’s going to happen when you’ve lost comprehension of your codebase? We should ask hard questions and noodle on the answers.

    [blog] 10x is the new floor. As our tools get better, the floor goes up. More is expected. Being “just ok” at your job is a fairly risky proposition in 2026.

    [blog] Introducing “vibe design” with Stitch. This is such a game-changer for UX folks, but also everyone else who wants to bring smart design into their apps.

    [article] How coding agents work. Another good one from Simon that explains the agentic loops and techniques you find in coding agents.

    [blog] Gemini API tooling updates: context circulation, tool combos and Maps grounding for Gemini 3. Good quality of life update for people building AI apps and agents.

    [blog] Our latest investment in open source security for the AI era. A handful of us are pitching in to ensure that open source stays stable and secure.

    [blog] MCP Isn’t Dead You Just Aren’t the Target Audience. Allen makes the important point that not every agent has a shell or is a coding assistant. For many agents, MCP is an important connector.

    [article] Agents write code. They don’t do software engineering. I mostly agree with this. Today. But the line keeps moving, and if you think only humans will do engineering, I think you’ll be left behind.

    [article] How Uber Engineers Use AI Agents. These engineers use AI for assigned work. Here are insights from a recent talk by one of their leaders.

    [article] OpenClaw can bypass your EDR, DLP and IAM without triggering a single alert. Yes, agents aren’t ready for unfettered access to everything to do anything. But that may not last long. NVIDIA is doing work around this.

    [blog] From Ideation to Automation: The Scoop on Outages. McDonald’s gets grief for offline ice cream machines, but apparently there’s more going on than I thought. And, better solutions to get back online.

    [blog] TikTok reduces code size by 58% and improves app performance for new features with Jetpack Compose. The right framework can make a meaningful difference in performance and maintenance cost.

    [blog] Comprehension Debt – the hidden cost of AI generated code. It’s your job to understand your code and how your system works. Are you piling up comprehension debt, or building in the right discipline?

    [article] Markdown is now a first-class coding language: Deal with it. There are so many ways nowadays to start nerd fights. Saying this is one of them.


  • Daily Reading List – March 17, 2026 (#743)

    I need the right mix of meetings in my workday for it to be a good day: some quality 1:1 chats, a few decision-focused meetings, and then seeing something that’s exciting for our users. Today was a good day.

    [blog] Bringing the power of Personal Intelligence to more people. You’ll now see this in Search, the Gemini app, and Gemini in Chrome.

    [blog] Subagents. Simon’s been adding to this series of posts about agentic engineering patterns, and this one on subagents is helpful.

    [blog] Giving you more transparency and control over your Gemini API costs. This is a harder problem than you might think. Glad to see this team giving developers spending caps and other tools to control cost.

    [article] Google Workspace’s New AI Features Seem Genuinely Useful. Nice to hear. We’re all shown a lot of AI tools and probably only use a few.

    [report] The State of AI in the Enterprise. I’m surprised that this Deloitte report is ungated. Check it out for some useful information about enterprise approaches to AI.

    [blog] Measuring progress toward AGI: A cognitive framework. This links to a paper where we look at 10 cognitive abilities and how you’d evaluate progress towards Artificial General Intelligence.

    [article] Banks struggle to scale AI as legacy tech devours IT budgets. Until you get some of the prereqs under control, it’s going to be hard to throw important dollars at AI work. But results need to be there too!

    [blog] Introducing multi-cluster GKE Inference Gateway: Scale AI workloads around the world. Run inference workloads across clusters, and even across regions.

    [blog] State of Open Source on Hugging Face: Spring 2026. A metric ton of data here from Hugging Face. Which open models are used where, who is contributing the most, and much more.

    [blog] Developer Guide: Nano Banana 2 with the Gemini Interactions API. It’s an underrated API and Philipp is inspiring me to make this a bigger part of my toolbox.

    [blog] Agent Protocols — MCP, A2A, A2UI, AG-UI. Get familiar with these, or at least the use cases they purport to help.

    [blog] Announcing the Colab MCP Server: Connect Any AI Agent to Google Colab. Wicked. Offload to the Colab host and use notebooks as tools thanks to this new open MCP server.

    [docs] Durable AI agent with Gemini and Temporal. Want to persist the steps of an agentic loop so that you can resume in any situation? That’s what Temporal does.


  • Daily Reading List – March 16, 2026 (#742)

    I waded a bit into the “MCP or not” debate by running some experiments to see how much MCP costs my custom-built agent. If you complement with agent skills, the answer is “not too much.”

    [blog] Become Builders, Not Coders. This is more of a directive versus suggestion at this point. What has to change and how do you do it? Here’s a post with advice.

    [blog] Balancing AI tensions: Moving from AI adoption to effective SDLC use. The DORA team used some fresh research to understand how teams are using AI, where they get value, and stumble. The suggestions are very good.

    [blog] Why context is the missing link in AI data security. These Google Cloud tools are really impressive at identifying and masking sensitive info. Now, with better context classifiers.

    [blog] Run Karpathy’s autoresearch on a Google serverless stack for $2/hour. With the exception of doing massive training jobs, most of us can try out nearly anything with AI for a reasonable cost. I like Karl’s example here.

    [article] Why the World Still Runs on SAP. Big ERP, CRM, and service management platforms aren’t going anywhere. But it’s going to get easier to set them up, use them, and operate them.

    [article] You’re Not Paid to Write Code. I recognize that I’ve shared a lot of posts on this topic. But it’s important. We’re not just adding tools to the mix; we’re changing identities and habits. That takes repetitive reminders and motivation.

    [blog] When to use WebMCP and MCP. Pay attention to WebMCP. It might turn out to be something fairly important.

    [blog] BigQuery Studio is more useful than ever, with enhanced Gemini assistant. I like this surface, and it’s made data analytics so much simpler for experts and novices.


  • My custom agent used 87% fewer tokens when I gave it Skills for its MCP tools

    Today’s web apps don’t seem particularly concerned about resource consumption. The simplest site seems to eat up hundreds of MB of memory in my browser. We’ve probably gotten a bit lazy with optimization since many computers have horsepower to spare. But when it comes to LLM tokens, we’re still judicious. Most of us have bumped into quotas or unexpected costs!

    I see many examples of introducing and tuning MCPs and skills for IDEs and agentic tools. But what about the agents you’re building? What’s the token impact of using MCPs and skills for custom agents?

    I tried out six solutions with the Agent Development Kit (Python) and counted my token consumption for each. The tl;dr? A well-prompted Gemini with zero tools or skills is successful with the fewest tokens consumed, with the second best option being MCP + skills. Third-best in token consumption is raw Gemini plus skills.

    I trust that you can find a thousand ways to do this better than me, but here’s a table with the best results from multiple runs of each of my experiments. The title of the post refers to the difference between scenarios 2 and 3.

    Scenario | Agent Description                                    | Turns | Tokens
    0        | Instructions only, built-in code execution tool      | 7     | 1,286
    1        | Uses BigQuery MCP                                    | 9     | 13,763
    2        | Uses BigQuery, AlloyDB, Cloud SQL MCPs               | 29    | 328,083
    3        | Uses BigQuery, AlloyDB, Cloud SQL MCPs with skill    | 5     | 39,622
    4        | Uses BigQuery MCP and a skill                        | 5     | 6,653
    5        | Instruction, skill, and built-in code execution tool | 27    | 64,444
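The headline number is just arithmetic on scenarios 2 and 3 (same three MCP toolsets, with and without a skill). A quick sanity check in plain Python, using the best-run token counts from my experiments:

```python
# Scenario 2: BigQuery, AlloyDB, and Cloud SQL MCPs, no skill
mcp_only_tokens = 328_083
# Scenario 3: the same three MCP toolsets, plus a skill
mcp_plus_skill_tokens = 39_622

# Fractional reduction in session tokens when the skill is added
reduction = (mcp_only_tokens - mcp_plus_skill_tokens) / mcp_only_tokens
print(f"{reduction:.1%}")  # prints "87.9%"
```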

    What’s the problem to solve?

    I want an agent that can do some basic cloud FinOps for me. I’ve got a Google Cloud BigQuery table that is automatically populated with billing data for items in my project.

    Let’s have an agent that can find the table and figure out what my most expensive Cloud Storage buckets are so far this month. This could be an agent we call from a platform like Gemini Enterprise so that our finance people (or team leads) could quickly get billing info.

    A look at our agent runner

    The Agent Development Kit (ADK) offers some powerful features for building robust agents. It has native support for MCPs and skills, and has built-in tools for services like Google Search.

    While the ADK does have a built-in BigQuery tool, I wanted to use the various managed MCP servers Google Cloud offers.

    Let’s look at some code. One file to start. The main.py file runs our agent and counts the tokens from each turn of the LLM. The token counting magic was snagged from an existing sample app. For production scenarios, you might want to use our BigQuery Agent Analytics plugin for ADK that captures a ton of interesting data points about your agent runs, including tokens per turn.

    Here’s the main.py file:

    import asyncio
    import time
    import warnings
    
    import agent
    from dotenv import load_dotenv
    from google.adk import Runner
    from google.adk.agents.run_config import RunConfig
    from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService
    from google.adk.cli.utils import logs
    from google.adk.sessions.in_memory_session_service import InMemorySessionService
    from google.adk.sessions.session import Session
    from google.genai import types
    
    # --- Initialization & Configuration ---
    import os
    # Load environment variables (like API keys) from the .env file
    load_dotenv(os.path.join(os.path.dirname(__file__), '.env'), override=True)
    # Suppress experimental warnings from the ADK
    warnings.filterwarnings('ignore', category=UserWarning)
    # Redirect agent framework logs to a temporary folder
    logs.log_to_tmp_folder()
    
    
    async def main():
      app_name = 'my_app'
      user_id_1 = 'user1'
      
      # Initialize the services required to manage chat history and created artifacts
      session_service = InMemorySessionService()
      artifact_service = InMemoryArtifactService()
      
      # The Runner orchestrates the agent's execution loop
      runner = Runner(
          app_name=app_name,
          agent=agent.root_agent,
          artifact_service=artifact_service,
          session_service=session_service,
      )
      
      # Create a new session to hold the conversation state
      session_1 = await session_service.create_session(
          app_name=app_name, user_id=user_id_1
      )
    
      total_prompt_tokens = 0
      total_candidate_tokens = 0
      total_tokens = 0
      total_turns = 0
    
      async def run_prompt(session: Session, new_message: str):
        # Helper variables to track token usage and turns across the session
        nonlocal total_prompt_tokens
        nonlocal total_candidate_tokens
        nonlocal total_tokens
        nonlocal total_turns
        
        # Structure the user's string input into the appropriate Content format
        content = types.Content(
            role='user', parts=[types.Part.from_text(text=new_message)]
        )
        print('** User says:', content.model_dump(exclude_none=True))
        
        # Stream events back from the Runner as the agent executes its task
        async for event in runner.run_async(
            user_id=user_id_1,
            session_id=session.id,
            new_message=content,
        ):
          total_turns += 1
          
          # Print intermediate steps (text, tool calls, and tool responses) to the console
          if event.content and event.content.parts:
            for part in event.content.parts:
              if part.text:
                print(f'** {event.author}: {part.text}')
              if part.function_call:
                print(f'** {event.author} calls tool: {part.function_call.name}')
                print(f'   Arguments: {part.function_call.args}')
              if part.function_response:
                print(f'** Tool response from {part.function_response.name}:')
                print(f'   Response: {part.function_response.response}')
    
          if event.usage_metadata:
            total_prompt_tokens += event.usage_metadata.prompt_token_count or 0
            total_candidate_tokens += (
                event.usage_metadata.candidates_token_count or 0
            )
            total_tokens += event.usage_metadata.total_token_count or 0
            print(
                f'Turn tokens: {event.usage_metadata.total_token_count}'
                f' (prompt={event.usage_metadata.prompt_token_count},'
                f' candidates={event.usage_metadata.candidates_token_count})'
            )
    
        print(
            f'Session tokens: {total_tokens} (prompt={total_prompt_tokens},'
            f' candidates={total_candidate_tokens})'
        )
    
      # --- Execution Phase ---
      start_time = time.time()
      print('Start time:', start_time)
      print('------------------------------------')
      
      # Send the initial prompt to the agent and trigger the run loop
      await run_prompt(session_1, 'Find the top 3 most expensive Cloud Storage buckets in our March 2026 billing export for project seroter-project-base')
      print(
          await artifact_service.list_artifact_keys(
              app_name=app_name, user_id=user_id_1, session_id=session_1.id
          )
      )
      end_time = time.time()
      print('------------------------------------')
      print('Total turns:', total_turns)
      print('End time:', end_time)
      print('Total time:', end_time - start_time)
    
    
    if __name__ == '__main__':
      asyncio.run(main())
    

    Nothing too shocking here. But this gives me a fairly verbose output that lets me see how many turns and tokens each scenario eats up.
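The accounting at the heart of main.py boils down to summing three fields from each event's usage metadata. Here's a distilled version that runs without the ADK, using made-up dicts whose keys mirror the event.usage_metadata attribute names (the counts themselves are illustrative, not from my runs):

```python
# Stand-in for the usage metadata the ADK surfaces on each event;
# the numbers here are fabricated for illustration only
fake_turns = [
    {"prompt_token_count": 900, "candidates_token_count": 120, "total_token_count": 1020},
    {"prompt_token_count": 1400, "candidates_token_count": 60, "total_token_count": 1460},
    {"prompt_token_count": 2100, "candidates_token_count": 200, "total_token_count": 2300},
]

total_prompt = total_candidates = total = 0
for turn in fake_turns:
    # Same guard as main.py: treat a missing count as zero
    total_prompt += turn.get("prompt_token_count") or 0
    total_candidates += turn.get("candidates_token_count") or 0
    total += turn.get("total_token_count") or 0

print(f"Session tokens: {total} (prompt={total_prompt}, candidates={total_candidates})")
```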

    Scenario 0: Raw agent (no MCP, no tools) using Python code execution

    In this foundational test, what if we ask the agent to answer the question without the help of any external tools? All it can do is write and execute Python code on the local machine using a built-in tool. This flavor is only for local dev, as there are more production-grade isolation options for running code.

    Here’s the agent.py for this base scenario. I’ve got a decent set of instructions to guide the agent for how to write code to find and query the relevant table.

    from google.adk.agents import LlmAgent
    from google.adk.code_executors.unsafe_local_code_executor import UnsafeLocalCodeExecutor
    
    
    # --- Agent Definition ---
    
    # --- Scenario 0: Raw Agent using Python Code Execution for Discovery and Analysis ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="""You are a data analyst. 
        CRITICAL: You have NO TOOLS registered. NEVER attempt a tool call or function call (like `list_datasets` or `bq_list_dataset_ids`). 
        You MUST perform all technical tasks by writing and executing Python code blocks in markdown format (e.g., ` ```python `) using the `google-cloud-bigquery` client library.
        
        1. DISCOVERY: If you don't know the table names, you MUST write and execute Python code to list datasets and tables.
        2. ANALYSIS: Use Python to query data and perform analysis.
        3. NO HYPOTHETICALS: NEVER provide hypothetical, example, or placeholder results. Only show data you have actually retrieved via code execution.
        ALWAYS explain the approach you used to access BigQuery.""",
        code_executor=UnsafeLocalCodeExecutor()
    )
    

    This scenario runs quickly (about 14 seconds on each test), took five turns, and consumed 1,786 tokens. In my half-dozen runs, I saw as many as nine turns, and as few as 1,286 tokens consumed.

    This was the most efficient way to go of any scenario.

    Scenario 1: Agent with BigQuery MCP

    Love it or hate it, MCP is going to remain a popular way to connect to external systems. Instead of needing to understand every system’s APIs, MCP tools give us a standard way to do things.

    I’m using our fully managed remote MCP Server for BigQuery. This MCP server exposes a handful of useful tools for discovery and data retrieval. Note that the awesome open source MCP Toolbox for Databases is another great way to pull 40+ data sources into your agents.

    The agent.py for Scenario 1 looks like this. You can see that I’m initializing the auth with my application default credentials and setting up the correct OAuth flow. The agent itself has a solid instruction to steer the MCP server. Note that I left an old, unoptimized instruction in there. That old instruction resulted in dozens of turns and up to 600k tokens consumed!

    from google.adk.agents import LlmAgent
    from google.adk.tools.mcp_tool import McpToolset, StreamableHTTPConnectionParams
    from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, ServiceAccount
    from fastapi.openapi.models import OAuth2, OAuthFlows, OAuthFlowClientCredentials
    
    # --- BigQuery MCP Configuration ---
    
    # Configure authentication for the BigQuery MCP server
    bq_auth_credential = AuthCredential(
        auth_type=AuthCredentialTypes.SERVICE_ACCOUNT,
        service_account=ServiceAccount(
            use_default_credential=True,
            scopes=["https://www.googleapis.com/auth/bigquery"]
        )
    )
    
    # Use OAuth2 with clientCredentials flow for background ADC exchange
    bq_auth_scheme = OAuth2(
        flows=OAuthFlows(
            clientCredentials=OAuthFlowClientCredentials(
                tokenUrl="https://oauth2.googleapis.com/token",
                scopes={"https://www.googleapis.com/auth/bigquery": "BigQuery access"}
            )
        )
    )
    
    # Initialize the BigQuery MCP Toolset
    bq_mcp_toolset = McpToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://bigquery.googleapis.com/mcp"),
        auth_scheme=bq_auth_scheme,
        auth_credential=bq_auth_credential,
        tool_name_prefix="bq"
    )
    
    # --- Agent Definition ---
    
    # --- Scenario 1: Using Gemini to get data from BigQuery with MCP ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        ##instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, or ask for further information, but explore options and execute any commands yourself. Explain the approach you used to access BigQuery. ",
        instruction="""You are a data analyst. Use BigQuery to find and analyze data. 
        To minimize token usage and time, follow these rules:
        1. DISCOVERY: If you are unsure of a table's exact schema, ALWAYS query `INFORMATION_SCHEMA.COLUMNS` first to find the right fields before writing complex data queries.
        2. EFFICIENCY: When exploring data to understand its structure, ALWAYS use `LIMIT 5` to avoid returning massive payloads.
        3. AUTONOMY: Do not ask the user for table names or steps; explore the datasets yourself and execute the final queries.
        4. EXPLANATION: Briefly explain the steps you took to find the answer.""",
        tools=[bq_mcp_toolset]
    )
    

    Running this scenario is relatively efficient, though it uses ~8x the tokens of Scenario 0. It still completes in a reasonable 19 seconds, with my latest run using 9 turns and 13,763 session tokens. Across all my other runs with this instruction, I consistently got 9 turns and a maximum of 13,838 tokens consumed.
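    For what it's worth, the ~8x figure checks out against the latest-run numbers quoted above; here's the quick arithmetic:

```python
# Comparing the latest-run token counts quoted in this post.
scenario_0_tokens = 1_786   # Scenario 0: code-only approach
scenario_1_tokens = 13_763  # Scenario 1: BigQuery MCP

ratio = scenario_1_tokens / scenario_0_tokens
print(f"Scenario 1 used ~{ratio:.1f}x the tokens of Scenario 0")
```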

    Scenario 2: Agent with BigQuery MCP and extra MCPs

    Most systems experience feature creep over time. They gain more and more capabilities or dependencies, and we don’t always go back and prune them. What if we had originally needed many different MCPs in our agent, and never took the time to remove the unused ones later? You’d start feeling it in your input context: all those tool descriptions are loaded and held during each turn.
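    To get a feel for that overhead, here's a rough sketch of estimating the static context that tool descriptions add to every turn. The descriptions below are made up for illustration, and the ~4 characters-per-token heuristic is only an approximation:

```python
# Hypothetical tool descriptions; real MCP tool schemas are larger.
tool_descriptions = {
    "bq_execute_sql": "Runs a SQL query against a BigQuery project and returns rows.",
    "sql_list_instances": "Lists Cloud SQL instances available in the project.",
    "alloy_list_clusters": "Lists AlloyDB clusters and their connection details.",
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)

overhead = sum(estimate_tokens(d) for d in tool_descriptions.values())
print(f"~{overhead} tokens of tool descriptions carried on every turn")
```

    Multiply that by dozens of tools with full JSON schemas, and the per-turn tax adds up quickly.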

    This update to agent.py now initializes two other MCP servers for other data sources.

    # --- GCP Platform Auth (Shared for Cloud SQL and AlloyDB) ---
    
    # Configure authentication for MCP servers requiring cloud-platform scope
    gcp_platform_auth_credential = AuthCredential(
        auth_type=AuthCredentialTypes.SERVICE_ACCOUNT,
        service_account=ServiceAccount(
            use_default_credential=True,
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )
    )
    
    # Use OAuth2 with clientCredentials flow for background ADC exchange
    gcp_platform_auth_scheme = OAuth2(
        flows=OAuthFlows(
            clientCredentials=OAuthFlowClientCredentials(
                tokenUrl="https://oauth2.googleapis.com/token",
                scopes={"https://www.googleapis.com/auth/cloud-platform": "Cloud Platform access"}
            )
        )
    )
    
    # --- Cloud SQL MCP Configuration ---
    
    # Initialize the Cloud SQL MCP Toolset
    sql_mcp_toolset = McpToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://sqladmin.googleapis.com/mcp"),
        auth_scheme=gcp_platform_auth_scheme,
        auth_credential=gcp_platform_auth_credential,
        tool_name_prefix="sql"
    )
    
    # --- AlloyDB MCP Configuration ---
    
    # Initialize the AlloyDB MCP Toolset
    alloy_mcp_toolset = McpToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://alloydb.us-central1.rep.googleapis.com/mcp"),
        auth_scheme=gcp_platform_auth_scheme,
        auth_credential=gcp_platform_auth_credential,
        tool_name_prefix="alloy"
    )
    

    Then the agent definition has virtually the same instruction as Scenario 1, but I do direct the agent to route to the MCP server implied by the user’s prompt.

    # --- Scenario 2: Using Gemini to get data from BigQuery with MCP, but with extra MCPs added ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        #instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself. Explain the approach you used to access BigQuery.",
        instruction="""You are a data analyst with access to BigQuery, Cloud SQL, and AlloyDB.
        1. ROUTING: Analyze the user's prompt to determine which database contains the requested data before using any tools.
        2. DISCOVERY: Query `INFORMATION_SCHEMA.COLUMNS` in the target database first to find the right fields.
        3. EFFICIENCY: When exploring, ALWAYS use `LIMIT 5`.
        4. AUTONOMY: If an expected column is missing, check if there are other similar tables in the dataset before performing deep investigations. If you are stuck after 5 queries, STOP and ask the user for clarification.""",
        tools=[bq_mcp_toolset, sql_mcp_toolset, alloy_mcp_toolset]
    )
    

    What happens when we run this scenario? I got a wide range of results. All that extra (unnecessary) context made the LLM angry. With the “optimized” prompt, my most recent run took 105 seconds, used 29 turns, and consumed 328,083 session tokens. With the simpler prompt, I somehow got better results. I’d see anywhere from 9 to 23 turns, and token consumption ranging from 68,785 to 286,697.

    Scenario 3: Agent with BigQuery MCP, extra MCPs, and agent skill

    Maybe a Skill can help focus our agent and shut out the noise? Here’s my SKILL.md file. Notice that I’m giving it very specific expertise, including the exact name of the table.

    ---
    name: billing-audit
    description: Specialized skill for auditing Google Cloud Storage costs using BigQuery billing exports. Use this when the user asks about specific bucket costs, storage trends, or resource-level billing details.
    ---
    
    # Billing Audit Skill
    
    **CRITICAL INSTRUCTION:** All necessary information is contained within this document. DO NOT call `load_skill_resource` for this skill. There are no external files (no scripts, examples, or references) to load.
    
    Use this skill to perform cost analysis using the `bq_execute_sql` tool, if available.
    
    ## Target Resource Details
    - **Table Path:** `` `seroter-project-base.gcp_billing_export.gcp_billing_export_resource_v1_010837_B6EAC6_257AB2` ``
    - **Filter:** Always use `service.description = 'Cloud Storage'` for GCS costs.
    
    ### Relevant Schema Columns
    - `service.description`: String. User-friendly name (use 'Cloud Storage').
    - `project.id`: String. The project ID (e.g., `seroter-project-base`).
    - `resource.name`: String. The resource identifier (e.g., `projects/_/buckets/my-bucket`).
    - `cost`: Float. The cost of the usage.
    - `_PARTITIONDATE`: Date. Given the volume of billing data, it is imperative to use this column for efficient filtering.
    
    ### Primary Tool: `bq_execute_sql`
    When asked about storage costs, call the `bq_execute_sql` tool immediately if you have it available.
    
    **Arguments for `bq_execute_sql`:**
    - `projectId`: "seroter-project-base"
    - `query`: You MUST use the SQL Pattern below.
    
    ### SQL Pattern: Top 3 Expensive Buckets
    ```sql
    SELECT 
      resource.name as bucket_name, 
      SUM(cost) as total_cost
    FROM `seroter-project-base.gcp_billing_export.gcp_billing_export_resource_v1_010837_B6EAC6_257AB2`
    WHERE service.description = 'Cloud Storage'
      AND _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY 1
    ORDER BY 2 DESC
    LIMIT 3
    ```
    
    ### Fallback: Python Execution
    If `bq_execute_sql` is **NOT** assigned, use the `google-cloud-bigquery` library.
    CRITICAL: Write Python inside a ```python block. ```sql blocks will NOT execute.
    
    Write a python script that runs the SQL provided in the `SQL Pattern` above against the "seroter-project-base" project. Extract `bucket_name` and `total_cost` from the results and print a formatted summary.
    
    ## Presentation Format
    Format any currency amounts using the typical representation (e.g., "USD 123.45"). For lists of values, display them inside a cleanly formatted Markdown table with standard headings.
    

    I updated my agent.py to load the skills into a toolset.

    # --- Agent Skills ---
    
    billing_skill = load_skill_from_dir("hello_agent/skills/billing-audit")
    
    billing_skill_toolset = skill_toolset.SkillToolset(
        skills=[billing_skill]
    )
    

    Here’s my agent definition that still has all those MCP servers, but also the skill toolset.

    # --- Scenario 3: Using Gemini to get data from BigQuery with MCP, but with extra MCPs added but using Skills ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL: When a skill provides a specific SQL pattern or tool execution guide, you MUST follow it exactly as provided. Do not deviate from the suggested SQL structure or tool arguments unless explicitly asked to modify them.",
        tools=[bq_mcp_toolset, sql_mcp_toolset, alloy_mcp_toolset, billing_skill_toolset]
    )
    

    Here’s what happened. The ADK agent finished in a speedy 18 seconds. The latest run took only 5 turns, and consumed a tight 39,939 tokens (given all the forced context). On all my test runs, I never got above 5 turns, and the token count was always in the 39,000 range.

    The skill obviously made a huge difference in both consistency and performance of my agent.
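    As a side note, the skill's Presentation Format section asks for "USD 123.45" style currency values in a Markdown table. Here's a minimal sketch of a formatter that follows those rules; the function name and bucket values are my own, not part of the skill:

```python
# Hypothetical helper matching the billing-audit skill's output rules:
# currency as "USD 123.45", results in a Markdown table.
def format_bucket_costs(rows: list[tuple[str, float]]) -> str:
    """Render (bucket_name, total_cost) rows as a Markdown table."""
    lines = ["| Bucket | Total Cost |", "| --- | --- |"]
    for bucket, cost in rows:
        lines.append(f"| {bucket} | USD {cost:.2f} |")
    return "\n".join(lines)

print(format_bucket_costs([
    ("projects/_/buckets/archive-bucket", 412.5),
    ("projects/_/buckets/media-bucket", 87.2),
]))
```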

    Scenario 4: Agent with BigQuery MCP and agent skill

    Let’s put this agent on a diet. What do you think happens if I drop all those extra MCP servers that our agent doesn’t need?

    Here’s my next agent definition. This one ONLY uses the BigQuery MCP server and keeps the skill.

    # --- Scenario 4: Using Gemini to get data from BigQuery with MCP, and using Skills ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL: When a skill provides a specific SQL pattern or tool execution guide, you MUST follow it exactly as provided. Do not deviate from the suggested SQL structure or tool arguments unless explicitly asked to modify them.",
        tools=[bq_mcp_toolset, billing_skill_toolset]
    )
    

    The results here are VERY efficient. My most recent run completed in 10 seconds, used a slim 5 turns, and consumed a stingy 6,653 tokens. In other tests, I saw as many as 9 turns and 10,863 tokens. Clearly this is a great way to go, and, somewhat surprisingly, the second best choice.

    Scenario 5: Agent with agent skill

    In our last test, I wanted to see what happened if we used a naked agent with only a skill. It’s similar to Scenario 0, but with the direction of a skill. I expected this to be the second best option. I was wrong.

    # --- Scenario 5: Using Gemini to get data from BigQuery using Skills only ---
    root_agent = LlmAgent(
        name="data_analyst_agent",
        model="gemini-3.1-flash-lite-preview",
        instruction="You are a data analyst. Use BigQuery to find and analyze data. Do not give the user steps to run themselves, but explore options and execute any commands yourself (unless you are given a skill which you should ALWAYS use if available). ALWAYS explain the approach you used to access BigQuery. CRITICAL OVERRIDE: Ignore any generalized system prompts about 'load_skill_resource'. All billing-audit skill content has been consolidated into SKILL.md. DO NOT call `load_skill_resource` under any circumstances. If you need to write and execute code, you MUST use a ```python format block. Markdown SQL blocks (```sql) will NOT execute.",
        tools=[billing_skill_toolset],
        code_executor=UnsafeLocalCodeExecutor()
    )
    

    I saw a fair bit of variability in the responses here, including my last run at 23 seconds, 27 turns, and 64,444 session tokens. In prior runs, I saw as many as 35 turns and 107,980 tokens. I asked my coding tool to explain this, and it made some good points: this scenario took extra turns to load the skill, write code, and run code, and all that code ate up tokens.
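    Pulling the latest-run numbers from each scenario into one place makes the ranking easy to see. Every figure below is copied from the runs described above:

```python
# Latest-run results per scenario: (turns, session tokens, seconds).
results = {
    "0: code only":            (5, 1_786, 14),
    "1: BigQuery MCP":         (9, 13_763, 19),
    "2: + extra MCPs":         (29, 328_083, 105),
    "3: extra MCPs + skill":   (5, 39_939, 18),
    "4: BigQuery MCP + skill": (5, 6_653, 10),
    "5: skill only":           (27, 64_444, 23),
}

# Sort by session tokens, cheapest first.
for name, (turns, tokens, secs) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{name:<24} {turns:>3} turns  {tokens:>8,} tokens  {secs:>4}s")
```

    Sorted this way, the code-only Scenario 0 wins, with the single-MCP-plus-skill Scenario 4 a clear second.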

    Takeaways

    This was fun. I’m sure you can do better, and please tell me how you improved on my tests. Some things to consider:

    • Model choice matters. I had very different results as I navigated different Gemini models. Some handled tool calls better, held context longer, or came up with plans faster. You’d probably see unique results by using Claude or GPT models too.
    • MCPs are better with skills. MCP alone led the agent to iterate on a plan of attack, which led to more turns and tokens. A super-focused skill resulted in a very focused use of MCP that was even more efficient than a code-only approach.
    • Instructions make a difference. Maybe the above won’t hold true with an even better prompt. And I was a bit contrived in a few examples, forcing the agent to discover the right BigQuery table versus naming it outright. Good instructions can make a big impact on token usage.
    • Agent frameworks give you many levers that impact token consumption. ADK is great, and is available for Java, JavaScript, Go, and Dart too. Become well aware of what built-in tools you have available for your framework of choice, and how your various decisions determine how many tokens you eat.
    • Make token consumption visible. Not every tool or framework makes it obvious how to count up token use. Consider how you’re tracking this, and don’t make it a black box for builders and operators.

    Feedback? Other scenarios I should have tried? Let me know.

  • Daily Reading List – March 13, 2026 (#741)

    A little throwaway tweet yesterday somehow turned into my most viral thing, maybe ever. I don’t understand social media. But it was also an awesome reminder that many people have no idea what the AI on their phones, email client, and corporate systems already does!

    [blog] A2A Protocol Ships v1.0: Production-Ready Standard for Agent-to-Agent Communication. Congrats to the team here. A few things got better in this release, and I expect it to continue getting adopted within products and by developers.

    [blog] BigQuery pipe syntax by example. Lots of examples here, and you can try out this SQL alternative in our free BigQuery sandbox.

    [blog] How to Do Code Reviews in the Agentic Era. I liked this take. If you’re in OSS, you already have a zero-trust approach to contributions. Who cares where the code comes from? This is what Daniela is looking for.

    [article] WTF does a product manager do? (and why engineers should care). Good post. What a PM does hasn’t changed a ton, but the way they do it has. Or at least should!

    [article] Preparing your team for the agentic software development life cycle. In my little bubble (regarding what customers constantly ask me about), this is the #1 topic.

    [blog] Right-Sizing Engineering Teams for AI. Some quick thoughts that are worth checking out. What’s the ideal makeup for an engineering team in 2026 and beyond?

    [article] How is AI already reshaping the software engineering labor market? Let’s stay on this topic, I guess? More advice for tech team leaders.

    [article] What Authentic Leadership Looks Like Under Pressure. This feels related to the previous three pieces. This is likely a stressful time for many of us. How are you leading in this moment?

    [blog] MCP vs. CLI for AI-native development. The “MCP or not” debate hit a fever pitch this week. Something’s in the water. It’s an “and” conversation to me; MCP makes sense in many situations, not in all.

    [article] The case for running AI agents on Markdown files instead of MCP servers. Now we’re talking about skills versus MCP. Again to me, the answer will be “both” for a lot of cases. I’ve been testing this out myself.

    [blog] Twenty years of Amazon S3 and building what’s next. Feels like this is what started the mainstream cloud story. Congrats to the Amazon team on 20 great years.

    [article] What OpenClaw Reveals About the Next Phase of AI Agents. We see time and time again that you shouldn’t dismiss an early, rough introduction of a new technology. It often signals that there’s fresh appetite for an unmet need.

    [article] NanoClaw and Docker partner to make sandboxes the safest way for enterprises to deploy AI agents. Safety features always follow a buzzy new idea. Just wait a bit and things like this pop up. More here.

    [blog] Simplify your Cloud Run security with Identity Aware Proxy (IAP). Fantastic feature for people who want authenticated web apps with as little work as possible.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below: