Author: Richard Seroter

  • Daily Reading List – May 10, 2024 (#316)

    It’s the end of a solid workweek. Next week, I’ll be at Google I/O delivering the cloud keynote, and then jetting off to Vegas to talk to analysts at the Gartner event going on there. Those events have wildly different audiences, and I deeply enjoy both.

    [article] Gatekeepers Limit Continuous Delivery’s Benefits. The theory of constraints says you have to optimize the limiting factor. If you improve how fast folks can write code, but can’t ship it any faster, what’s the point?

    [blog] Building AI-Powered Apps with LangChain: A 2024 Guide. Good details about this framework that has quickly become a go-to for folks building generative AI apps.

    [blog] AlloyDB vs. self-managed PostgreSQL: a price-performance comparison. There are, of course, cases where self-managing hardware or software is a better move for you. But the number of cases is shrinking.

    [blog] Paramount+: A streaming powerhouse with limitless entertainment. Few folks need a zero-downtime, globally available service. But when you do, or if you want to apply the lessons learned from such an approach, see this post.

    [blog] 6 Best Practices for Hosting Developer-Focused Events. The in-person meetup scene seems hot right now, with lots of tech folks getting together to learn. This is a good post for those putting on events.

    [blog] Kubernetes 1.30 is now available in GKE in record time. Staying up to date in your Kubernetes cluster isn’t about features; it’s about security and stability updates. Nobody keeps you up to date like our GKE team.

    [blog] What’s new with Active Assist: New Hub UI and four new recommendations. Even if the AI hype train has taken focus off cost and security optimization, those things still matter a ton. I like the new security and audit-related “recommendations” we’re offering Cloud customers now.

    [blog] The DevRel Guide to Business Jargon. Get definitions of common sales and marketing terms like TAM, ARR, conversion rate, and more.

    [blog] How chaos testing adds extra reliability to Spanner’s fault-tolerant design. I was impressed by the extent to which we try to “break” Spanner to test its reliability. You might find inspiration for tests you want to run against your own systems.

    ##

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – May 9, 2024 (#315)

    Each day, I just read whatever pops into my feeds and newsletters. I’m not looking for a theme, but sometimes one pops out at me. Today? It seemed like a lot of content focused on optimization and doing things the right way. For example, check out items below about improving dev experience, efficient hosting of streaming platforms, doing CI well, controlling ops metrics volume, and scaling Kubernetes.

    [article] Boost Developer Productivity by Reducing Their ‘Paper Cuts’. If you’re a small or mid-size tech team, what can you do to build a better dev experience for your team? This article offers an outstanding blueprint.

    [article] 20 Lessons From 20 Different Paths to Product-Market Fit — Advice for Founders, From Founders. This is useful for founders, yes, but also for anyone building products that aim to solve a problem.

    [blog] Three Laws of Software Complexity (or: why software engineers are always grumpy). Do well-designed systems always degrade into badly designed ones? Mahesh makes that argument, and others.

    [paper] Capabilities of Gemini Models in Medicine. There’s 30+ pages of description and data in this new paper, and it may inspire you for use cases outside of medicine.

    [blog] Yahoo Benchmarks Dataflow vs Self-managed Flink Efficiency for two Streaming use-cases– Which is More Cost-Effective? Some Yahoo! engineers wanted to compare the cost and performance of running their own data processing stack (on Kubernetes) or using a managed service.

    [article] How is Flutter Platform-Agnostic? This framework renders interfaces across desktop, web, and mobile. How does it do that? Good deep dive here.

    [article] Generative AI interest now shapes talent strategy, employers say. Makes sense. This likely impacts who you hire, how you train, and what you build.

    [blog] Optimizing CI in Google Cloud Build. Darren wrote a fantastic post that’s helpful whether you’re using the Google Cloud services he mentions, or not.

    [article] Microsoft and OpenAI’s increasingly complicated relationship. Partnerships are critical, and Microsoft does them well overall. But I’m glad Google has its own foundation models to bet on. A related perspective here.

    [blog] Controlling metric ingestion with Google Cloud Managed Service for Prometheus. If you’re drowning in metrics, and paying a lot for the privilege, consider some of the steps called out here.

    [blog] The Streaming War Is Over and All It Cost Was the Entertainment Industry. Bundles versus individual best-of-breed is a pendulum. For streaming, we’re inevitably back at bundles.

    [blog] The surprising economics of Horizontal Pod Autoscaling tuning. Some good advice here on autoscaling, with a clear view of the tradeoffs of setting different resource targets.

  • Daily Reading List – May 8, 2024 (#314)

    Weirdly, many of the things I read today seemed to relate to other things I read today. I’ll chalk it up to a weird wrinkle in the universe.

    [blog] The Ultimate Guide to building Developer Tool Websites. This post justifies the title of “ultimate guide.” It’s a really good look at how to build websites that developers find useful.

    [blog] A tour of Gemini 1.5 Pro samples. Speaking of useful things offered on dev websites, Mete looks at code samples here. He looks at how to use Gemini to process audio, video, and multi-modalities at the same time.

    [blog] Humility: The Secret Ingredient to Modern Tech Agility. We don’t celebrate humility (which I guess would be ironic in some way), but it’s a key ingredient to agile leaders and teams.

    [blog] AlphaFold 3 predicts the structure and interactions of all of life’s molecules. Talk about humbling! We’re now learning more about the biological world than ever before.

    [blog] The biggest effect on code quality. This author says it’s not about tools or skills; code quality is impacted by teams working in “crunch mode” and making mistakes.

    [blog] What causes new engineers to “sink or swim”? If you want new developers on your team to be successful, check out this guidance for what determines that.

    [article] What Can Incident Teams Learn From Crisis Management? My friend Dormain wrote up an insightful piece on the need for strong communication during incidents.

    [blog] Maintain business continuity across regions with BigQuery managed disaster recovery. I just mentioned incident management, and here’s a piece on a new feature of BigQuery that offers automatic failover in the unlikely case of a regional outage.

    [blog] Simplifying and standardizing software at scale. Here’s how the McDonald’s engineering team is building golden paths to help teams build, ship, and run software better.

    [blog] What is Istio? The Kubernetes service mesh explained. I don’t know if Istio helps you simplify things, but it’s definitely a powerful mesh that’s getting easier to use.

    [blog] Observability, Telemetry, and Monitoring: Learn About the Differences. Stop using these words interchangeably! I’m mostly saying that to myself. But also, you might find this clarifying.

  • Daily Reading List – May 7, 2024 (#313)

    Today’s reading list had some good practical examples, and also some thought-provoking ideas. Did any resonate with you too?

    [blog] Moneyball with GenAI: Using Vertex AI Search to Find the Next Generation of Baseball Stars. If you want me to read and share your post, just make it about baseball. Well done, Alok. This is an excellent demonstration of how AI can change how we interact with unstructured data.

    [blog] 5 ways Service Extensions callouts can improve your Cloud Load Balancing environment. This product update didn’t catch my eye, but a Google Engineering Fellow told me I should give it a second look. Thanks, Anna. Yes, it is a very powerful way to add intelligence and security as part of cloud traffic management. Also mentioned here as part of a cross-cloud strategy.

    [article] Enterprises prep for big AI spending, but data woes prevent progress. It looks like most companies are recognizing the importance of a solid data strategy.

    [blog] alpine, distroless or scratch? A small, minimum-dependency base image for your containers is a smart bet. This post looks at 4 options, and the implications of each.

    [article] What software developers hate. Short post, but Matt looks at three things—scope creep, pace of learning, lack of time to code—that drive developers bonkers.

    [article] Platform Engineering for a Mainframe: Design Thinking Drives Change. You can apply modern practices (with varying levels of impact) to nearly any technology stack. I like efforts like this which try to help users of “legacy” systems get a better dev experience.

    [article] 4 Common Types of Team Conflict — and How to Resolve Them. Conflict, even temporary, is inevitable in a team. This article identifies each type, and how to respond.

    [blog] Build and Deploy a Langchain App With a Vector Database. Wietse wrote a very good walkthrough of the steps for building a generative AI app.

    [blog] Does Your Marketing Pass the Duck Test? This is a good read, especially for folks who crave plain-English descriptions that actually tell you something.

  • Daily Reading List – May 6, 2024 (#312)

    It was a beautiful weekend down here in San Diego. Summer feels close. For those readers in the Northern Hemisphere, hopefully you’re feeling the same!

    [blog] Advancing the art of AI-driven security with Google Cloud. The RSA conference is this week, so expect lots of security news across the industry. We announced some new services, like Google Threat Intelligence and Google Security Operations.

    [blog] A Useful Productivity Measure? What did this VP of Engineering do after being asked to define a productivity metric for his team? I like where he ended up.

    [blog] How LLMs Work, Explained Without Math. Unless you live and breathe LLMs, you’ll probably have one or two lightbulb moments when reading this. I sure did.

    [article] MongoDB takes data streaming service GA. Kafka integrations today, apparently more coming in the future. Check this out if you use MongoDB yourself.

    [article] Dealing With Chaos: A Guide for Leaders Feeling Overwhelmed at Work. This likely resonates with everyone at one point or another. It’s useful to label some of the circumstances that cause a sense of chaos.

    [blog] Codelab: Using Gemini Code Assist to explore and enhance Generative AI Document Summarization Jump Start Solution. If you’re like me, you enjoy learning from observing small bits of a bigger solution while ALSO exploring more complete solutions themselves. Here’s an end-to-end AI-based scenario to dig into.

    [article] Friday Forward – Chasing Butterflies. Good post from Bob. Sometimes feelings are just feelings, and not indicative of something that needs therapy or medication.

    [article] 10 principles for creating a great developer experience. This is a good list to work through if you’re serious about building and maintaining an impactful developer experience.

    [blog] When to use Gemini or purpose-built AI models in BigQuery. This argument applies to almost everything in tech: use the general purpose thing or the purpose-built thing? Shane looks at how BigQuery handles some specific data prep and analysis tasks with AI models.

    [blog] 60 to 100 days to onboard a developer – Highlights from the Harness State of Developer Experience survey. Lots of (good) thoughts here about making a better developer experience and healthier platform.

    [blog] Accelerating incident response using generative AI. Here are some details from an experiment run by the Google Security team.

  • Daily Reading List – May 3, 2024 (#311)

    I spent most of today allowing the “unread” number in my inbox to grow as I attended meetings that required my full attention. For an “inbox zero” person, that’s painful. But fortunately the last half hour of the day included some productive catch-up. Enjoy your weekend!

    [blog] BigQuery Vector Search using Python SDK, Gemini and Langchain on GCP. Good walkthrough of using a public dataset to experiment with creating and storing embeddings before running the queries that use them.

    [blog] Intuit Runs Gameday Simulations to Test Resilience of Critical Business Systems and Apps at Scale. Do you run these types of exercises to test out your incident response process, runbooks, and system resilience?

    [blog] Secure Randomness in Go 1.22. Geeky read, but a good one if you care about secure code.

    [blog] Private networking patterns to Vertex AI workloads. Your best option for performant, innovative, and complete AI stacks is in the cloud. But what are the right patterns for securely interfacing with those services? This post is about networking.

    [article] How Slack automates deploys. Good post that looks at some recent work Slack did to improve dev experience and how they do release management.

    [blog] Dear Satya Nadella: Why Are You Whitewashing the Microsoft China Cybersecurity Crisis? Security has been an issue at Microsoft for a bit. Maybe Satya’s serious about it now?

    [blog] Introducing Honeycomb for Frontend Observability: Get the Data You Need for Actionable Customer Experience Improvements. If you’re going to explore unknown unknowns, you need access to all sorts of places to explore.

    [blog] Simulate a zone failure in GKE regional clusters. You know what shouldn’t be unknown? How your app responds when it loses a cloud zone. This is a good guide for how to simulate that scenario.

  • Daily Reading List – May 2, 2024 (#310)

    It was a wild, frantic day, and now I’m sitting in the airport waiting to fly home. My days are never boring, which is a gift! A little more boring might be nice.

    [blog] Thin Events: The lean muscle of event-driven architecture. Gosh, I remember writing about this topic fifteen years ago. Even today, it’s important to make conscious choices about how you shape your messages—“thick” messages that contain all the data, or “thin” messages that are only notifications.

    [article] RecurrentGemma: An Open Language Model For Smaller Devices. Transformers are all the rage for deep learning models, but the Gemma team used a different approach with this model. And got good results.

    [article] Atlassian launches Rovo, its new AI teammate. While I’m not smart enough to understand why every vendor needs to create their own foundation model, I do get why everyone is offering agent-builder experiences.

    [blog] C# and Vertex AI Gemini streaming API bug and workaround. We know that our interactions with APIs are sometimes affected by choices within our programming language itself. This issue within C# is causing problems when calling the Gemini API, but a fix is on the way.

    [blog] The Backend for Frontend Pattern. If you’re doing single-page apps, or just looking for more ideas around authentication in distributed systems, check out this post.

    [blog] Evolving the Go Standard Library with math/rand/v2. Developers often use random number generators in their code. This is a big, interesting post on the new generator in Go, and how we introduced a breaking change.

    [blog] Google is a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud AI Developer Services. Any analyst assessment related to AI is outdated the moment it ships, but there are definitely fundamental criteria that are durable. I’m glad to see us show up well here.

    [blog] Upbound now everywhere: A fully automated Crossplane experience for platform engineers. Crossplane is a cool CNCF project for managing infrastructure across clouds, and the sponsoring company now has a managed service for you.

    [blog] Managing Cloud Storage soft delete at scale. Everyone makes mistakes, and it’s good to be able to recover “deleted” files if needed. I was impressed reading this post and seeing the breadth we considered (metrics, Terraform, APIs) to manage this new type of object.

    [article] WebAssembly Adoption: It’s Complicated, Says CNCF Survey. I can’t say I’ve ever had Wasm come up in a customer conversation during the past four years. Maybe it’s who I talk to. Or it’s just not that relevant to most developers.

    [blog] KubeCrash Platform Engineering Recap: From Silo Busting to Thinnest Viable Platforms. You can use tools like Gemini to summarize videos, but I still like when humans do it. This post has some takeaways about platform engineering from a recent event.

  • Daily Reading List – May 1, 2024 (#309)

    I had a very enlightening day of leadership meetings and am very bullish on the things Google Cloud is working on for customers. Energy level is high around here! I started my day with lots of reading, which you’ll find below.

    [article] Retail banking turns to core modernization as cloud strategies mature. It’s time to finish some of those modernizations, people! These small, incremental approaches take too long, and it’s holding up future progress.

    [blog] GCP Data Engineering Project: Streaming Data Pipeline with Pub/Sub and Apache Beam/Dataflow. Big post showing off a complete solution for aggregating messages in a streaming architecture.

    [article] MongoDB aims to jumpstart AI app development with MAAP. More tools and professional services on the way to help folks build generative AI apps.

    [blog] How Konfig provides an enterprise platform with GitLab and Google Cloud. I’d suspect that if you’re reading my post, you probably have a source control system in place. But if not, or if you need an upgrade, I like what Real Kinetic is doing to make it easier to set up an enterprise-grade deployment.

    [article] AI still has a ways to go in code refactoring. Readability and maintainability matter as much (more than?) coding speed, and Matt points out the need to supervise what AI is generating.

    [blog] Supercharged Developer Portals. Good for Spotify for commercializing their open source tech and making developer portals easier to set up and use.

    [article] AI, Your Task: Create Autonomous Agents. Vik (with help from AI) wrote this piece about “foundation agents” that learn and adapt to their environments.

    [article] Who Takes a Risk on New Technology? That new technology won’t take off if there aren’t people willing to make personal bets on it. This article starts with a story about directors in Hollywood, and connects it to technology adoption.

    [article] Java 17 is most-used LTS version of Java – report. It’s very good to see that Java 8 is finally falling out of favor with Java devs.

  • Daily Reading List – April 30, 2024 (#308)

    Travel day as I jetted up to Sunnyvale for a week of meetings. The year is 1/3 over after tomorrow, so hopefully you’re tracking towards some of your 2024 goals!

    [article] DevEx Success: How Pfizer Scaled to 1,000 Engineers. Building a great developer experience for an individual team is wonderful, and not trivial. Scaling that to hundreds or thousands? Entirely different problem. This article has good advice.

    [article] The IBM-HashiCorp coupling could be more complicated than it seems. This acquisition definitely got folks talking. Here’s another perspective from TechCrunch. And Forrest wrote a thing too.

    [blog] Level Up your RAG: Tuning Embeddings on Vertex AI. Can you bubble up the most relevant answers from an LLM through tuning? Ivan has a deep dive here.

    [blog] Long document summarization with Workflows and Gemini models. Map/reduce for big documents you’re trying to summarize with generative AI? I like this pattern.

    [article] Is DevOps just a conspiracy theory? A bit of a tongue in cheek post, but related to an important question about what DevOps is really about.

    [blog] From Assistant to Analyst: The Power of Gemini 1.5 Pro for Malware Analysis. Does the giant context window of the Gemini model help with reverse engineering malware? Looks like it.

    [article] Google’s Gemini beats GPT-4, Llama 2 in model grading test. The “best” is a fast-changing label, but I’m glad Gemini is currently doing well compared to peers.

  • Here’s what I’d use to build a generative AI application in 2024

    What exactly is a “generative AI app”? Do you think of chatbots, image creation tools, or music makers? What about document analysis services, text summarization capabilities, or widgets that “fix” your writing? These all seem to apply in one way or another! I see a lot written about tools and techniques for training, fine-tuning, and serving models, but what about us app builders? How do we actually build generative AI apps without obsessing over the models? Here’s what I’d consider using in 2024. And note that there’s much more to cover besides just building—think designing, testing, deploying, operating—but I’m just focusing on the builder tools today.

    Find a sandbox for experimenting with prompts

    A successful generative AI app depends on a useful model, good data, and quality prompts. Before going too deep on the app itself, it’s good to have a sandbox to play in.

    You can definitely start with chat tools like Gemini and ChatGPT. That’s not a bad way to get your hands dirty. There’s also a set of developer-centric surfaces such as Google Colab or Google AI Studio. Once you sign in with a Google ID, you get free access to environments to experiment.

    Let’s look at Google AI Studio. Once you’re in this UI, you have the ability to simulate a back-and-forth chat, create freeform prompts that include uploaded media, or even structured prompts for more complex interactions.

    If you find yourself staring at an empty console wondering what to try, check out this prompt gallery that shows off a lot of unique scenarios.

    Once you’re doing more “serious” work, you might upgrade to a proper cloud service that offers a sandbox along with SLAs and prompt lifecycle capabilities. Google Cloud Vertex AI is one example. Here, I created a named prompt.

    With my language prompts, I can also jump into a nice “compare” experience where I can try out variations of my prompt and see if the results are graded as better or worse. I can even set one as “ground truth” used as a baseline for all comparisons.

    Whatever sandbox tools you use, make sure they help you iterate quickly, while also matching the enterprise-y needs of the use case or company you work for.

    Consume native APIs when working with specific models or platforms

    At this point, you might be ready to start building your generative AI app. There seems to be a new, interesting foundation model up on Hugging Face every couple of days. You might have a lot of affection for a specific model family, or not. If you care about the model, you might choose the APIs for that specific model or provider.

    For example, let’s say you were making good choices and anchored your app to the Gemini model. I’d go straight to the Vertex AI SDK for Python, Node, Java, or Go. I might even jump to the raw REST API and build my app with that.

    If I were baking a chat-like API call into my Node.js app, the quickest way to get the code I need is to go into Vertex AI, create a sample prompt, and click the “get code” button.

    I took that code, ran it in a Cloud Shell instance, and it worked perfectly. I could easily tweak it for my specific needs from here. Drop this code into a serverless function, Kubernetes pod, or VM and you’ve got a working generative AI app.

    You could follow this same direct API approach when building out more sophisticated retrieval augmented generation (RAG) apps. In a Google Cloud world, you might use the Vertex AI APIs to get text embeddings. Or you could choose something more general purpose and interact with a PostgreSQL database to generate, store, and query embeddings. This is an excellent example of this approach.
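
To make the "query embeddings" step a bit more concrete, here is a minimal sketch in plain Java of ranking stored vectors against a query by cosine similarity. It assumes the embeddings already exist; the three-dimensional vectors and document names are made up for illustration, and a real app would get vectors from an embeddings API and keep them in a vector-capable database rather than an in-memory map.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EmbeddingSearch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal-length, non-zero vectors.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the keys of the k stored vectors most similar to the query, best match first.
    public static List<String> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(e.getValue(), query)))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical pre-computed embeddings keyed by document name.
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("swing-mechanics", new double[] {0.9, 0.1, 0.0});
        store.put("cage-pricing", new double[] {0.1, 0.9, 0.0});
        store.put("parking", new double[] {0.0, 0.1, 0.9});

        double[] query = {0.8, 0.2, 0.0};
        System.out.println(topK(store, query, 2)); // prints [swing-mechanics, cage-pricing]
    }
}
```

The top matches are what you would paste into the prompt as extra context before calling the model.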

    If you have a specific model preference, you might choose to use the API for Gemini, Llama, Mistral, or whatever. And you might choose to directly interact with database or function APIs to augment the input to those models. That’s cool, and is the right choice for many scenarios.

    Use meta-frameworks for consistent experiences across models and providers

    As expected, the AI builder space is now full of higher-order frameworks that help developers incorporate generative AI into their apps. These frameworks help you call LLMs, work with embeddings and vector databases, and even support actions like function calling.

    LangChain is a big one. You don’t need to be bothered with many model details, and you can chain together tasks to get results. It’s for Python devs, so your choice is either to use Python or to embrace one of the many offshoots. There’s LangChain4J for Java devs, LangChain Go for Go devs, and LangChain.js for JavaScript devs.

    You have other choices if LangChain-style frameworks aren’t your jam. There’s Spring AI, which has a fairly straightforward set of objects and methods for interacting with models. I tried it out for interacting with the Gemini model, and almost found it easier to use than our native API! It takes one update to my POM file:

    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-vertex-ai-gemini-spring-boot-starter</artifactId>
    </dependency>
    

    One set of application properties:

    spring.application.name=demo
    spring.ai.vertex.ai.gemini.projectId=seroter-dev
    spring.ai.vertex.ai.gemini.location=us-central1
    spring.ai.vertex.ai.gemini.chat.options.model=gemini-pro-vision
    

    And then an autowired chat object that I call from anywhere, like in this REST endpoint.

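    // Package names below match the Spring AI 0.8.x milestone releases; adjust if your version differs
    import org.springframework.ai.vertexai.gemini.VertexAiGeminiChatClient;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;
    
    @RestController
    @SpringBootApplication
    public class DemoApplication {
    
    	public static void main(String[] args) {
    		SpringApplication.run(DemoApplication.class, args);
    	}
    
    	// Chat client that the Spring AI starter configures from the application properties
    	private final VertexAiGeminiChatClient chatClient;
    
    	@Autowired
    	public DemoApplication(VertexAiGeminiChatClient chatClient) {
    		this.chatClient = chatClient;
    	}
    
    	// GET / sends a fixed prompt to Gemini and returns the generated text
    	@GetMapping("/")
    	public String getGeneratedText() {
    		String generatedResponse = chatClient.call("Tell me a joke");
    		return generatedResponse;
    	}
    }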
    

    Super easy. There are other frameworks too. Use something like AI.JSX for building JavaScript apps and components. BotSharp is a framework for .NET devs building conversational apps with LLMs. Hugging Face has frameworks that help you abstract the LLM, including Transformers.js and agents.js.

    There’s no shortage of these types of frameworks. If you’re iterating through LLMs and want consistent code regardless of which model you use, these are good choices.
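
The core idea behind "consistent code regardless of which model you use" can be sketched in a few lines of Java. This is only an illustration of the pattern, not the API of LangChain, Spring AI, or any other framework: the `TextModel` interface and both stub classes are hypothetical, and the stubs return canned strings instead of calling a real provider.

```java
import java.util.List;

public class ModelAbstractionSketch {

    // Hypothetical abstraction: the only thing application code knows about a model.
    interface TextModel {
        String generate(String prompt);
    }

    // Stub standing in for one provider's client; a real one would call that provider's API.
    static class StubModelA implements TextModel {
        public String generate(String prompt) {
            return "[model-a] " + prompt;
        }
    }

    // Second stub standing in for a different provider.
    static class StubModelB implements TextModel {
        public String generate(String prompt) {
            return "[model-b] " + prompt;
        }
    }

    // Application logic written once against the interface, independent of the model behind it.
    static String askForJoke(TextModel model) {
        return model.generate("Tell me a joke");
    }

    public static void main(String[] args) {
        List<TextModel> models = List.of(new StubModelA(), new StubModelB());
        for (TextModel model : models) {
            System.out.println(askForJoke(model));
        }
    }
}
```

Swapping models then means changing which implementation gets constructed, while `askForJoke` stays untouched.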

    Create with low-code tools when available

    If I had an idea for a generative AI app, I’d want to figure out how much I actually had to build myself. There are a LOT of tools for building entire apps, components, or widgets, and many require very little coding.

    Everyone’s in this game. Zapier has some cool integration flows. Gradio lets you expose models and APIs as web pages. Langflow got snapped up by DataStax, but still offers a way to create AI apps without much required coding. Flowise offers some nice tooling for orchestration or AI agents. Microsoft’s Power Platform is useful for low-code AI app builders. AWS is in the game now with Amazon Bedrock Agents. ServiceNow is baking generative AI into their builder tools, Salesforce is doing their thing, and basically every traditional low-code app vendor is playing along. See OutSystems, Mendix, and everyone else.

    As you would imagine, Google does a fair bit here as well. The Vertex AI Agent Builder offers four different app types that you basically build through point-and-click. These include personalized search engines, chat apps, recommendation engines, and connected agents.

    Search apps can tap into a variety of data sources including crawled websites, data warehouses, relational databases, and more.

    What’s fairly new is the “agent app” so let’s try building one of those. Specifically, let’s say I run a baseball clinic (sigh, someday) and help people tune their swing in our batting cages. I might want a chat experience for those looking for help with swing mechanics, and then also offer the ability to book time in the batting cage. I need data, but also interactivity.

    Before building the AI app, I need a Cloud Function that returns available times for the batting cage.

    This Node.js function returns an array of book-able timeslots. I’ve hard-coded the data, but you get the idea.
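
The original function was written in Node.js and isn't reproduced here. As a rough stand-in, this Java sketch shows the same idea: hard-coded slots serialized as a JSON array. The field names (`cageNumber`, `openSlot`, `cageType`) come from the OpenAPI spec later in this post; the slot values, class names, and hand-rolled JSON are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CageTimesSketch {

    // Mirrors the CageTimes schema: cageNumber (integer), openSlot and cageType (strings).
    static final class CageTime {
        final long cageNumber;
        final String openSlot;
        final String cageType;

        CageTime(long cageNumber, String openSlot, String cageType) {
            this.cageNumber = cageNumber;
            this.openSlot = openSlot;
            this.cageType = cageType;
        }

        // Hand-rolled JSON to stay dependency-free; assumes the values need no escaping.
        String toJson() {
            return "{\"cageNumber\":" + cageNumber
                    + ",\"openSlot\":\"" + openSlot
                    + "\",\"cageType\":\"" + cageType + "\"}";
        }
    }

    // Hard-coded, made-up availability, in the spirit of the function described above.
    static List<CageTime> openSlots() {
        return List.of(
                new CageTime(1, "2024-05-01T09:00", "baseball"),
                new CageTime(2, "2024-05-01T10:30", "softball"));
    }

    // Serializes the slots as a JSON array of CageTimes objects.
    static String toJsonArray(List<CageTime> slots) {
        return slots.stream()
                .map(CageTime::toJson)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(toJsonArray(openSlots()));
    }
}
```

In a deployed function, the string from `toJsonArray` would be the HTTP response body with an `application/json` content type.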

    I also jumped into the Google Cloud IAM interface to ensure that the Dialogflow service account (which the AI agent operates as) has permission to invoke the serverless function.

    Let’s build the agent. Back in the Vertex AI Agent Builder interface, I choose “new app” and pick “agent.”

    Now I’m dropped into the agent builder interface. On the left, I have navigation for agents, tools, test cases, and more. In the next column, I set the goal of the agent, the instructions, and any tools I want to use with the agent. On the right, I preview my agent.

    I set a goal of “Answer questions about baseball and let people book time in the batting cage” and then get to the instructions. There’s a “sample” set of instructions that are useful for getting started. I used those, but removed references to other agents or tools, as we don’t have those yet.

    But now I want to add a tool, as I need a way to show available booking times if the user asks. I have a choice of adding a data store—this is useful if you want to source Q&A from a BigQuery table, crawl a website, or get data from an API. I clicked the “manage all tools” button and chose to add a new tool. Here I give the tool a name, and very importantly, a description. This description is used by the AI agent to figure out when to invoke it.

    Because I chose OpenAPI as the tool type, I need to provide an OpenAPI spec for my Cloud Function. There’s a sample provided, and I used that to put together my spec. Note that the URL is the function’s base URL, and the path contains the specific function name.

    {
        "openapi": "3.0.0",
        "info": {
            "title": "Cage API",
            "version": "1.0.0"
        },
        "servers": [
            {
                "url": "https://us-central1-seroter-anthos.cloudfunctions.net"
            }
        ],
        "paths": {
            "/function-get-cage-times": {
                "get": {
                    "summary": "List all open cage times",
                    "operationId": "getCageTimes",
                    "responses": {
                        "200": {
                            "description": "An array of cage times",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "array",
                                        "items": {
                                            "$ref": "#/components/schemas/CageTimes"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "components": {
            "schemas": {
                "CageTimes": {
                    "type": "object",
                    "required": [
                        "cageNumber",
                        "openSlot",
                        "cageType"
                    ],
                    "properties": {
                        "cageNumber": {
                            "type": "integer",
                            "format": "int64"
                        },
                        "openSlot": {
                            "type": "string"
                        },
                        "cageType": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
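
As a quick check on how the server URL and path fit together, this small Java sketch joins the two values from the spec above into the address a caller would request. The class and method names are hypothetical; the URL and path are copied from the spec.

```java
import java.net.URI;

public class ToolUrlSketch {

    // Joins the spec's server URL with one of its path keys.
    static URI operationUrl(String serverUrl, String path) {
        return URI.create(serverUrl).resolve(path);
    }

    public static void main(String[] args) {
        URI url = operationUrl(
                "https://us-central1-seroter-anthos.cloudfunctions.net",
                "/function-get-cage-times");
        // prints https://us-central1-seroter-anthos.cloudfunctions.net/function-get-cage-times
        System.out.println(url);
    }
}
```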
    

    Finally, in this “tool setup” I define the authentication to that API. I chose “service agent token” and because I’m calling a specific instance of a service (versus the platform APIs), I picked “ID token.”

    After saving the tool, I go back to the agent definition and want to update the instructions to invoke the tool. I use the syntax, and appreciated the auto-completion help.

    Let’s see if it works. I went to the right-hand preview pane and asked it a generic baseball question. Good. Then I asked it for open times in the batting cage. Look at that! It didn’t just return a blob of JSON; it parsed the result and worded it well.

    Very cool. There are some quirks with this tool, but it’s early, and I like where it’s going. This was MUCH simpler than me building a RAG-style or function-calling solution by hand.

    Summary

    The AI assistance and model building products get a lot of attention, but some of the most interesting work is happening in the tools for AI app builders. Whether you’re experimenting with prompts, coding up a solution, or assembling an app out of pre-built components, it’s a fun time to be a developer. What products, tools, or frameworks did I miss from my assessment?