Travel day as I jetted up to Sunnyvale for a week of meetings. The year is 1/3 over after tomorrow, so hopefully you’re tracking towards some of your 2024 goals!
[article] DevEx Success: How Pfizer Scaled to 1,000 Engineers. Building a great developer experience for an individual team is wonderful, and not trivial. Scaling that to hundreds or thousands? Entirely different problem. This article has good advice.
What exactly is a “generative AI app”? Do you think of chatbots, image creation tools, or music makers? What about document analysis services, text summarization capabilities, or widgets that “fix” your writing? These all seem to apply in one way or another! I see a lot written about tools and techniques for training, fine-tuning, and serving models, but what about us app builders? How do we actually build generative AI apps without obsessing over the models? Here’s what I’d consider using in 2024. And note that there’s much more to cover besides just building—think designing, testing, deploying, operating—but I’m just focusing on the builder tools today.
Find a sandbox for experimenting with prompts
A successful generative AI app depends on a useful model, good data, and quality prompts. Before going too deep on the app itself, it’s good to have a sandbox to play in.
You can definitely start with chat tools like Gemini and ChatGPT. That’s not a bad way to get your hands dirty. There’s also a set of developer-centric surfaces such as Google Colab or Google AI Studio. Once you sign in with a Google ID, you get free access to environments to experiment.
Let’s look at Google AI Studio. Once you’re in this UI, you have the ability to simulate a back-and-forth chat, create freeform prompts that include uploaded media, or even structured prompts for more complex interactions.
If you find yourself staring at an empty console wondering what to try, check out this prompt gallery that shows off a lot of unique scenarios.
Once you’re doing more “serious” work, you might upgrade to a proper cloud service that offers a sandbox along with SLAs and prompt lifecycle capabilities. Google Cloud Vertex AI is one example. Here, I created a named prompt.
With my language prompts, I can also jump into a nice “compare” experience where I can try out variations of my prompt and see if the results are graded as better or worse. I can even set one as “ground truth” used as a baseline for all comparisons.
Whatever sandbox tools you use, make sure they help you iterate quickly, while also matching the enterprise-y needs of the use case or company you work for.
Consume native APIs when working with specific models or platforms
At this point, you might be ready to start building your generative AI app. There seems to be a new, interesting foundation model up on Hugging Face every couple of days. You might have a lot of affection for a specific model family, or not. If you care about the model, you might choose the APIs for that specific model or provider.
For example, let’s say you were making good choices and anchored your app to the Gemini model. I’d go straight to the Vertex AI SDK for Python, Node, Java, or Go. I might even jump to the raw REST API and build my app with that.
If I were baking a chat-like API call into my Node.js app, the quickest way to get the code I need is to go into Vertex AI, create a sample prompt, and click the “get code” button.
I took that code, ran it in a Cloud Shell instance, and it worked perfectly. I could easily tweak it for my specific needs from here. Drop this code into a serverless function, Kubernetes pod, or VM and you’ve got a working generative AI app.
If you have a specific model preference, you might choose to use the API for Gemini, Llama, Mistral, or whatever. And you might choose to directly interact with database or function APIs to augment the input to those models. That’s cool, and is the right choice for many scenarios.
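For a feel of what the raw REST route looks like, here is a rough sketch in Java that builds (but does not send) a `generateContent` request for a Gemini model on Vertex AI. This is not the code that the “get code” button produces. The class and method names are made up for illustration, and the project ID, region, model ID, and access token are placeholders you would swap for your own. The JSON body is assembled by hand only to keep the example dependency-free; a real app would likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: prepares a Vertex AI generateContent call without sending it.
final class GeminiRestSketch {

    // Builds the regional endpoint URL for a Google-published model.
    static String endpoint(String projectId, String location, String model) {
        return String.format(
            "https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/google/models/%s:generateContent",
            location, projectId, location, model);
    }

    // Escapes backslashes, double quotes, and newlines so the prompt fits in a JSON string.
    // Other control characters are not handled in this sketch.
    static String escapeJson(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Wraps a single user prompt in the "contents" structure the API expects.
    static String requestBody(String prompt) {
        return "{\"contents\":[{\"role\":\"user\",\"parts\":[{\"text\":\""
            + escapeJson(prompt) + "\"}]}]}";
    }

    // Assembles the POST request; accessToken is an OAuth 2.0 bearer token (placeholder here).
    static HttpRequest buildRequest(String projectId, String location, String model,
                                    String accessToken, String prompt) {
        return HttpRequest.newBuilder(URI.create(endpoint(projectId, location, model)))
            .header("Authorization", "Bearer " + accessToken)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(requestBody(prompt)))
            .build();
    }

    public static void main(String[] args) {
        // All argument values below are placeholders, not real resources.
        HttpRequest request = buildRequest(
            "my-project", "us-central1", "gemini-1.0-pro", "ACCESS_TOKEN", "Tell me a joke");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(requestBody("Tell me a joke"));
    }
}
```

To actually call the service, you would pass the request to a `java.net.http.HttpClient` and supply a real token, for example one printed by `gcloud auth print-access-token`.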
Use meta-frameworks for consistent experiences across models and providers
As expected, the AI builder space is now full of higher-order frameworks that help developers incorporate generative AI into their apps. These frameworks help you call LLMs, work with embeddings and vector databases, and even support actions like function calling.
LangChain is a big one. You don’t need to be bothered with many model details, and you can chain together tasks to get results. It’s for Python devs, so your choice is either to use Python or embrace one of the many offshoots. There’s LangChain4J for Java devs, LangChain Go for Go devs, and LangChain.js for JavaScript devs.
You have other choices if LangChain-style frameworks aren’t your jam. There’s Spring AI, which has a fairly straightforward set of objects and methods for interacting with models. I tried it out for interacting with the Gemini model, and almost found it easier to use than our native API! It takes one update to my POM file:
And then an autowired chat object that I call from anywhere, like in this REST endpoint.
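import org.springframework.ai.vertexai.gemini.VertexAiGeminiChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    // Chat client for Gemini on Vertex AI, injected by Spring
    private final VertexAiGeminiChatClient chatClient;

    @Autowired
    public DemoApplication(VertexAiGeminiChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // Sends a fixed prompt to the model and returns its reply as plain text
    @GetMapping("/")
    public String getGeneratedText() {
        String generatedResponse = chatClient.call("Tell me a joke");
        return generatedResponse;
    }
}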
Super easy. There are other frameworks too. Use something like AI.JSX for building JavaScript apps and components. BotSharp is a framework for .NET devs building conversational apps with LLMs. Hugging Face has frameworks that help you abstract the LLM, including Transformers.js and agents.js.
There’s no shortage of these types of frameworks. If you’re iterating through LLMs and want consistent code regardless of which model you use, these are good choices.
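To make the “consistent code regardless of model” idea concrete, here is a tiny, framework-free sketch. It is not the API of LangChain, Spring AI, or any other library mentioned above. The `ChatModel` interface and both stub classes are invented for illustration, and the stubs return canned strings instead of calling a real model.

```java
// Hypothetical sketch of the abstraction that meta-frameworks provide.
final class ModelAbstractionSketch {

    // The single seam the app depends on; each provider gets its own implementation.
    interface ChatModel {
        String call(String prompt);
    }

    // Stand-in for a Gemini-backed client (returns a canned reply, no network call).
    static final class StubGemini implements ChatModel {
        public String call(String prompt) {
            return "[gemini-stub] " + prompt;
        }
    }

    // Stand-in for a Llama-backed client (also canned).
    static final class StubLlama implements ChatModel {
        public String call(String prompt) {
            return "[llama-stub] " + prompt;
        }
    }

    // App logic is written once against the interface, not against a provider.
    static String summarize(ChatModel model, String text) {
        return model.call("Summarize in one sentence: " + text);
    }

    public static void main(String[] args) {
        ChatModel[] models = { new StubGemini(), new StubLlama() };
        for (ChatModel model : models) {
            // Swapping the model does not change the calling code.
            System.out.println(summarize(model, "Batting cages open at 9am."));
        }
    }
}
```

The real frameworks add much more on top of this (prompt templates, embeddings, tool calling), but the swap-the-implementation seam is the part that keeps app code stable while you iterate through models.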
Create with low-code tools when available
If I had an idea for a generative AI app, I’d want to figure out how much I actually had to build myself. There are a LOT of tools for building entire apps, components, or widgets, and many require very little coding.
Everyone’s in this game. Zapier has some cool integration flows. Gradio lets you expose models and APIs as web pages. Langflow got snapped up by DataStax, but still offers a way to create AI apps without much required coding. Flowise offers some nice tooling for orchestration or AI agents. Microsoft’s Power Platform is useful for low-code AI app builders. AWS is in the game now with Amazon Bedrock Agents. ServiceNow is baking generative AI into their builder tools, Salesforce is doing their thing, and basically every traditional low-code app vendor is playing along. See OutSystems, Mendix, and everyone else.
As you would imagine, Google does a fair bit here as well. The Vertex AI Agent Builder offers four different app types that you basically build through point-and-click. These include personalized search engines, chat apps, recommendation engines, and connected agents.
Search apps can tap into a variety of data sources including crawled websites, data warehouses, relational databases, and more.
What’s fairly new is the “agent app” so let’s try building one of those. Specifically, let’s say I run a baseball clinic (sigh, someday) and help people tune their swing in our batting cages. I might want a chat experience for those looking for help with swing mechanics, and then also offer the ability to book time in the batting cage. I need data, but also interactivity.
Before building the AI app, I need a Cloud Function that returns available times for the batting cage.
This Node.js function returns an array of book-able timeslots. I’ve hard-coded the data, but you get the idea.
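The function described here was written in Node.js. As a rough stand-in, this Java sketch shows the same idea: return a hard-coded list of open slots as a JSON array. The class name, method names, and timeslot values are all made up for illustration, and the HTTP wrapper that a real Cloud Function needs is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the batting cage availability function.
final class BattingCageSlotsSketch {

    // Hard-coded, made-up slots; a real function would query a booking system.
    static List<String> availableTimeslots() {
        return List.of("2024-05-01T09:00", "2024-05-01T10:30", "2024-05-01T14:00");
    }

    // Serializes the slots as a JSON array of strings.
    // Assumes the values contain no characters that need JSON escaping.
    static String toJson(List<String> slots) {
        return slots.stream()
            .map(slot -> "\"" + slot + "\"")
            .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // This is the payload the function would return as its HTTP response body.
        System.out.println(toJson(availableTimeslots()));
    }
}
```

A flat array of strings keeps the response easy for an agent to read back to a user, though a real service would more likely return objects with start time, duration, and cage number.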
I also jumped into the Google Cloud IAM interface to ensure that the Dialogflow service account (which the AI agent operates as) has permission to invoke the serverless function.
Let’s build the agent. Back in the Vertex AI Agent Builder interface, I choose “new app” and pick “agent.”
Now I’m dropped into the agent builder interface. On the left, I have navigation for agents, tools, test cases, and more. In the next column, I set the goal of the agent, the instructions, and any tools I want to use with the agent. On the right, I preview my agent.
I set a goal of “Answer questions about baseball and let people book time in the batting cage” and then get to the instructions. There’s a “sample” set of instructions that are useful for getting started. I used those, but removed references to other agents or tools, as we don’t have those yet.
But now I want to add a tool, as I need a way to show available booking times if the user asks. I have a choice of adding a data store—this is useful if you want to source Q&A from a BigQuery table, crawl a website, or get data from an API. I clicked the “manage all tools” button and chose to add a new tool. Here I give the tool a name, and very importantly, a description. This description is used by the AI agent to figure out when to invoke it.
Because I chose OpenAPI as the tool type, I need to provide an OpenAPI spec for my Cloud Function. There’s a sample provided, and I used that to put together my spec. Note that the URL is the function’s base URL, and the path contains the specific function name.
Finally, in this “tool setup” I define the authentication to that API. I chose “service agent token” and because I’m calling a specific instance of a service (versus the platform APIs), I picked “ID token.”
After saving the tool, I go back to the agent definition and want to update the instructions to invoke the tool. I use the syntax, and appreciated the auto-completion help.
Let’s see if it works. I went to the right-hand preview pane and asked it a generic baseball question. Good. Then I asked it for open times in the batting cage. Look at that! It didn’t just return a blob of JSON; it parsed the result and worded it well.
Very cool. There are some quirks with this tool, but it’s early, and I like where it’s going. This was MUCH simpler than me building a RAG-style or function-calling solution by hand.
Summary
The AI assistance and model building products get a lot of attention, but some of the most interesting work is happening in the tools for AI app builders. Whether you’re experimenting with prompts, coding up a solution, or assembling an app out of pre-built components, it’s a fun time to be a developer. What products, tools, or frameworks did I miss from my assessment?
I had a great weekend, and am back in it today with meetings, some fiddling with cloud services, and reading. Enjoy the items below.
[article] Tech Works: How to Get Promoted without Becoming a Manager. In most respects, “manager” is a new role, not a promotion. Plenty of folks want to stay as individual contributors, and this piece looks at a new level for IC engineers.
[article] What do developers want from AI? This review of a paper by Google shows that devs want to stay in control, but use AI to do their job more efficiently, with less toil.
[blog] Leveraging Gemini 1.5 for Efficient Information Extraction on Long PDFs. Which generative AI use cases do I think add the most value today? I really like text summarization or extraction tasks. Imagine being able to parse big documents quickly and find that specific nugget of knowledge!
[blog] 7 Leadership Communication Skills for Managing a Remote Team. Even if you’re back in the office full-time now, I’d suspect that you work with folks outside your 10-foot radius. These are some good pieces of advice for managers helping a distributed team do its best work.
[blog] Measuring Service Failure, or why to not use CFR and MTTR. The argument here is to use Service Level Indicators and Service Level Objectives to measure services versus more broad metrics like change failure rate and mean time to recovery.
[blog] The trap of over-engineering and over-design. It’s alluring to build systems that can handle all sorts of possible future scenarios. It also causes us to build unnecessarily complicated architectures that rarely get flexed in all the ways we envisioned.
Today was a quick day, and I took a detour in the middle to accompany my son on his quest to earn his driver’s license. For better or worse, he passed. Buckle up out there.
[blog] OK Cloud, On-Prem is Alright. I don’t really believe “private cloud” is a real thing—nine times out of ten, it’s a nicely automated VM or container environment—but Ian offers up a thoughtful exploration of the hybrid future of most big companies.
[blog] isBooleanTooLongAndComplex. I don’t think I’ve ever seen this talked about, or thought about it much. Probably because I’m a subpar developer. It’s a short post, but give it a read.
It was delightful to see Alphabet have a strong quarter, and Google Cloud firmly establishing itself as a successful and growing platform. For today’s reading list, many of the items are about tech-agnostic best practices.
[blog] The CD Foundation Survey, 2024. I referenced this report last week, but Coté does a terrific deep dive into this confusing report. Specifically, why are automated build and testing not widely adopted?
[blog] How Does Angular Compare to Vue? Deep dive comparison here, with code samples and explanations of differences between these web frameworks.
[blog] On IBM acquiring HashiCorp. Great analysis from Fintan who may not officially be an industry analyst anymore, but his insights are always spot on.
[blog] How can teams resolve video meeting fatigue? Most of you are probably like me and spend a big portion of your week on video calls. This post looks at research around mitigating fatigue.
##
Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:
Busy day, but I created time to read a number of interesting items that you’ll find below. Enjoy!
[blog] Explaining Trunk Based Development. Here’s a good walkthrough of the definition, benefits, and challenges of this “single source of truth” source control strategy.
[blog] The Art of System Debugging — Decoding CPU Utilization. Troubleshooting modern software and infrastructure isn’t as simple as checking a monitoring dashboard. This post brings to life the journey to find a specific performance problem.
[blog] Pub/Sub to BQ connector. Real-time data processing isn’t easy. I like that we’ve made it simpler (i.e. fewer moving parts) to push data from our messaging service directly into the data warehouse.
It’s taken a couple of weeks, but I’m generally feeling “caught up” on work. There’s a lot going on—I’m sure for you as well—but I’ve seemingly got a handle on it. Stay tuned for tomorrow’s update when a crisis inevitably hits and I share that I have no idea what I’m doing.
[blog] The guide to Git I never had. This is a very good look at Git features and commands and will help you build a baseline understanding, or serve as a useful refresher.
[article] 7 innovative ways to use low-code tools and platforms. Will low-code systems find a second life or new use cases in the AI era? Maybe. This post shows a few use cases (in addition to AI) where low-code can add value.
[site] Awesome Gemini for Google Cloud. Here’s a monster readme with all sorts of educational links to videos, tutorials, blog posts, and how-tos for AI assistance in Google Cloud.
[article] Your Engineering Organization Is too Expensive. Can you “fix” your cost problems by incorporating platform engineering? It depends. You might also just be transferring your costs around, or creating an entirely new cost center if you accidentally built a bloated platform that requires 50 people to operate!
[blog] Go performance from version 1.0 to 1.22. Even if you don’t care about new features in the latest version of your preferred programming language, you should crave the security and performance updates that appear. Here’s how Go has gotten better with each release.
[article] Why we suck at estimating software projects. Anyone who gives assurances about a months-away software delivery estimate is a confident liar. There are no “software factories” and software creation is still a creative endeavor with detours along the way.
[blog] It’s Time to Retire Terraform. Is it time for something different than Terraform’s infrastructure-as-code approach? These folks advocate for the Kubernetes Resource Model as a more modern means of deploying and managing infrastructure.
Most of today’s reading list seemed to center around “how to make good choices.” You’ll see a study set up to test AI assistance on developers, when AI is good (and not good) at generating tests for code, guidance for writing better tests overall, how to build a company, and more. Keep those “lessons learned” coming.
[article] The impact of AI tooling on engineering at ANZ Bank. Most folks I talk to about AI-assisted tooling want best practices and ROI/TCO data. Generic “productivity improvement” numbers are almost meaningless, and it’s important to run short studies in your own environment.
[article] 4 Big Developments in WebAssembly. I’m growing skeptical that Wasm will ever be widely used, but if it finds a useful niche, that’s still good.
Did you have a good week? I’m going to try and keep screen use to a minimum this weekend. I just started a new book (“Stolen Focus”) and I’m thinking more about how I can be present and tuned into what’s around me.
[blog] How to Create Better Products With Much Less of a Backlog. Big backlogs or bug lists are stress-inducing, and rarely worth maintaining. You’ll never get to it. This post is about keeping a tighter control on what you’re storing up to work on later.
[blog] The Serverless Illusion. There’s a lot to unpack here, but it’s a great exploration of what might make application development with managed services better.
[blog] 28 startups that launched at Next ’24. It’s super cool to see companies pair their announcements with our event. Many of these are building the next big thing.
Three hundred issues of this little daily post! I’m sure I’ll really find my voice by the time I reach issue eight hundred. A few of the items today really got me thinking and I hope you enjoy the list.
[article] AI Product Management. Building AI apps (not platforms) comes with risks to consider and plan for. This is a great article for teams about to embark on that journey.
[blog] Security, Maintainability, Velocity: Choose One. Tyler says that without careful consideration, you’ll find yourself choosing security, maintainability, or velocity for your software development.
[blog] Securing Prometheus with Istio Ambient. Quick post, but it highlights the “it just works” outcome of the new data plane in the Istio service mesh.