Category: AI/ML

  • The Gemini CLI might change how I work. Here are four prompts that prove it.

    Yesterday morning, we took the wraps off one of the most interesting Google releases of 2025. The Gemini CLI is here, giving you nearly unlimited access to Gemini directly within the terminal. This is a new space, and there are other great solutions already out there, so why is this different? Yes, it’s good at multi-step reasoning, code generation, and creative tasks. Build apps, fix code, parse images, build slides, analyze content, or whatever. But what’s truly unique is that it’s fully open source, free to use, usable anywhere, and super extensible. Use Gemini 2.5 Pro’s massive context window (1M tokens), multimodality, and strong reasoning ability to do some amazing stuff.

    Requirements? Have Node installed, and a Google account. That’s it. You get lots of free queries against our best models, and you can get more as a Google Cloud customer if you need it. Let’s have a quick look around, and then I’ll show you four prompts that demonstrate what it can really do.
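    Getting started is a one-liner if you use npm. The package name below matches the official repo at the time of writing; check the repo if it has changed:

    ```shell
    # Install the Gemini CLI globally (needs a recent Node.js), then launch it.
    # On first run, you'll be prompted to sign in with your Google account.
    npm install -g @google/gemini-cli
    gemini
    ```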

    The slash command shows me what’s available here. I can see and resume previous chats, configure the editor environment, leverage memory via context files like GEMINI.md, change the theme, and use tools. Choosing that option shows us the available tools such as reading files and folders, finding files and folders, performing Google searches, running Shell commands, and more.

    The Gemini CLI has many extensibility points, including use of MCP servers. I added the Cloud Run MCP server but you can add anything here.
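    As a sketch of what that registration can look like: the CLI reads MCP servers from a settings file, and an entry along these lines does the trick. The file path, server name, and npx invocation are assumptions based on the Cloud Run MCP server’s repo, so verify against the current docs:

    ```shell
    # Hypothetical example: register the Cloud Run MCP server with the Gemini CLI.
    # Note this overwrites ~/.gemini/settings.json, so back up any existing file first.
    mkdir -p ~/.gemini
    cat > ~/.gemini/settings.json <<'EOF'
    {
      "mcpServers": {
        "cloud-run": {
          "command": "npx",
          "args": ["-y", "https://github.com/GoogleCloudPlatform/cloud-run-mcp"]
        }
      }
    }
    EOF
    ```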

    I’m only scratching the surface here, so don’t forget to check out the official repo, docs, and blog post announcement. But now, let’s walk through four prompts that you can repeat to experience the power of the Gemini CLI, and why each is a big deal.

    Prompt #1 – Do some research.

    Software engineering is more than coding. You spend time researching, planning, and thinking. I want to build a new app, but I’m not sure which frontend framework I should use. And I don’t want stale answers from an LLM that was trained a year ago.

    I’ve got a new research report on JavaScript frameworks, and also want to factor in web results. My prompt:

    What JavaScript framework should I use to build my frontend app? I want something simple, standards-friendly, and popular. Use @report.pdf for some context, but also do a web search. Summarize the results in a way that will help me decide.

    The Gemini CLI figured out which tools to use, successfully incorporated the file into the prompt, and started on its work: searching the web and preparing results.

    The results were solid. I got tradeoffs and analysis of three viable options. The summary was helpful, and I could have continued going back and forth with clarifying questions. For architects, team leaders, and engineers, having a research partner in the terminal is powerful.

    Why was this a big deal? This prompt showed the use of live Google Search, local (binary) file processing, and in-context learning for devs. These tools are changing how I do quick research.

    Prompt #2 – Build an app.

    These tools will absolutely change how folks build, fix, change, and modernize software. Let’s build something new.

    I fed in this prompt, based on my new understanding of relevant JavaScript frameworks.

    Let’s build a calendar app for my family to plan a vacation together. It should let us vote on weeks that work best, and then nominate activities for each day. Use Vue.js for the JavaScript framework.

    Now to be sure, we didn’t build this to be excellent at one-shot results. Instead, it’s purposely built for an interactive back-and-forth with the software developer. You can start it with --yolo mode to have it automatically proceed without asking permission to do things, and even with -b to run it headless, assuming no interactivity. But I want to stay in control here, so I’m not in YOLO mode.
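    For reference, the interactivity modes look roughly like this. Flag names are taken from the CLI’s help output at the time of writing; run `gemini --help` to confirm on your version:

    ```shell
    gemini                           # interactive session; asks before running tools
    gemini --yolo                    # interactive, but auto-approves tool calls
    gemini -p "summarize this repo"  # one-shot prompt, suitable for scripts
    ```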

    I quickly got back a plan, and was asked if I wanted to proceed.

    Gemini CLI also asks me about running Shell commands. I can allow it once, allow it always, or cancel. I like these options. It’s fun watching Gemini make decisions and narrate what it’s working on. Once it’s done building directories, writing code, and evaluating its results, the CLI even starts up a server so that I can test the application. The first draft was functional, but not attractive, so I asked for a cleanup.

    The next result was solid, and I could have continued iterating on new features along with look and feel.

    Why was this a big deal? This prompt showed iterative code development, important security (request permission) features, and more. We’ll also frequently offer to pop you into the IDE for further coding. This will change how I understand or bootstrap most of the code I work with.

    Prompt #3 – Do a quick deploy to the cloud.

    I’m terrible at remembering the syntax and flags for various CLI tools. The right git command or Google Cloud CLI request? Just hopeless. The Gemini CLI is my solution. I can ask for what I want, and the Gemini CLI figures out the right type of request to make.

    We added MCP as a first-class citizen, so I added the Cloud Run MCP server, as mentioned above. I also made this work without it, as the Gemini CLI figured out the right way to directly call the Google Cloud CLI (gcloud) to deploy my app. But MCP servers provide more structure and ensure a consistent implementation. Here’s the prompt I used to get this app deployed. Vibe deployment, FTW.

    Ship this code to Cloud Run in us-west1 using my seroter-project-base project. Don’t create a Dockerfile or container, but just deploy the source files.
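    If you’d rather run the equivalent gcloud command yourself, a source-based deploy looks something like this. The service name is made up; the region and project match the prompt above:

    ```shell
    # Deploy straight from source; Cloud Run builds the container for you,
    # so no Dockerfile is needed.
    gcloud run deploy family-calendar \
      --source . \
      --region us-west1 \
      --project seroter-project-base
    ```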

    The Gemini CLI immediately recognizes that a known MCP tool can help, and shows me the tool it chose.

    It got going, and shipped my code successfully to Cloud Run using the MCP server. But the app didn’t start correctly. The Gemini CLI noticed that by reading the service logs, and diagnosed the issue. We didn’t provide a reference for which port to listen on. No problem.

    It came up with a fix, made the code changes, and redeployed.

    Why was this a big deal? We saw the extensibility of MCP servers, and the ability to “forget” some details of exactly how other tools and CLIs work. Plus we observed that the Gemini CLI did some smart reasoning and resolved issues on its own. This is going to change how I deploy, and how much time I spend (waste?) deploying.

    Prompt #4 – Do responsible CI/CD to the cloud.

    The third prompt was cool and showed how you can quickly deploy to a cloud target, even without knowing the exact syntax to make it happen. I got it working with Kubernetes too. But can the Gemini CLI help me do proper CI/CD, even if I don’t know exactly how to do it? In this case I do know how to set up Google Cloud Build and Cloud Deploy, but let’s pretend I don’t. Here’s the prompt.

    Create a Cloud Build file that would build a container out of this app code and store it in Artifact Registry. Then create the necessary Cloud Deploy files that define a dev and production environment in Cloud Run. Create the Cloud Deploy pipeline, and then reference it in the Cloud Build file so that the deploy happens when a build succeeds. And then go ahead and trigger the Cloud Build. Pay very careful attention to how to create the correct files and syntax needed for targeting Cloud Run from Cloud Deploy.

    The Gemini CLI started by asking me for some info from my Google Cloud account (project name, target region) and then created YAML files for Cloud Build and Cloud Deploy. It also put together a CLI command to instantiate a Docker repo in Artifact Registry. Now, I know that the setup for Cloud Deploy working with Cloud Run has some specific syntax and formatting. Even with my above command, I can see that I didn’t get syntactically correct YAML in the skaffold file.
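    For context, the plumbing the Gemini CLI assembled corresponds roughly to these gcloud commands. Repo name, region, and file names here are illustrative:

    ```shell
    # Create a Docker repo in Artifact Registry to hold built images.
    gcloud artifacts repositories create my-repo \
      --repository-format=docker --location=us-west1

    # Register the delivery pipeline and targets described in clouddeploy.yaml.
    gcloud deploy apply --file=clouddeploy.yaml --region=us-west1

    # Run the build described in cloudbuild.yaml (which then kicks off the release).
    gcloud builds submit --config=cloudbuild.yaml .
    ```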

    I rejected the Gemini CLI’s request to do a deployment, since I knew it would fail. Then I gave it the docs URL for setting up Cloud Run with Cloud Deploy and asked it to make a correction.

    That Skaffold file doesn’t look correct. Take a look at the docs (https://cloud.google.com/deploy/docs/deploy-app-run), and follow its guidance for setting up the service YAML files, and referencing the right Skaffold version at the top. Show me the result before pushing a change to the Cloud Deploy pipeline.

    Fortunately, the Gemini CLI can do a web fetch and process the latest product documentation. I did a couple of turns and got what I wanted. Then I asked it to go ahead and update the pipeline and trigger Cloud Build.

    It failed at first because I didn’t have a Dockerfile, but after realizing that, it automatically created one and started the build again.

    It took a few iterations of failed builds for the Gemini CLI to land on the right syntax. But it kept dutifully trying, making changes, and redeploying until it got it right. Just like I would have if I were doing it myself!

    After a few rounds of that back and forth, I had all the right files, syntax, container artifacts, and pipelines going.

    Some of my experiments went faster than others, but that’s the nature of these tools, and I still did this faster overall than I would have manually.

    Why was this a big deal? This showcased some sophisticated file creation, iterative improvements, and Gemini CLI’s direct usage of the Google Cloud CLI to package, deploy, and observe running systems in a production-like way. It’ll change how confident I am doing more complex operations.

    Background agents, orchestrated agents, conversational AI. All of these will play a part in how we design, build, deploy, and operate software. What does that mean to your team, your systems, and your expectations? We’re about to find out.

  • From code to cloud: Check out six new integrations that make it easier to host your apps and models on Cloud Run

    Where you decide to run your web app is often a late-binding choice. Once you’ve finished coding something you like and done some localhost testing, you seek out a reasonable place that gives you a public IP address. Developers have no shortage of runtime host options, including hyperscalers, rented VMs from cheap regional providers, or targeted services from the likes of Firebase, Cloudflare, Vercel, Netlify, Fly.io, and a dozen others. I’m an unapologetic fanboy of Google Cloud Run—host scale-to-zero apps, functions, and jobs that offer huge resource configurations, concurrent calls, GPUs, and durable volumes with a generous free tier and straightforward pricing—and we just took the wraps off a handful of new ways to take a pile of code and turn it into a cloud endpoint.

    Vibe-code a web app in Google AI Studio and one-click deploy to Cloud Run

    Google AI Studio is really remarkable. Build text prompts against our leading models, generate media with Gemini models, and even build apps. All at no cost. We just turned on the ability to do simple text-to-app scenarios, and added a button that deploys your app to Cloud Run.

    First, I went to the “Build” pane and added a text prompt for my new app. I wanted a motivational quote printed on top of an image of an AI generated dog.

    In one shot, I got the complete app including the correct backend AI calls to Gemini models for creating the motivational quote and generating a dog pic. So cool.

    Time to ship it. There’s a rocket ship icon on the top right. Assuming you’ve connected Google AI Studio to a Google Cloud account, you can pick a project and one-click deploy.

    It takes just a few seconds, and you get back the URL and a deep link to the app in Google Cloud.

    Clicking that link shows that this is a standard Cloud Run instance, with the Gemini key helpfully added as an environment variable (versus hard coded!).

    And of course, viewing the associated link takes me to my app that gives me simple motivation and happy dogs.

    That’s such a simple development loop!

    Create a .NET app in tools like Cursor and deploy it using the Cloud Run MCP server

    Let’s say you’re using one of the MANY agentic development tools that make it simpler to code with AI assistance. Lots of you like Cursor. It supports MCP as a way to reach into other systems via tools.

    We just shipped a Cloud Run MCP server, so you can make tools like Cursor aware of Cloud Run and support straightforward deployments.

    I started in Cursor and asked it to build a simple REST API and picked Gemini 2.5 Pro as my preferred model. Cursor does most (all?) of the coding work for you if you want it to.

    It went through a few iterations to land on the right code. I tested it locally to ensure the app would run.

    Cursor has native support for MCP. I added a .cursor directory to my project and dropped an mcp.json file in there. Cursor picked up the MCP entry, validated it, and showed me the available tools.
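    The file I dropped in was along these lines. The exact server invocation is an assumption based on the Cloud Run MCP server’s repo, so adjust it to match the current docs:

    ```shell
    # Sketch: register the Cloud Run MCP server for this Cursor project.
    mkdir -p .cursor
    cat > .cursor/mcp.json <<'EOF'
    {
      "mcpServers": {
        "cloud-run": {
          "command": "npx",
          "args": ["-y", "https://github.com/GoogleCloudPlatform/cloud-run-mcp"]
        }
      }
    }
    EOF
    ```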

    I asked Cursor to deploy my C# app. It explored the local folder and files to ensure it had what it needed.

    Cursor realized it had a tool that could help, and proposed the “deploy_local_folder” tool from the Cloud Run MCP server.

    After providing some requested values (location, etc), Cursor successfully deployed my .NET app.

    That was easy. And this Cloud Run MCP server will work with any of your tools that understand MCP.

    Push an open model from Google AI Studio directly to Cloud Run

    Want to deploy a model to Cloud Run? It’s the only serverless platform I know of that offers GPUs. You can use tools like Ollama to deploy any open model to Cloud Run, and I like that we made it even easier for Gemma fans. To see this integration, pick one of the Gemma 3 editions in Google AI Studio.

    Once you’ve done that, you’ll see a new icon that triggers a deployment directly to Cloud Run. Within minutes, you have an elastic endpoint providing inference.

    It’s not hard to deploy open models to Cloud Run. This option makes it that much easier.
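    For the Ollama route mentioned above, a GPU-backed deploy can be sketched like this. Treat the flags as assumptions to verify against current gcloud docs; GPU support was in beta at the time, and minimum CPU/memory requirements apply:

    ```shell
    # Sketch: run Ollama on Cloud Run with an attached NVIDIA L4 GPU.
    # Once it's up, you can pull a Gemma model into the running service.
    gcloud beta run deploy ollama-gemma \
      --image ollama/ollama \
      --port 11434 \
      --gpu 1 --gpu-type nvidia-l4 \
      --cpu 8 --memory 32Gi \
      --no-cpu-throttling \
      --region us-central1
    ```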

    Deploy a Python agent built with the Agent Development Kit to Cloud Run with one command

    The Agent Development Kit is an open source framework and toolset that devs use to build robust AI agents. The Python version reached 1.0 yesterday, and we launched a new Java version too. Here, I started with a Python agent I built.

    Built into ADK are a few deployment options. It’s just code, so you can run it anywhere. But we’ve added shortcuts to services like Google Cloud’s Vertex AI Agent Engine and Cloud Run. Just one command puts my agent onto Cloud Run!
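    That one command, roughly. The agent path and project values are placeholders; see the ADK docs for the current flags:

    ```shell
    # Containerize and deploy the local agent directory to Cloud Run.
    adk deploy cloud_run \
      --project my-project \
      --region us-central1 \
      ./my_agent
    ```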

    We don’t yet have this CLI deployment option for the Java ADK, but it’s also simple to use the Google Cloud CLI to deploy a Java app or agent to Cloud Run with one command.

    Services like Cloud Run are ideal for your agents and AI apps. These built-in integrations for ADK help you get these agents online quickly.

    Use a Gradio instance in Cloud Run to experiment with prompts after one click from Vertex AI Studio

    How do you collaborate or share prompts with teammates? Maybe you’re using something like Google Cloud Vertex AI to iterate on a prompt yourself. Here, I wrote system instructions and a prompt for helping me prioritize my work items.

    Now, I can click “deploy an app” and get a Gradio instance for experimenting further with my app.

    This has public access by default, so I’ve got to give the ok.

    After a few moments, I have a running Cloud Run app! I’m shown this directly from Vertex AI and have a link to open the app.

    That link brings me to this Gradio instance that I can share with teammates.

    Scalable and accessible, Cloud Run is ideal for spontaneous exploration of things like AI prompts. I like this integration!

    Ship your backend Java code to Cloud Run directly from Firebase Studio

    Our final example looks at Firebase Studio. Have you tried this yet? It’s a free-to-use, full-stack dev environment in the cloud for nearly any type of app. And it supports text-to-app scenarios if you don’t want to do much coding yourself. There are dozens of templates, including one for Java.

    I spun up a Java dev environment to build a web service.

    This IDE will look familiar. Bring in your favorite extensions, and we’ve also pre-loaded this with Gemini assistance, local testing tools, and more. See here that I used Gemini to add a new REST endpoint to my Java API.

    Here on the left is an option to deploy to Cloud Run!

    After authenticating to my cloud account and picking my cloud project, I could deploy. After a few moments, I had another running app in Cloud Run, and had a route to make continuous updates.

    Wow. That’s a lot of ways to go from code to cloud. Cloud Run is terrific for frontend or backend components, functions or apps, open source or commercial products. Try one of these integrations and tell me what you think!

  • What does a modern, AI-assisted developer workflow built around Google Gemini look like? Let’s explore.

    Software is never going to be the same. Why would we go back to laborious research efforts, wasting time writing boilerplate code, and accepting so many interruptions to our flow state? Hard pass. It might not happen for you tomorrow, next month, or next year, but AI will absolutely improve your developer workflow.

    Your AI-powered workflow may make use of more than one LLM. Go for it. But we’ve done a good job of putting Gemini into nearly every stage of the new way of working. Let’s look at what you can do RIGHT NOW to build with Gemini.

    Build knowledge, plans, and prototypes with Gemini

    Are you still starting your learning efforts with a Google search? Amateurs 🙂 I mean, keep doing those so that we earn ad dollars. But you’ve got so many new ways to augment a basic search.

    Gemini Deep Research is pretty amazing. Part of Gemini Advanced, it takes your query, searches the web on your behalf, and gives you a summary in minutes. Here I asked for help understanding the landscape of PostgreSQL providers, and it recapped results found in 240+ relevant websites from vendors, Reddit, analysts, and more.

    Gemini Deep Research creating a report about the PostgreSQL landscape

    You’ve probably heard of NotebookLM. Built with Gemini 2.0, it takes all sorts of digital content and helps you make sense of it. Including those hyper-realistic podcasts (“Audio Overviews”).

    Planning your work or starting to flesh out a prototype? For free, Google AI Studio lets you interact with the latest Gemini models. Generate text, audio, or images from prompts. Produce complex codebases based on reference images or text prompts. Share your desktop and get live assistance on whatever task you’re doing. It’s pretty rad.

    Google AI Studio’s Live API makes it possible to interact live with the model

    Google Cloud customers can get knowledge from Gemini in a few ways. The chat for Gemini Cloud Assist gives me an ever-present agent that can help answer questions or help me explore options. Here, I asked for a summary of the options for running PostgreSQL in Google Cloud. It breaks the response down by fully-managed, self-managed, and options for migration.

    Chat for Gemini Cloud Assist teaches me about PostgreSQL options

    Gemini for Google Cloud blends AI-assistance into many different services. One way to use this is to understand existing SQL scripts, workflows, APIs, and more.

    Gemini in BigQuery explains an existing query and helps me learn about it

    Trying to plan out your next bit of work? Google AI Studio or Vertex AI Studio can assist here too. In either service, you can pass in your backlog of features and bugs, maybe an architecture diagram or two, and even some reference PDFs, and ask for help planning out the next sprint. Pretty good!

    Vertex AI Studio “thinking” through a sprint plan based on multi-modal input

    Build apps and agents with Gemini

    We can use Google AI Studio or Vertex AI Studio to learn things and craft plans, but now let’s look at how you’d actually build apps with Gemini.

    You can work with the raw Gemini API. There are SDK libraries for Python, Node, Go, Dart, Swift, and Android. If you’re working with Gemini 2.0 and beyond, there’s a new unified SDK that works with both the Developer API and Enterprise API (Vertex). It’s fairly easy to use. I wrote a Google Cloud Function that uses the unified Gemini API to generate dinner recipes for whatever ingredients you pass in.

    package function
    
    import (
    	"context"
    	"encoding/json"
    	"fmt"
    	"log"
    	"net/http"
    	"os"
    
    	"github.com/GoogleCloudPlatform/functions-framework-go/functions"
    	"google.golang.org/genai"
    )
    
    func init() {
    	functions.HTTP("GenerateRecipe", generateRecipe)
    }
    
    func generateRecipe(w http.ResponseWriter, r *http.Request) {
    	ctx := context.Background()
    	ingredients := r.URL.Query().Get("ingredients")
    
    	if ingredients == "" {
    		http.Error(w, "Please provide ingredients in the query string, like this: ?ingredients=pork, cheese, tortilla", http.StatusBadRequest)
    		return
    	}
    
    	projectID := os.Getenv("PROJECT_ID")
    	if projectID == "" {
    		// Fail fast rather than guessing a project; the client can't work without one.
    		http.Error(w, "Server misconfigured: set the PROJECT_ID environment variable", http.StatusInternalServerError)
    		return
    	}
    
    	location := os.Getenv("LOCATION")
    	if location == "" {
    		location = "us-central1" // Provide a default, but encourage configuration
    	}
    
    	client, err := genai.NewClient(ctx, &genai.ClientConfig{
    		Project:  projectID,
    		Location: location,
    		Backend:  genai.BackendVertexAI,
    	})
    	if err != nil {
    		log.Printf("error creating client: %v", err)
    		http.Error(w, "Failed to create Gemini client", http.StatusInternalServerError)
    		return
    	}
    
    	prompt := fmt.Sprintf("Given these ingredients: %s, generate a recipe.", ingredients)
    	result, err := client.Models.GenerateContent(ctx, "gemini-2.0-flash-exp", genai.Text(prompt), nil)
    	if err != nil {
    		log.Printf("error generating content: %v", err)
    		http.Error(w, "Failed to generate recipe", http.StatusServiceUnavailable)
    		return
    	}
    
    	if len(result.Candidates) == 0 {
    		http.Error(w, "No recipes found", http.StatusNotFound) // Or another appropriate status
    		return
    	}
    
    	recipe := result.Candidates[0].Content.Parts[0].Text // Extract the generated recipe text
    
    	response, err := json.Marshal(map[string]string{"recipe": recipe})
    	if err != nil {
    		log.Printf("error marshalling response: %v", err)
    		http.Error(w, "Failed to format response", http.StatusInternalServerError)
    		return
    	}
    
    	w.Header().Set("Content-Type", "application/json")
    	w.Write(response)
    }
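    Once deployed, calling the function is a one-liner. The URL shape below is illustrative; your function’s actual endpoint will differ:

    ```shell
    # Request a recipe from the deployed function; the response is JSON.
    curl "https://REGION-PROJECT.cloudfunctions.net/GenerateRecipe?ingredients=pork,cheese,tortilla"
    ```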
    

    There are a lot of agent frameworks out there right now. A LOT. Many of them have good Gemini support. You can build agents with Gemini using LangChain, LangChain4J, LlamaIndex, Spring AI, Firebase Genkit, and the Vercel AI SDK.

    What coding tools can I use with Gemini? GitHub Copilot now supports Gemini models. Folks who love Cursor can choose Gemini as their underlying model. Same goes for fans of Sourcegraph Cody. Gemini Code Assist from Google Cloud puts AI-assisted tools into Visual Studio Code and the JetBrains IDEs. Get the power of Gemini’s long context, personalization on your own codebase, and now the use of tools to pull data from Atlassian, GitHub, and more. Use Gemini Code Assist within your local IDE, or in hosted environments like Cloud Workstations or Cloud Shell Editor.

    Gemini Code Assist brings AI assistance to your dev workspace, including the use of tools

    Project IDX is another Google-provided dev experience for building with Gemini. Use it for free, and build AI apps, with AI tools. It’s pretty great for frontend or backend apps.

    Project IDX lets you build AI apps with AI tools

    Maybe you’re building apps and agents with Gemini through low-code or declarative tools? There’s the Vertex AI Agent Builder. This Google Cloud service makes it fairly simple to create search agents, conversational agents, recommendation agents, and more. No coding needed!

    Conversational agents in the Vertex AI Agent Builder

    Another option for building with Gemini is the declarative Cloud Workflows service. I built a workflow that calls Gemini through Vertex AI and summarizes any provided document.

    # Summarize a doc with Gemini
    main:
      params: [args]
      steps:
      - init:
          assign:
            - doc_url: ${args.doc_url}
            - project_id: ${args.project_id}
            - location: ${args.location}
            - model: ${args.model_name}
            - desired_tone: ${args.desired_tone}
            - instructions: ""
      - set_instructions:
          switch:
            - condition: ${desired_tone == ""}
              assign:
                - instructions: "Deliver a professional summary with simple language."
              next: call_gemini
            - condition: ${desired_tone == "terse"}
              assign:
                - instructions: "Deliver a short professional summary with the fewest words necessary."
              next: call_gemini
            - condition: ${desired_tone == "excited"}
              assign:
                - instructions: "Deliver a complete, enthusiastic summary of the document."
              next: call_gemini
      - call_gemini:
          call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent
          args:
            model: ${"projects/" + project_id + "/locations/" + location + "/publishers/google/models/" + model}
            region: ${location}
            body:
              contents:
                role: user
                parts:
                  - text: "summarize this document"
                  - fileData: 
                      fileUri: ${doc_url}
                      mimeType: "application/pdf"
              systemInstruction: 
                role: user
                parts:
                  - text: ${instructions}
              generationConfig:  # optional
                temperature: 0.2
                maxOutputTokens: 2000
                topK: 10
                topP: 0.9
          result: gemini_response
      - returnStep:
          return: ${gemini_response.candidates[0].content.parts[0].text}
    

    Similarly, its sophisticated big brother, Application Integration, can also interact with Gemini through drag-and-drop integration workflows. These sorts of workflow tools help you bake Gemini predictions into all sorts of existing processes.

    Google Cloud Application Integration calls Gemini models

    After you build apps and agents, you need a place to host them! In Google Cloud, you could run in a virtual machine (GCE), a Kubernetes cluster (GKE), or a serverless runtime (Cloud Run). There’s also the powerful Firebase App Hosting for these AI apps.

    There are also two other services to consider. For RAG apps, we now offer the Vertex AI RAG Engine. I like this because you get a fully managed experience for ingesting docs, storing in a vector database, and performing retrieval. Doing LangChain? LangChain on Vertex AI offers a handy managed environment for running agents and calling tools.

    Build AI and data systems with Gemini

    In addition to building straight-up agents or apps, you might build backend data or AI systems with Gemini.

    If you’re doing streaming analytics or real-time ETL with Dataflow, you can build ML pipelines, generate embeddings, and even invoke Gemini endpoints for inference. Maybe you’re doing data analytics with frameworks like Apache Spark, Hadoop, or Apache Flink. Dataproc is a great service that you can use within Vertex AI, or to run all sorts of data workflows. I’m fairly sure you know what Colab is, as millions of folks per month use it for building notebooks. Colab and Colab Enterprise offer two great ways to build data solutions with Gemini.

    Let’s talk about building with Gemini inside your database. In Google Cloud SQL, Cloud Spanner, and AlloyDB, you can create “remote models” that let you interact with Gemini from within your SQL queries. Very cool and useful. BigQuery also makes it possible to work directly with Gemini from a SQL query. Let me show you.

    I made a dataset from the public “release notes” dataset from Google Cloud. Then I created a reference to the Gemini 2.0 Flash model and asked Gemini for a summary of a product’s release notes from the past month.

    -- create the remote model
    CREATE OR REPLACE MODEL
    `[project].public_dataset.gemini_2_flash`
    REMOTE WITH CONNECTION `projects/[project]/locations/us/connections/gemini-connection`
    OPTIONS (ENDPOINT = 'gemini-2.0-flash-exp');
    
    -- query an aggregation of responses to get a monthly product summary
    SELECT * 
    FROM
     ML.GENERATE_TEXT(
        MODEL `[project].public_dataset.gemini_2_flash`,
        (
          SELECT CONCAT('Summarize this month of product announcements by rolling up the key info', monthly_summary) AS prompt
          FROM (
            SELECT STRING_AGG(description, '; ') AS monthly_summary
            FROM `bigquery-public-data`.`google_cloud_release_notes`.`release_notes` 
            WHERE product_name = 'AlloyDB' AND DATE(published_at) BETWEEN '2024-12-01' AND '2024-12-31'
          )
        ),
        STRUCT(
          0.05 AS temperature,
          TRUE AS flatten_json_output)
        )
    

    How wild is that? Love it.

    You can also build with Gemini in Looker. Build reports, visualizations, and use natural language to explore data. See here for more.

    And of course, Vertex AI helps you build with Gemini. Build prompts, fine-tune models, manage experiments, make batch predictions, and lots more. If you’re working with AI models like Gemini, you should give Vertex AI a look.

    Build a better day-2 experience with Gemini

    It’s not just about building software with Gemini. The AI-driven product workflow extends to post-release activities.

    Have to set up least-privilege permissions for service accounts? Build the right permission profile with Gemini.

    The “Help me choose roles” feature uses Gemini to figure out the right permissions

    Something goes wrong. You need to get back to good. You can build faster resolution plans with Gemini. Google Cloud Logging supports log summarization with Gemini.

    Google Cloud Logging supports log summarization with Gemini

    Ideally, you know when something goes wrong before your customers notice. Synthetic monitors are one way to solve that. We made it easy to build synthetic monitors with Gemini using natural language.

    “Help me code” option for creating synthetic monitors in Cloud Monitoring

    You don’t want to face security issues on day-2, but it happens. Gemini is part of Security Command Center where you can build search queries and summarize cases.

    Gemini can also help you build billing reports. I like this experience where I can use natural language to get answers about my spend in Cloud Billing.

    Gemini in Cloud Billing makes it easier to understand your spend

    Build supporting digital assets with Gemini

    The developer workflow isn’t just about code artifacts. Sometimes you create supporting assets for design docs, production runbooks, team presentations, and more.

    Use the Gemini app (or our other AI surfaces) to generate images. I do this all the time now!

    Image for use in a presentation is generated by Gemini

    Building slides? Writing docs? Creating spreadsheets? Gemini for Workspace gives you some help here. I use this on occasion to refine text, generate slides or images, and update tables.

    Gemini in Google Docs helps me write documents

    Maybe you’re getting bored with static image representations and want some more videos in your life? Veo 2 is frankly remarkable and might be a new tool for your presentation toolbox. Consider a case where you’re building a mobile app that helps people share cars. Maybe produce a quick video to embed in the design pitch.

    Veo 2 generating videos for use in a developer’s design pitch

    AI disrupts the traditional product development workflow. Good! Gemini is part of each stage of the new workflow, and it’s only going to get better. Consider introducing one or many of these experiences to your own way of working in 2025.

  • 8 ways AI will change how I work in 2025

    You don’t have to use generative AI. It’s possible to avoid it and continue doing whatever you’ve been doing, the way you’ve been doing it. I don’t believe that sentence will be true in twelve months. Not because you’ll have to use it—although in some cases it may be unavoidable—but because you’ll want to use it. I thought about how my work will change next year.

    #1. I’ll start most efforts by asking “can AI help with this?”

    Do I need to understand a new market or product area? Analyze a pile of data? Schedule a complex series of meetings? Quickly generate a sample app for a customer demo? Review a blog post a teammate wrote? In most cases, AI can give me an assist. I want to change my mental model to first figure out if there’s a smarter (AI-assisted) way to do something.

    That said, it’s about “can AI help me” versus “can AI do all my work.” I don’t want to end up in this situation.

    #2. I’m going to do much better research.

    Whether planning a strategy or a vacation, there’s a lot of time spent researching. That’s ok, as you often uncover intriguing new tangents while exploring the internet.

    AI can still improve the process. A lot. I find myself using the Gemini app, Google AI Studio, and NotebookLM to understand complex ideas. Gemini Deep Research is almost unbelievable. Give it a prompt, it scours the web for dozens or hundreds of sources, and then compiles a report.

    What an amazing way to start or validate research efforts. Have an existing pile of content—might be annual reports, whitepapers, design docs, or academic material—that you need to make sense of? NotebookLM is pretty amazing, and should change how all of us ask questions of research material.

    #3. I will learn new things faster.

    Many of us have jobs where we need to quickly get up to speed on a topic. I want help in context, so that I stay in a flow state.

    Back to NotebookLM, I might use this to get easier-to-digest audio overviews of complex new ideas.

    And with coding assistance tools, I’m getting more and more comfortable staying in my IDE to get help on things I don’t yet know. Here, my Gemini Code Assist extension is helping me learn how to fix my poorly-secured Java code.

    Finally, I’m quite intrigued by how the new Gemini 2.0 Multimodal Live API will help me in the moment. By sharing my screen with the model, I can get realtime help into whatever I’m struggling with. Wow.

    #4. I’ll spend less time debating and more time coding.

    My day job is to lead a sizable team at Google Cloud and help everyone do their best work. I still like to code, though!

    It’s already happening, but next year I expect to code more than in years past. Why? Because AI is making it easier and more fun. Whether using an IDE assistant, or a completely different type of IDE like Cursor, it’s never been simpler to build legit software. We all can go from idea to reality so quickly now.

    Stop endlessly debating ideas, and just test them out quickly! Using low-code platforms or AI-assisted coding tools, you can get working prototypes in no time.

    #5. I will ask better questions.

    I’ve slowly learned that the best leaders simply ask better questions. AI can help us in a few ways here. First, there are “thinking” models that show you a chain of thought that might inspire your own questions.

    LLMs are awesome at giving answers, but they’re also pretty great at crafting questions. Look at this. I uploaded a set of (fake) product bugs and asked the Gemini model to help me come up with clarifying questions to ask the engineers. Good list!

    And how about this. Google Cloud BigQuery has an excellent feature called Data Insights which generates a bunch of candidate questions for a given dataset (here, the Google Cloud Release Notes). What a great way to get some smart, starter questions to consider!

    #6. I want to identify where the manual struggle is actually the point.

    I don’t want AI to do everything for me. There are cases where the human struggle is where the enjoyment comes from. Learning how to do something. Fumbling with techniques. Building up knowledge or strength. I don’t want a shortcut. I want deep learning.

    I’m going to keep doing my daily reading list by hand. No automation allowed, as it forces me to really get a deeper grasp on what’s going on in our industry. I’m not using AI to write newsletters, as I want to keep working on the writing craft myself.

    This mass integration of AI into services and experiences is great. It also forces us to stop and decide where we intentionally want to avoid it!

    #7. I should create certain types of content much faster.

    There’s no excuse to labor over document templates or images in presentations anymore. No more scouring the web for the perfect picture.

    I use Gemini in Google Slides all the time now. This is the way I add visuals to presentations and it saves me hours of time.

    Generate code, docs, and images, sure. We’ve seen that, but the image generation tech is getting tremendous.

    But videos too? I’m only starting to consider how to use remarkable technology like Veo 2. I’m using it now, and it’s blowing my mind. It’ll likely impact what I produce next year.

    #8. I’m going to free up some valuable time.

    That’s what most of this is all about. I don’t want to do less work; I want to do better work. Even with all this AI and automation, I expect I’ll be working the same number of hours next year. But I’ll be happier with how I’m spending those hours: learning, talking to humans, investing in others. Less time writing boilerplate code, breaking flow state to get answers, or even executing mindlessly repetitive tasks in the browser.

    I don’t work for AI; AI works for me. And in 2025, I’m expecting to make it work hard!

  • Customizing AI coding suggestions using the *best* code, not just *my* code

    Customizing AI coding suggestions using the *best* code, not just *my* code

    The ability to use your own codebase to customize the suggestions from an AI coding assistant is a big deal. This feature—available in products like Gemini Code Assist, GitHub Copilot, and Tabnine—gives developers coding standards, data objects, error messages, and method signatures that they recognize from previous projects. Data shows that the acceptance rate for AI coding assistants goes way up when devs get back trusted results that look familiar. But I don’t just want up-to-date and familiar code that *I* wrote. How can I make sure my AI coding assistant gives me the freshest and best code possible? I used code customization in Gemini Code Assist to reference Google Cloud’s official code sample repos and now I get AI suggestions that feature the latest Cloud service updates and best practices for my preferred programming languages. Let me show you how I did it.

    Last month, I showed how to use local codebase awareness in Gemini Code Assist (along with its 128,000 input token window) to “train” the model on the fly using code samples or docs that an LLM hasn’t been trained on yet. It’s a cool pattern, but also requires upfront understanding of what problem you want to solve, and work to stash examples into your code repo. Can I skip both steps?

    Yes, Gemini Code Assist Enterprise is now available and I can point to existing code repos in GitHub or GitLab. When I reference a code repo, Google Cloud automatically crawls it, chunks it up, and stores it (encrypted) in a vector database within a dedicated project in my Google Cloud environment. Then, the Gemini Code Assist plugin uses that data as part of a RAG pattern when I ask for coding suggestions. By pointing at Google Cloud’s code sample repos—any best practice repo would apply here—I supercharge my recommendations with data the base LLM doesn’t have (or prioritize).
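    To make that pattern concrete, here's a toy sketch of the retrieval step in JavaScript. It's deliberately simplified and is not how Code Assist works internally: I use naive keyword overlap where the real service uses embeddings and an encrypted vector database.

```javascript
// Toy RAG sketch: chunk a "repo", rank chunks by keyword overlap with the
// query, and prepend the best ones to the prompt. A production system
// (like the one described above) uses embeddings and a vector DB instead.
function chunk(text, size = 200) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

function score(query, chunkText) {
  // naive relevance: count words the chunk shares with the query
  const words = new Set(query.toLowerCase().split(/\W+/));
  return chunkText.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length;
}

function augmentPrompt(query, repoText, topK = 2) {
  const ranked = chunk(repoText)
    .map((c) => ({ c, s: score(query, c) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, topK)
    .map((x) => x.c);
  return `Context:\n${ranked.join("\n---\n")}\n\nQuestion: ${query}`;
}
```

    The augmented prompt (retrieved chunks plus the question) is what actually gets sent to the model.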

    Step #0 – Prerequisites and considerations

    Code customization is an “enterprise” feature of Gemini Code Assist, so it requires a subscription to that tier of service. There’s a promotional $19-per-month price until March of 2025, so tell your boss to get moving.

    Also, this is currently available in US, European, and Asian regions; you may need to request feature access via a form (depending on when you read this); and today it works with GitHub.com and GitLab.com repos, although on-premises indexing is forthcoming. Good? Good. Let’s keep going.

    Step #1 – Create the source repo

    One wrinkle here is that you need to own the repos you ask Gemini Code Assist to index. You can’t just point at any random repo to index. Deal breaker? Nope.

    I can just fork an existing repo into my own account! For example, here’s the Go samples repo from Google Cloud, and the Java one. Each one is stuffed with hundreds of coding examples for interacting with most of Google Cloud’s services. These repos are updated multiple times per week to ensure they include support for all the latest Cloud service features.

    I went ahead and forked each repo in GitHub. You can do it via the CLI or in the web console.

    I didn’t overthink it and kept the repository name the same.

    Gemini Code Assist can index up to 950 repos (and more if really needed), so you could liberally refer to best-practice repos that will help your developers write better code.

    Any time I want to refresh my fork to grab the latest code sample updates, I can do so.

    Step #2 – Add a reference to the source repo

    Now I needed to reference these repos for later code customization. Google Cloud Developer Connect is a service that maintains connections to source code sitting outside Google Cloud.

    I started by choosing GitHub.com as my source code environment.

    Then I named my Developer Connect connection.

    Then I installed a GitHub app into my GitHub account. This app is what enables the loading of source data into the customization service. From here, I chose the specific repos that I wanted available to Developer Connect.

    When finished, I had one of my own repos, and two best practice repos all added to Developer Connect.

    That’s it! Now to point these linked repos to Gemini Code Assist.

    Step #3 – Add a Gemini Code Assist customization index

    I had just two CLI commands to execute.

    First, I created a code customization index. You’ve got one index per Cloud project (although you can request more) and you create it with one command.

    Next, I created a repository group for the index. You use these to control access to repos, and could have different ones for different dev audiences. Here’s where you actually point to a given repo that has the Developer Connect app installed.

    I ran this command a few times to ensure that each of my three repos was added to the repository group (and index).
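    For reference, the two commands look roughly like the following. Treat the exact flags as a sketch and confirm against the current docs; the index name, group name, region, and repo path here are placeholders from my setup.

```shell
# Create the code customization index (one per project by default).
gcloud gemini code-repository-indexes create my-index \
  --project=my-project --location=us-central1

# Create a repository group on that index and point it at a repo linked
# through Developer Connect. Run once per repo you want indexed.
gcloud gemini code-repository-indexes repository-groups create my-repo-group \
  --code-repository-index=my-index \
  --project=my-project --location=us-central1 \
  --repositories='[{"resource": "projects/my-project/locations/us-central1/connections/my-conn/gitRepositoryLinks/golang-samples", "branchPattern": "main"}]'
```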

    Indexing can take up to 24 hours, so here’s where you wait. After a day, I saw that all my target repos were successfully indexed.

    Whenever I sync the fork with the latest updates to code samples, Gemini Code Assist will index the updated code automatically. And my IDE with Gemini Code Assist will have the freshest suggestions from our samples repo!

    Step #4 – Use updated coding suggestions

    Let’s prove that this worked.

    I looked for a recent commit to the Go samples repo that the base Gemini Code Assist LLM wouldn’t know about yet. Here’s one that has new topic-creation parameters for our Managed Kafka service. I gave the prompt below to Gemini Code Assist. First, I used a project and account that was NOT tied to the code customization index.

    //function to create a topic in Google Cloud Managed Kafka and include parameters for setting replicationfactor and partitioncount
    

    The coding suggestion was good, but incomplete as it was missing the extra configs the service can now accept.

    When I went to my Code Assist environment that did have code customization turned on, you see that the same prompt gave me a result that mirrored the latest Go sample code.

    I tried a handful of Java and Go prompts, and I regularly (admittedly, not always) got back exactly what I wanted. Good prompt engineering might have helped me reach 100%, but I still appreciated the big increase in quality results. It was amazing to have hundreds of up-to-date Google-tested code samples to enrich my AI-provided suggestions!

    AI coding assistants that offer code customization from your own repos are a difference maker. But don’t stop at your own code. Index other great code repos that represent the coding standards and fresh content your developers need!

  • I love this technique for getting up-to-date suggestions from my AI coding assistant

    I love this technique for getting up-to-date suggestions from my AI coding assistant

    Trust. Without trust, AI coding assistants won’t become a default tool in a developer’s toolbox. Trust is the #1 concern of devs today, and it’s something I’ve struggled with when it comes to getting the most relevant answers from an LLM. Specifically, am I getting back the latest information? Probably not, given that LLMs have a training cutoff date. Your AI coding assistant probably doesn’t (yet) know about Python 3.13, the most recent features of your favorite cloud service, or the newest architectural idea shared at a conference last week. What can you do about that?

    To me, this challenge comes up in at least three circumstances. There are entirely new concepts or tools that the LLM training wouldn’t know about. Think something like pipe syntax as an alternative to SQL syntax. I wouldn’t expect a model trained last year to know about that. How about updated features to existing libraries or frameworks? I want suggestions that reflect the full feature set of the current technology and I don’t want to accidentally do something the hard (old) way. An example? Consider the new “enum type” structured output I can get from LangChain4J. I’d want to use that now! And finally, I think about improved or replaced framework libraries. If I’m upgrading from Java 8 to Java 23, or Deno 1 to Deno 2, I want to ensure I’m not using deprecated features. My AI tools probably don’t know about any of these.

    I see four options for trusting the freshness of responses from your AI assistant. The final technique was brand new to me, and I think it’s excellent.

    1. Fine-tune your model
    2. Use retrieval augmented generation (RAG)
    3. Ground the results with trusted sources
    4. “Train” on the fly with input context

    Let’s briefly look at the first three, and see some detailed examples of the fourth.

    Fine-tune your model

    Whether using commercial or open models, they all represent a point-in-time based on their training period. You could choose to repeatedly train your preferred model with fresh info about the programming languages, frameworks, services, and patterns you care about.

    The upside? You can get a model with knowledge about whatever you need to trust it. The downside? It’s a lot of work—you’d need to craft a healthy number of examples and must regularly tune the model. That could be expensive, and the result wouldn’t naturally plug into most AI coding assistance tools. You’d have to jump out of your preferred coding tool to ask questions of a model elsewhere.

    Use RAG

    Instead of tuning and serving a custom model, you could choose to augment the input with pre-processed content. You’ll get back better, more contextual results when taking into account data that reflects the ideal state.

    The upside? You’ll find this pattern increasingly supported in commercial AI assistants. This keeps you in your flow without having to jump out to another interface. GitHub Copilot offers this, and now our Gemini Code Assist provides code customization based on repos in GitHub or GitLab. With Code Assist, we handle the creation and management of the code index of your repos, and you don’t have to manually chunk and store your code. The downside? This only works well if you’ve got the most up-to-date data in an indexed source repo. If you’ve got old code or patterns in there, that won’t help your freshness problem. And while these solutions are good for extra code context, they may not support a wider range of possible context sources (e.g. text files).

    Ground the results

    This approach gives you more confidence that the results are accurate. For example, Google Cloud’s Vertex AI offers “ground with Google Search” so that responses are matched to real, live Google Search results.

    If I ask a question about upgrading an old bit of Deno code, you can see that the results are now annotated with reference points. This gives me confidence to some extent, but doesn’t necessarily guarantee that I’m getting the freshest answers. Also, this is outside of my preferred tool, so it again takes me out of a flow state.

    Train on the fly

    Here’s the approach I just learned about from my boss’s boss, Keith Ballinger. I complained about freshness of results from AI assistance tools, and he said “why don’t you just train it on the fly?” Specifically, pass the latest and greatest reference data into a request within the AI assistance tool. Mind … blown.

    How might it handle entirely new concepts or tools? Let’s use that pipe syntax example. In my code, I want to use this fresh syntax instead of classic SQL. But there’s no way my Gemini Code Assist environment knows about that (yet). Sure enough, I just get back a regular SQL statement.
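    For a sense of what that prompt is after, here's the difference in a nutshell. This is my sketch of the two styles (check the pipe syntax docs for the exact grammar):

```sql
-- Classic SQL
SELECT item, SUM(sales) AS total_sales
FROM produce
WHERE item != 'bananas'
GROUP BY item;

-- Pipe syntax: each |> step transforms the previous result
FROM produce
|> WHERE item != 'bananas'
|> AGGREGATE SUM(sales) AS total_sales GROUP BY item;
```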

    But now, Gemini Code Assist supports local codebase awareness, up to 128,000 input tokens! I grabbed the docs for pipe query syntax, saved as a PDF, and then asked Google AI Studio to produce a Markdown file of the docs. Note that Gemini Code Assist isn’t (yet) multi-modal, so I need Markdown instead of passing in a PDF or image. I then put a copy of that Markdown file in a “training” folder within my app project. I used the new @ mention feature in our Gemini Code Assist chat to specifically reference the syntax file when asking my question again.

    Wow! So by giving Gemini Code Assist a reference file of pipe syntax, it was able to give me an accurate, contextual, and fresh answer.

    What about updated features to existing libraries or frameworks? I mentioned the new feature of LangChain4J for the Gemini model. There’s no way I’d expect my coding assistant to know about a feature added a few days ago. Once again, I grabbed some resources. This time, I snagged the Markdown doc for Google Vertex AI Gemini from the LangChain4J repo, and converted a blog post from Guillaume to Markdown using Google AI Studio.

    My prompt to the Gemini Code Assist model was “Update the service function with a call to Gemini 1.5 Flash using LangChain4J. It takes in a question about a sport, and the response is mapped to an enum with values for baseball, football, cricket, or other.” As expected, the first response was a good attempt, but it wasn’t fully accurate. And it used a manual way to map the response to an enum.

    What if I pass in both of those training files with my prompt? I get back exactly the syntax I wanted for my Cloud Run Function!

    So great. This approach requires me to know what tech I’m interested in up front, but still, what an improvement!

    Final example. How about improved or replaced framework libraries? Let’s say I’ve got a very old Deno app that I created when I first got excited about this excellent JavaScript runtime.

    // from https://deno.com/blog/v1.35#denoserve-is-now-stable
    async function handleHttp(conn: Deno.Conn) {
      // `await` is needed here to wait for the server to handle the request
      await (async () => {
        for await (const r of Deno.serveHttp(conn)) {
          r.respondWith(new Response("Hello World from Richard"));
        }
      })();
    }
    
    for await (const conn of Deno.listen({ port: 8000 })) {
      handleHttp(conn);
    }
    

    This code uses some libraries and practices that are now out of date. When I modernize this app, I want to trust that I’m doing it the best way. Nothing to fear! I grabbed the Deno 1.x to 2.x migration guide, a blog post about the new approach to web servers, and the launch blog for Deno 2. The result? Impressive, including a good description of why it generated the code this way.
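    For comparison, here's roughly what the modernized server looks like in Deno 2. This is my own sketch using the now-stable `Deno.serve` API, not the verbatim model output:

```typescript
// Deno 2 style: Deno.serve replaces the manual Deno.listen/serveHttp loop.
Deno.serve({ port: 8000 }, (_req: Request) =>
  new Response("Hello World from Richard"),
);
```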

    I could imagine putting the latest reference apps into a repo and using Gemini Code Assist’s code customization feature to pull that automatically into my app. But this demonstrated technique gives me more trust in the output of the tool when freshness is paramount. What do you think?

  • 3 ways to use AI to grade homework assignments

    3 ways to use AI to grade homework assignments

    School is back in session, and I just met with a handful of teachers at a recent back-to-school night. They’re all figuring out how to account for generative AI tools that students have access to. I say, let’s give teachers the same tools to use. Specifically, what if a teacher wants a quick preliminary grade on book reports submitted by their students? To solve this, I used Gemini 1.5 Flash in Google Cloud Vertex AI in three different ways—one-off in the prompt editor, through code, and via declarative workflow.

    Grade Homework in Vertex AI Studio

    Maybe you just have one or two papers to grade. Something like Vertex AI Studio is a good choice. Even if you’re not a Google Cloud customer, you can use it for free through this link.

    You don’t have any book reports handy to test this with? Me neither. In Vertex AI Studio, I prompted with something like “Write a 300 word book report for Pride and Prejudice from the perspective of an 8th grade student with a good vocabulary and strong writing skills.”

    With that handy, let’s “grade” the paper. Here I’m using system instructions to direct the model. I used Gemini to generate the initial version and then I tweaked it a bit. Here are the instructions:

    Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student.

    This tells the model what to look for, the expected quality (“honors English”), and the desired output.

    I added the system instructions to the marked section within Vertex AI Studio.

    Then I simply pasted the book report into the “prompt” section and submitted the prompt.

    I thought this was a pretty good book report, but Gemini 1.5 Flash had a high bar after obeying my system instructions.

    This book report shows good understanding of the book and a strong grasp of the themes. However, there are some issues:

    Structure:
    The introduction is a little long and could be more concise.
    The conclusion is also repetitive.
    The report could benefit from more clear paragraph breaks to organize the information more effectively.

    Grammar:
    No significant grammar issues.

    Content:
    The student demonstrates good understanding of the novel's major themes and characters.
    There's a strong focus on plot summary, which is okay, but the report could benefit from more critical analysis. The student needs to go beyond just describing the events to analyze how they impact the themes, characters, or messages of the book.
    Overall: This report shows good effort, but it needs to be refined.

    Grade: B-

    If I’m a teacher, this is a fascinating way to get a preliminary assessment of a writing assignment!

    Grade Homework Through Code

    The above solution works fine for one-off experiences, but how might you scale this AI-assisted grader? Another option is code.

    To try this scenario out, I used Cloud Firestore as my document database holding the book reports. I created a collection named “Papers” in the default database and added three documents. Each one holds a different book report.

    I then used the Firestore API and Vertex AI API to write some simple Go code that iterates through each Firestore document, calls Vertex AI using the provided system instructions, and then logs out the grade for each report. Note that I could have used a meta framework like LangChain, LlamaIndex, or Firebase Genkit, but I didn’t see the need.

    package main
    
    import (
    	"context"
    	"fmt"
    	"log"
    	"os"
    
    	"cloud.google.com/go/firestore"
    	"cloud.google.com/go/vertexai/genai"
    	"google.golang.org/api/iterator"
    )
    
    func main() {
    	// get configuration from environment variables
    	projectID := os.Getenv("PROJECT_ID") 
    	collectionName := os.Getenv("COLLECTION_NAME") // "Papers"
    	location := os.Getenv("LOCATION")              //"us-central1"
    	modelName := os.Getenv("MODEL_NAME")           // "gemini-1.5-flash-001"
    
    	ctx := context.Background()
    
    	//initialize Vertex AI client
    	vclient, err := genai.NewClient(ctx, projectID, location)
    	if err != nil {
    		log.Fatalf("error creating vertex client: %v\n", err)
    	}
    	gemini := vclient.GenerativeModel(modelName)
    
    	//add system instructions
    	gemini.SystemInstruction = &genai.Content{
    		Parts: []genai.Part{genai.Text(`Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student.
    		`)},
    	}
    
    	// Initialize Firestore client
    	client, err := firestore.NewClient(ctx, projectID)
    	if err != nil {
    		log.Fatalf("Failed to create client: %v", err)
    	}
    	defer client.Close()
    
    	// Get documents from the collection
    	iter := client.Collection(collectionName).Documents(ctx)
    	for {
    		doc, err := iter.Next()
    		if err != nil {
    			if err == iterator.Done {
    				break
    			}
    			log.Fatalf("error iterating through documents: %v\n", err)
    		}
    
    		//create the prompt
    		prompt := genai.Text(doc.Data()["Contents"].(string))
    
    		//call the model and get back the result
    		resp, err := gemini.GenerateContent(ctx, prompt)
    		if err != nil {
    			log.Fatalf("error generating content: %v\n", err)
    		}
    
    		//print out the top candidate part in the response
    		log.Println(resp.Candidates[0].Content.Parts[0])
    	}
    
    	fmt.Println("Successfully iterated through documents!")
    }
    
    

    The code isn’t great, but the results were. I’m also getting more verbose responses from the model, which is cool. This is a much more scalable way to quickly grade all the homework.

    Grade Homework in Cloud Workflows

    I like the code solution, but maybe I want to run this preliminary grading on a scheduled basis? Every Tuesday night? I could do that with my above code, but how about using a no-code workflow engine? Our Google Cloud Workflows product recently got a Vertex AI connector. Can I make it work with the same system instructions as the above two examples? Yes, yes I can.

    I might be the first person to stitch all this together, but it works great. I first retrieved the documents from Firestore, looped through them, and called Vertex AI with the provided system instructions. Here’s the workflow’s YAML definition:

    main:
      params: [args]
      steps:
      - init:
          assign:
            - collection: ${args.collection_name}
            - project_id: ${args.project_id}
            - location: ${args.location}
            - model: ${args.model_name}
      - list_documents:
            call: googleapis.firestore.v1.projects.databases.documents.list
            args:
                collectionId: ${collection}
                parent: ${"projects/" + project_id + "/databases/(default)/documents"}
            result: documents_list
      - process_documents:
            for:
              value: document 
              in: ${documents_list.documents}
              steps:
                - ask_llm:
                    call: googleapis.aiplatform.v1.projects.locations.endpoints.generateContent
                    args: 
                        model: ${"projects/" + project_id + "/locations/" + location + "/publishers/google/models/" + model}
                        region: ${location}
                        body:
                            contents:
                                role: "USER"
                                parts:
                                    text: ${document.fields.Contents.stringValue}
                            systemInstruction: 
                                role: "USER"
                                parts:
                                    text: "Task: Evaluate 8th-grade book reports for an honors English class. You are a tough grader. Input: Book report text. Output: Initial letter grade (A, B, C, D, or F) based on: Structure: Clear introduction, body, and conclusion Grammar: Spelling, punctuation, sentence structure. Content: Understanding of the book, critical thinking. Consider: Age and language proficiency of the student."
                            generation_config:
                                temperature: 0.5
                                max_output_tokens: 2048
                                top_p: 0.8
                                top_k: 40
                    result: llm_response
                - log_file_name:
                    call: sys.log
                    args:
                        text: ${llm_response}
    

    No code! I executed the workflow, passing in all the runtime arguments.

    In just a moment, I saw my workflow running, and “grades” being logged to the console. In real life, I’d probably update the Firestore document with this information. I’d also use Cloud Scheduler to run this on a regular basis.

    While I made this post about rescuing educators from the toil of grading papers, you can apply these patterns to all sorts of scenarios. Use prompt editors like Vertex AI Studio for experimentation and finding the right prompt phrasing. Then jump into code to interact with models in a repeatable, programmatic way. And consider low-code tools when model interactions are scheduled, or part of long running processes.

  • Store prompts in source control and use AI to generate the app code in the build pipeline? Sounds weird. Let’s try it!

    I can’t remember who mentioned this idea to me. It might have been a customer, colleague, internet rando, or voice in my head. But the idea was whether you could use source control for the prompts, and leverage an LLM to dynamically generate all the app code each time you run a build. That seems bonkers for all sorts of reasons, but I wanted to see if it was technically feasible.

    Should you do this for real apps? No, definitely not yet. The non-deterministic nature of LLMs means you’d likely experience hard-to-find bugs, unexpected changes on each build, and get yelled at by regulators when you couldn’t prove reproducibility in your codebase. When would you use something like this? I’m personally going to use this to generate stub apps to test an API or database, build demo apps for workshops or customer demos, or to create a component for a broader architecture I’m trying out.

    tl;dr I built an AI-based generator that takes a JSON file of prompts like this and creates all the code. I call this generator from a CI pipeline which means that I can check in (only) the prompts to GitHub, and end up with a running app in the cloud.

    {
      "folder": "generated-web",
      "prompts": [
        {
          "fileName": "employee.json",
          "prompt": "Generate a JSON structure for an object with fields for id, full name, start date, and office location. Populate it with sample data. Only return the JSON content and nothing else."
        },
        {
          "fileName": "index.js",
          "prompt": "Create a node.js program. It instantiates an employee object that looks like the employee.json structure. Start up a web server on port 8080 and expose a route at /employee that returns the employee object defined earlier."
        },
        {
          "fileName": "package.json",
          "prompt": "Create a valid package.json for this node.js application. Do not include any comments in the JSON."
        },
        {
          "fileName": "Dockerfile",
          "prompt": "Create a Dockerfile for this node.js application that uses a minimal base image and exposes the app on port 8080."
        }
      ]
    }
    

    In this post, I’ll walk through the steps of what a software delivery workflow such as this might look like, and how I set up each stage. To be sure, you’d probably make different design choices, write better code, and pick different technologies. That’s cool; this was mostly an excuse for me to build something fun.

    Before explaining this workflow, let me first show you the generator itself and how it works.

    Building an AI code generator

    There are many ways to build this. An AI framework makes it easier, and I chose Spring AI because I wanted to learn how to use it. Even though this is a Java app, it generates code in any programming language.

    I began at Josh Long’s second favorite place on the Internet, start.spring.io. Here I started my app using Java 21, Maven, and the Vertex AI Gemini starter, which pulls in Spring AI.

    My application properties point at my Google Cloud project and I chose to use the impressive new Gemini 1.5 Flash model for my LLM.

    spring.application.name=demo
    spring.ai.vertex.ai.gemini.projectId=seroter-project-base
    spring.ai.vertex.ai.gemini.location=us-central1
    spring.ai.vertex.ai.gemini.chat.options.model=gemini-1.5-flash-001
    

    My main class implements the CommandLineRunner interface and expects a single parameter, which is a pointer to a JSON file containing the prompts. I also have a couple of classes that define the structure of the prompt data. But the main generator class is where I want to spend some time.

    Basically, for each prompt provided to the app, I look for any local files to offer as multimodal context (so the LLM can factor in existing code when it processes the prompt), call the LLM, extract the resulting code from its Markdown wrapper, and write the file to disk.

    Here are those steps in code. First I look for local files:

    //load code from any existing files in the folder
    private Optional<List<Media>> getLocalCode() {
        String directoryPath = appFolder;
        File directory = new File(directoryPath);
    
        if (!directory.exists()) {
            System.out.println("Directory does not exist: " + directoryPath);
            return Optional.empty();
        }
    
        try {
            return Optional.of(Arrays.stream(directory.listFiles())
                .filter(File::isFile)
                .map(file -> {
                    try {
                        byte[] codeContent = Files.readAllLines(file.toPath())
                            .stream()
                            .collect(Collectors.joining("\n"))
                            .getBytes();
                        return new Media(MimeTypeUtils.TEXT_PLAIN, codeContent);
                    } catch (IOException e) {
                        System.out.println("Error reading file: " + file.getName());
                        return null;
                    }
                })
                .filter(Objects::nonNull)
                .collect(Collectors.toList()));
        } catch (Exception e) {
            System.out.println("Error getting local code");
            return Optional.empty();
        }
    }
    

    I call the LLM using Spring AI, choosing one of two methods depending on whether there's any local code or not. There won't be any code when the first prompt is executed!

    //call the LLM and pass in existing code
    private String callLlmWithLocalCode(String prompt, List<Media> localCode) {
        System.out.println("calling LLM with local code");
        var userMessage = new UserMessage(prompt, localCode);
        var response = chatClient.call(new Prompt(List.of(userMessage)));
        return extractCodeContent(response.toString());
    }
    
    //call the LLM when there's no local code
    private String callLlmWithoutLocalCode(String prompt) {
        System.out.println("calling LLM withOUT local code");
        var response = chatClient.call(prompt);
        return extractCodeContent(response.toString());
    }
    

    You see there that I’m extracting the code itself from the response string with this operation:

    //method that extracts code from the LLM response
    public static String extractCodeContent(String markdown) {
    
        System.out.println("Markdown: " + markdown);
    
        String regex = "```(\\w+)?\\n([\\s\\S]*?)```";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(markdown);
    
        if (matcher.find()) {
            String codeContent = matcher.group(2); // Extract group 2 (code content)
            return codeContent;
        } else {
            //System.out.println("No code fence found.");
            return markdown;
        }
    }
    

    And finally, I write the resulting code to disk:

    //write the final code to the target file path
    private void writeCodeToFile(String filePath, String codeContent) {
        try {
            File file = new File(filePath);
            if (!file.exists()) {
                file.createNewFile();
            }
    
            FileWriter writer = new FileWriter(file);
            writer.write(codeContent);
            writer.close();
    
            System.out.println("Content written to file: " + filePath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    

    There’s some more ancillary stuff that you can check out in the complete GitHub repo with this app in it. I was happy to be using Gemini Code Assist while building this. This AI assistant helped me understand some Java concepts, complete some functions, and fix some of my subpar coding choices.

    That’s it. Once I had this component, I built a JAR file and could now use it locally or in a continuous integration pipeline to produce my code. I uploaded the JAR file to Google Cloud Storage so that I could use it later in my CI pipelines. Now, onto the day-to-day workflow that would use this generator!

    Workflow step: Set up repo and pipeline

    Like with most software projects, I’d start with the supporting machinery. In this case, I needed a source repo to hold the prompt JSON files. Done.

    And I’d also consider setting up the path to production (or test environment, or whatever) to build the app as it takes shape. I’m using Google Cloud Build for a fully-managed CI service. It’s a good service with a free tier. Cloud Build uses declarative manifests for pipelines, and this pipeline starts off the same for any type of app.

    steps:
      # Print the contents of the current directory
      - name: 'bash'
        id: 'Show source files'
        script: |
          #!/usr/bin/env bash
          ls -l
    
      # Copy the JAR file from Cloud Storage
      - name: 'gcr.io/cloud-builders/gsutil'
        id: 'Copy AI generator from Cloud Storage'
        args: ['cp', 'gs://seroter-llm-demo-tools/demo-0.0.1-SNAPSHOT.jar', 'demo-0.0.1-SNAPSHOT.jar']
    
      # Print the contents of the current directory
      - name: 'bash'
        id: 'Show source files and builder tool'
        script: |
          #!/usr/bin/env bash
          ls -l
    

    Not much to it so far. I just print out the source contents seen in the pipeline, download the AI code generator from the above-mentioned Cloud Storage bucket, and prove that it’s on the scratch disk in Cloud Build.

    Ok, my dev environment was ready.

    Workflow step: Write prompts

    In this workflow, I don’t write code, I write prompts that generate code. I might use something like Google AI Studio or even Vertex AI to experiment with prompts and iterate until I like the response I get.

    Within AI Studio, I chose Gemini 1.5 Flash because I like nice things. Here, I’d work through the various prompts I would need to generate a working app. This means I still need to understand programming languages, frameworks, Dockerfiles, etc. But I’m asking the LLM to do all the coding.

    Once I’m happy with all my prompts, I add them to the JSON file. Note that each prompt entry has a corresponding file name that I want the generator to use when writing to disk.

    At this point, I was done “coding” the Node.js app. You could imagine having a dozen or so templates of common app types and just grabbing one and customizing it quickly for what you need!

    Workflow step: Test locally

    To test this, I put the generator in a local folder with a prompt JSON file and ran this command from the shell:

    rseroter$ java -jar  demo-0.0.1-SNAPSHOT.jar --prompt-file=app-prompts-web.json
    

    After just a few seconds, I had four files on disk.

    This is just a regular Node.js app. After npm install and npm start commands, I ran the app and successfully pinged the exposed API endpoint.
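
    For the curious, here's a hypothetical reconstruction of the generated index.js (the actual LLM output varies from run to run), using only Node's built-in http module:

    ```javascript
    // Hypothetical reconstruction of the AI-generated app: a plain Node.js
    // web server that returns a hard-coded employee object on /employee.
    const http = require('http');

    // Field names follow the employee.json prompt; values are sample data.
    const employee = {
      id: 1,
      fullName: 'Jane Doe',
      startDate: '2023-04-17',
      officeLocation: 'Seattle'
    };

    const server = http.createServer((req, res) => {
      if (req.url === '/employee') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(employee));
      } else {
        res.writeHead(404);
        res.end();
      }
    });

    // The generated app listened on 8080; guarding on PORT (which Cloud Run
    // sets in the container) lets this sketch be loaded without starting a server.
    if (process.env.PORT) {
      server.listen(process.env.PORT);
    }
    ```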

    Can we do something more sophisticated? I haven't tried a ton of scenarios, but I wanted to see if I could get a database interaction generated successfully.

    I went into the Google Cloud console and spun up a (free tier) instance of Cloud Firestore, our NoSQL database. I then created a “collection” called “Employees” and added a single document to start it off.

    Then I built a new prompts file with directions to retrieve records from Firestore. I messed around with variations that encouraged the use of certain libraries and versions. Here’s a version that worked for me.

    {
      "folder": "generated-web-firestore",
      "prompts": [
        {
          "fileName": "employee.json",
          "prompt": "Generate a JSON structure for an object with fields for id, full name, start date, and office location. Populate it with sample data. Only return the JSON content and nothing else."
        },
        {
          "fileName": "index.js",
          "prompt": "Create a node.js program. Start up a web server on port 8080 and expose a route at /employee. Initialize a firestore database using objects from the @google-cloud/firestore package, referencing Google Cloud project 'seroter-project-base' and leveraging Application Default credentials. Return all the documents from the Employees collection."
        },
        {
          "fileName": "package.json",
          "prompt": "Create a valid package.json for this node.js application using version 7.7.0 for @google-cloud/firestore dependency. Do not include any comments in the JSON."
        },
        {
          "fileName": "Dockerfile",
          "prompt": "Create a Dockerfile for this node.js application that uses a minimal base image and exposes the app on port 8080."
        }
      ]
    }
    
    

    After running the prompts through the generator app again, I got four new files, this time with code to interact with Firestore!

    Another npm install and npm start command set started the app and served up the document sitting in Firestore. Very nice.
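
    The interesting part of that generated code is the Firestore query itself. Here's a rough reconstruction of the logic (in the generated app, the client came from `new Firestore({...})` in the @google-cloud/firestore package using Application Default Credentials; I'm passing the client in here so the query shape can be exercised against a stub):

    ```javascript
    // Reconstruction of the generated retrieval logic. The Firestore client
    // is injected rather than constructed here, so no cloud credentials are
    // needed to exercise the query shape.
    async function getEmployees(db) {
      const snapshot = await db.collection('Employees').get();
      // Return each document's data along with its Firestore document id.
      return snapshot.docs.map((doc) => ({ id: doc.id, ...doc.data() }));
    }
    ```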

    Finally, how about a Python app? I want a background job that actually populates the Firestore database with some initial records. I experimented with some prompts, and these gave me a Python app that I could use with Cloud Run Jobs.

    {
      "folder": "generated-job-firestore",
      "prompts": [
        {
          "fileName": "main.py",
          "prompt": "Create a Python app with a main function that initializes a firestore database object with project seroter-project-base and Application Default credentials. Add two documents to the Employees collection. Generate random id, fullname, startdate, and location data for each document. Have the start script try to call that main function and if there's an exception, print the error."
        },
        {
          "fileName": "requirements.txt",
          "prompt": "Create a requirements.txt file for the packages used by this app"
        },
        {
          "fileName": "Procfile",
          "prompt": "Create a Procfile for python3 that starts up main.py"
        },
        {
          "fileName": "Dockerfile",
          "prompt": "Create a Dockerfile for this Python batch application that uses a minimal base image and doesn't expose any ports"
        }
      ]
    }
    

    Running this prompt set through the AI generator gave me the valid files I wanted. All my prompt files are here.

    At this stage, I was happy with the local tests and ready to automate the path from source control to cloud runtime.

    Workflow step: Generate app in pipeline

    Above, I had started the Cloud Build manifest with the step of yanking down the AI generator JAR file from Cloud Storage.

    The next step is different for each app we’re building. I could use substitution variables in Cloud Build and have a single manifest for all of them, but for demonstration purposes, I wanted one manifest per prompt set.

    I added this step to what I already had above. It executes the same command in Cloud Build that I had run locally to test the generator. First I do an apt-get on the “ubuntu” base image to get the Java command I need, and then invoke my JAR, passing in the name of the prompt file.

    ...
    
    # Run the JAR file
      - name: 'ubuntu'
        id: 'Run AI generator to create code from prompts'
        script: |
          #!/usr/bin/env bash
          apt-get update && apt-get install -y openjdk-21-jdk
          java -jar  demo-0.0.1-SNAPSHOT.jar --prompt-file=app-prompts-web.json
    
      # Print the contents of the generated directory
      - name: 'bash'
        id: 'Show generated files'
        script: |
          #!/usr/bin/env bash
          ls ./generated-web -l
    

    I updated the Cloud Build pipeline that's connected to my GitHub repo with this new YAML manifest.

    Running the pipeline at this point showed that the generator worked correctly and added the expected files to the scratch volume in the pipeline. Awesome.

    At this point, I had an app generated from prompts found in GitHub.

    Workflow step: Upload artifact

    Next up? Getting this code into a deployable artifact. There are plenty of options, but I want to use a container-based runtime, and need a container image. Cloud Build makes that easy.

    I added another section to my existing Cloud Build manifest to containerize with Docker and upload to Artifact Registry.

     # Containerize the code and upload to Artifact Registry
      - name: 'gcr.io/cloud-builders/docker'
        id: 'Containerize generated code'
        args: ['build', '-t', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web:latest', './generated-web']
      - name: 'gcr.io/cloud-builders/docker'
        id: 'Push container to Artifact Registry'
        args: ['push', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web']
    

    It used the Dockerfile our AI generator created, and after this step ran, I saw a new container image.

    Workflow step: Deploy and run app

    The final step, running the workload! I could use our continuous deployment service Cloud Deploy but I took a shortcut and deployed directly from Cloud Build. This step in the Cloud Build manifest does the job.

      # Deploy container image to Cloud Run
      - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
        id: 'Deploy container to Cloud Run'
        entrypoint: gcloud
        args: ['run', 'deploy', 'generated-web', '--image', 'us-west1-docker.pkg.dev/seroter-project-base/ai-generated-images/generated-web', '--region', 'us-west1', '--allow-unauthenticated']
    

    After saving this update to Cloud Build and running it again, I saw all the steps complete successfully.

    Most importantly, I had an active service in Cloud Run that served up a default record from the API endpoint.

    I went ahead and ran a Cloud Build pipeline for the “Firestore” version of the web app, and then the background job that deploys to Cloud Run Jobs. I ended up with two Cloud Run services (web apps), and one Cloud Run Job.

    I executed the job, and saw two new Firestore records in the collection!

    To prove that, I called the Firestore version of the web app again. Sure enough, the response included the two new records.

    Wrap up

    What we saw here was a fairly straightforward way to generate complete applications from nothing more than a series of prompts fed to the Gemini model. Nothing prevents you from using a different LLM, or using other source control, continuous integration, and hosting services. Just do some find-and-replace!

    Again, I would NOT use this for “real” workloads, but this sort of pattern could be a powerful way to quickly create supporting apps and components for testing or learning purposes.

    You can find the whole project here on GitHub.

    What do you think? Completely terrible idea? Possibly useful?

  • Here’s what I’d use to build a generative AI application in 2024

    Here’s what I’d use to build a generative AI application in 2024

    What exactly is a “generative AI app”? Do you think of chatbots, image creation tools, or music makers? What about document analysis services, text summarization capabilities, or widgets that “fix” your writing? These all seem to apply in one way or another! I see a lot written about tools and techniques for training, fine-tuning, and serving models, but what about us app builders? How do we actually build generative AI apps without obsessing over the models? Here’s what I’d consider using in 2024. And note that there’s much more to cover besides just building—think designing, testing, deploying, operating—but I’m just focusing on the builder tools today.

    Find a sandbox for experimenting with prompts

    A successful generative AI app depends on a useful model, good data, and quality prompts. Before going too deep on the app itself, it's good to have a sandbox to play in.

    You can definitely start with chat tools like Gemini and ChatGPT. That’s not a bad way to get your hands dirty. There’s also a set of developer-centric surfaces such as Google Colab or Google AI Studio. Once you sign in with a Google ID, you get free access to environments to experiment.

    Let’s look at Google AI Studio. Once you’re in this UI, you have the ability to simulate a back-and-forth chat, create freeform prompts that include uploaded media, or even structured prompts for more complex interactions.

    If you find yourself staring at an empty console wondering what to try, check out this prompt gallery that shows off a lot of unique scenarios.

    Once you’re doing more “serious” work, you might upgrade to a proper cloud service that offers a sandbox along with SLAs and prompt lifecycle capabilities. Google Cloud Vertex AI is one example. Here, I created a named prompt.

    With my language prompts, I can also jump into a nice “compare” experience where I can try out variations of my prompt and see if the results are graded as better or worse. I can even set one as “ground truth” used as a baseline for all comparisons.

    Whatever sandbox tools you use, make sure they help you iterate quickly, while also matching the enterprise-y needs of the use case or company you work for.

    Consume native APIs when working with specific models or platforms

    At this point, you might be ready to start building your generative AI app. There seems to be a new, interesting foundation model up on Hugging Face every couple of days. You might have a lot of affection for a specific model family, or not. If you care about the model, you might choose the APIs for that specific model or provider.

    For example, let’s say you were making good choices and anchored your app to the Gemini model. I’d go straight to the Vertex AI SDK for Python, Node, Java, or Go. I might even jump to the raw REST API and build my app with that.

    If I were baking a chat-like API call into my Node.js app, the quickest way to get the code I need is to go into Vertex AI, create a sample prompt, and click the “get code” button.

    I took that code, ran it in a Cloud Shell instance, and it worked perfectly. I could easily tweak it for my specific needs from here. Drop this code into a serverless function, Kubernetes pod, or VM and you’ve got a working generative AI app.
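
    The code from that button boils down to POSTing a JSON body to the model's generateContent endpoint. As a sketch, here's just the request-building portion (field names follow the Vertex AI REST API; project, region, and auth details are omitted):

    ```javascript
    // Build the JSON body for a Gemini generateContent REST call.
    // The endpoint URL, project, and credentials are left out on purpose;
    // this only shows the request shape.
    function buildGenerateContentRequest(promptText) {
      return {
        contents: [
          { role: 'user', parts: [{ text: promptText }] }
        ],
        generationConfig: {
          temperature: 0.5,
          maxOutputTokens: 2048
        }
      };
    }
    ```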

    You could follow this same direct API approach when building out more sophisticated retrieval augmented generation (RAG) apps. In a Google Cloud world, you might use the Vertex AI APIs to get text embeddings. Or you could choose something more general purpose and interact with a PostgreSQL database to generate, store, and query embeddings. This is an excellent example of this approach.
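
    Whichever store holds the embeddings, the retrieval step is nearest-neighbor search over vectors, typically scored with cosine similarity. A dependency-free sketch of that comparison (in practice, pgvector or Vertex AI computes this server-side):

    ```javascript
    // Cosine similarity between two embedding vectors: the core of the
    // "find the most relevant chunks" step in a RAG pipeline.
    function cosineSimilarity(a, b) {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank stored chunks by similarity to the query embedding.
    function topMatches(queryEmbedding, chunks, k = 3) {
      return chunks
        .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k);
    }
    ```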

    If you have a specific model preference, you might choose to use the API for Gemini, Llama, Mistral, or whatever. And you might choose to directly interact with database or function APIs to augment the input to those models. That’s cool, and is the right choice for many scenarios.

    Use meta-frameworks for consistent experiences across models and providers

    As expected, the AI builder space is now full of higher-order frameworks that help developers incorporate generative AI into their apps. These frameworks help you call LLMs, work with embeddings and vector databases, and even support actions like function calling.

    LangChain is a big one. You don't need to be bothered with many model details, and you can chain together tasks to get results. It's a Python framework, so your choice is either to use Python or to embrace one of the many offshoots. There's LangChain4J for Java devs, LangChain Go for Go devs, and LangChain.js for JavaScript devs.

    You have other choices if LangChain-style frameworks aren’t your jam. There’s Spring AI, which has a fairly straightforward set of objects and methods for interacting with models. I tried it out for interacting with the Gemini model, and almost found it easier to use than our native API! It takes one update to my POM file:

    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-vertex-ai-gemini-spring-boot-starter</artifactId>
    </dependency>
    

    One set of application properties:

    spring.application.name=demo
    spring.ai.vertex.ai.gemini.projectId=seroter-dev
    spring.ai.vertex.ai.gemini.location=us-central1
    spring.ai.vertex.ai.gemini.chat.options.model=gemini-pro-vision
    

    And then an autowired chat object that I call from anywhere, like in this REST endpoint.

    @RestController
    @SpringBootApplication
    public class DemoApplication {
    
    	public static void main(String[] args) {
    		SpringApplication.run(DemoApplication.class, args);
    	}
    
    	private final VertexAiGeminiChatClient chatClient;
    
    	@Autowired
        public DemoApplication(VertexAiGeminiChatClient chatClient) {
            this.chatClient = chatClient;
        }
    
    	@GetMapping("/")
    	public String getGeneratedText() {
    		String generatedResponse = chatClient.call("Tell me a joke");
    		return generatedResponse;
    	}
    }
    

    Super easy. There are other frameworks too. Use something like AI.JSX for building JavaScript apps and components. BotSharp is a framework for .NET devs building conversational apps with LLMs. Hugging Face has frameworks that help you abstract the LLM, including Transformers.js and agents.js.

    There’s no shortage of these types of frameworks. If you’re iterating through LLMs and want consistent code regardless of which model you use, these are good choices.

    Create with low-code tools when available

    If I had an idea for a generative AI app, I’d want to figure out how much I actually had to build myself. There are a LOT of tools for building entire apps, components, or widgets, and many require very little coding.

    Everyone’s in this game. Zapier has some cool integration flows. Gradio lets you expose models and APIs as web pages. Langflow got snapped up by DataStax, but still offers a way to create AI apps without much required coding. Flowise offers some nice tooling for orchestration or AI agents. Microsoft’s Power Platform is useful for low-code AI app builders. AWS is in the game now with Amazon Bedrock Agents. ServiceNow is baking generative AI into their builder tools, Salesforce is doing their thing, and basically every traditional low-code app vendor is playing along. See OutSystems, Mendix, and everyone else.

    As you would imagine, Google does a fair bit here as well. The Vertex AI Agent Builder offers four different app types that you basically build through point-and-click: personalized search engines, chat apps, recommendation engines, and connected agents.

    Search apps can tap into a variety of data sources including crawled websites, data warehouses, relational databases, and more.

    What’s fairly new is the “agent app” so let’s try building one of those. Specifically, let’s say I run a baseball clinic (sigh, someday) and help people tune their swing in our batting cages. I might want a chat experience for those looking for help with swing mechanics, and then also offer the ability to book time in the batting cage. I need data, but also interactivity.

    Before building the AI app, I need a Cloud Function that returns available times for the batting cage.

    This Node.js function returns an array of book-able timeslots. I’ve hard-coded the data, but you get the idea.
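
    The function itself is as simple as it sounds. Here's a sketch of roughly what mine did (the timeslot values are invented for illustration; the response shape matches the CageTimes schema in the OpenAPI spec further down):

    ```javascript
    // Sketch of the hard-coded Cloud Function. The Functions Framework hands
    // the handler Express-style (req, res) objects, so res.json is available.
    // Sample timeslot values are made up for illustration.
    const cageTimes = [
      { cageNumber: 1, openSlot: '10:00 AM', cageType: 'fastball' },
      { cageNumber: 2, openSlot: '11:30 AM', cageType: 'curveball' }
    ];

    exports.getCageTimes = (req, res) => {
      res.json(cageTimes);
    };
    ```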

    I also jumped into the Google Cloud IAM interface to ensure that the Dialogflow service account (which the AI agent operates as) has permission to invoke the serverless function.

    Let’s build the agent. Back in the Vertex AI Agent Builder interface, I choose “new app” and pick “agent.”

    Now I’m dropped into the agent builder interface. On the left, I have navigation for agents, tools, test cases, and more. In the next column, I set the goal of the agent, the instructions, and any tools I want to use with the agent. On the right, I preview my agent.

    I set a goal of “Answer questions about baseball and let people book time in the batting cage” and then get to the instructions. There’s a “sample” set of instructions that are useful for getting started. I used those, but removed references to other agents or tools, as we don’t have those yet.

    But now I want to add a tool, as I need a way to show available booking times if the user asks. I have a choice of adding a data store—this is useful if you want to source Q&A from a BigQuery table, crawl a website, or get data from an API. I clicked the “manage all tools” button and chose to add a new tool. Here I give the tool a name, and very importantly, a description. This description is used by the AI agent to figure out when to invoke it.

    Because I chose OpenAPI as the tool type, I need to provide an OpenAPI spec for my Cloud Function. There’s a sample provided, and I used that to put together my spec. Note that the URL is the function’s base URL, and the path contains the specific function name.

    {
        "openapi": "3.0.0",
        "info": {
            "title": "Cage API",
            "version": "1.0.0"
        },
        "servers": [
            {
                "url": "https://us-central1-seroter-anthos.cloudfunctions.net"
            }
        ],
        "paths": {
            "/function-get-cage-times": {
                "get": {
                    "summary": "List all open cage times",
                    "operationId": "getCageTimes",
                    "responses": {
                        "200": {
                            "description": "An array of cage times",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "array",
                                        "items": {
                                            "$ref": "#/components/schemas/CageTimes"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "components": {
            "schemas": {
                "CageTimes": {
                    "type": "object",
                    "required": [
                        "cageNumber",
                        "openSlot",
                        "cageType"
                    ],
                    "properties": {
                        "cageNumber": {
                            "type": "integer",
                            "format": "int64"
                        },
                        "openSlot": {
                            "type": "string"
                        },
                        "cageType": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
    

    Finally, in this “tool setup” I define the authentication to that API. I chose “service agent token” and because I’m calling a specific instance of a service (versus the platform APIs), I picked “ID token.”

    After saving the tool, I went back to the agent definition to update the instructions to invoke the tool. I used the syntax, and appreciated the auto-completion help.

    Let’s see if it works. I went to the right-hand preview pane and asked it a generic baseball question. Good. Then I asked it for open times in the batting cage. Look at that! It didn’t just return a blob of JSON; it parsed the result and worded it well.

    Very cool. There are some quirks with this tool, but it’s early, and I like where it’s going. This was MUCH simpler than me building a RAG-style or function-calling solution by hand.

    Summary

    The AI assistance and model building products get a lot of attention, but some of the most interesting work is happening in the tools for AI app builders. Whether you’re experimenting with prompts, coding up a solution, or assembling an app out of pre-built components, it’s a fun time to be a developer. What products, tools, or frameworks did I miss from my assessment?

  • How I’d use generative AI to modernize an app

    How I’d use generative AI to modernize an app

    I’m skeptical of anything that claims to make difficult things “easy.” Easy is relative. What’s simple for you might draw blood from me. And in my experience, when a product claims to make something “easy”, it’s talking about simplifying a subset of the broader, more complicated job-to-be-done.

    So I won’t sit here and tell you that generative AI makes app modernization easy. Nothing does. It’s hard work and is as much about technology as it is psychology and archeology. But AI can make it easier. We’ll take any help we can get, right? I count at least five ways I’d use generative AI to make smarter progress on my modernization journey.

    #1 Understand the codebase

    Have you been handed a pile of code and scripts before? Told to make sense of it and introduce some sort of feature enhancement? You might spend hours, days, or weeks figuring out the relationships between components and side effects of any changes.

    Generative AI is fairly helpful here, especially now that models like Gemini 1.5 (with its 1-million-token input window) exist.

    I might use something like Gemini (or ChatGPT, or whatever) to ask questions about the codebase and get ideas for how something might be used. This is where the “generative” part is handy. When I use the Duet AI assistance in BigQuery to explain SQL, I get back a creative answer about possible uses for the resulting data.

    In your IDE, you might use Duet AI (or Copilot, Replit, Tabnine) to give detailed explanations of individual code files, shell scripts, YAML, or Dockerfiles. Even if you don’t decide to use any generative AI tools to write code, consider using them to explain it.
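
    Before you can even ask good questions, it helps to know what you're dealing with. Here's a rough sketch of a small Go program that inventories a codebase by line count, a cheap way to decide which files are worth feeding to a large-context model first. The extension list and overall approach are my own assumptions, not any product's feature.

    ```go
    // inventory: walk a repo and print a line-count inventory of
    // source-like files, so you can prioritize what to hand to a model.
    package main

    import (
    	"fmt"
    	"io/fs"
    	"os"
    	"path/filepath"
    	"strings"
    )

    // countSourceLines totals the lines in source-like files under root,
    // printing a per-file count as it goes. The extension list is a guess;
    // adjust it for your stack.
    func countSourceLines(root string) (int, error) {
    	total := 0
    	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
    		if err != nil || d.IsDir() {
    			return err
    		}
    		switch strings.ToLower(filepath.Ext(path)) {
    		case ".go", ".java", ".js", ".sql", ".yaml", ".sh":
    			data, readErr := os.ReadFile(path)
    			if readErr != nil {
    				return readErr
    			}
    			lines := strings.Count(string(data), "\n")
    			fmt.Printf("%6d  %s\n", lines, path)
    			total += lines
    		}
    		return nil
    	})
    	return total, err
    }

    func main() {
    	root := "."
    	if len(os.Args) > 1 {
    		root = os.Args[1]
    	}
    	total, err := countSourceLines(root)
    	if err != nil {
    		fmt.Fprintln(os.Stderr, err)
    		os.Exit(1)
    	}
    	fmt.Printf("total: %d lines\n", total)
    }
    ```

    Run it against a repo root and you get a quick sense of where the bulk of the code lives before you start pasting files into a chat session.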

    #2 Incorporate new language/framework features

    Languages themselves modernize at a fairly rapid pace. Does your codebase rely on a pattern that was rad back in 2011? It happens. I’ve seen that generative AI is a handy way to modernize the code itself while teaching us how to apply the latest language features.

    For instance, Go generics are fairly new. If your Go app is more than two years old, it likely isn’t using them. I could go into my Go app and ask my generative AI chat tool for advice on how to introduce generics to my existing code.
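
    The typical rewrite you'd get back collapses near-duplicate helpers (think sumInts and sumFloats) into one type-parameterized function. A minimal sketch of my own, not actual tool output:

    ```go
    // Pre-generics Go code often carried one helper per numeric type.
    // A single type-parameterized Sum replaces all of them.
    package main

    import "fmt"

    // Number is a type-set constraint covering the numeric types the
    // old duplicated helpers handled.
    type Number interface {
    	~int | ~int64 | ~float64
    }

    // Sum works for any slice whose element type satisfies Number.
    func Sum[T Number](xs []T) T {
    	var total T
    	for _, x := range xs {
    		total += x
    	}
    	return total
    }

    func main() {
    	fmt.Println(Sum([]int{1, 2, 3}))       // 6
    	fmt.Println(Sum([]float64{1.5, 2.25})) // 3.75
    }
    ```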

    Usefully, the Duet AI tooling also explains what it did, and why it matters.

    I might use the same types of tools to convert an old ASP.NET MVC app to the newer Minimal APIs structure. Or replace deprecated features from Spring Boot 3.0 with more modern alternatives. Look at generative AI tools as a way to bring your codebase into the current era of language features.

    #3 Improve code quality

    Part of modernizing an app may involve adding real test coverage. You’ll never continuously deploy an app if you can’t get reliable builds. And you won’t get reliable builds without good tests and a CI system.

    AI-assisted developer tools make it easier to add integration tests to your code. I can go into my Spring Boot app and get testing scaffolding for my existing functions.

    Consider using generative AI tools to help with broader tasks like defining an app-wide test suite. You can use these AI interfaces to brainstorm ideas, get testing templates, or even generate test data.
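
    The scaffolding these tools emit usually follows the host language's testing idiom. The article's example above is Spring Boot; sketched in Go instead, you'd expect a table-driven structure like this. The Discount function and its business rule are invented here purely to have something to test.

    ```go
    // A sketch of table-driven test scaffolding. Discount stands in for
    // an existing function you'd ask an AI tool to generate coverage for.
    package main

    import (
    	"fmt"
    	"os"
    )

    // Discount applies a (made-up) loyalty discount to an order total.
    func Discount(total float64, loyaltyYears int) float64 {
    	switch {
    	case loyaltyYears >= 5:
    		return total * 0.90
    	case loyaltyYears >= 1:
    		return total * 0.95
    	default:
    		return total
    	}
    }

    func main() {
    	// The same table drops straight into a testing.T-based test file;
    	// it's shown in main here so the sketch runs standalone.
    	cases := []struct {
    		name  string
    		total float64
    		years int
    		want  float64
    	}{
    		{"new customer", 100, 0, 100},
    		{"one year", 100, 1, 95},
    		{"loyal", 100, 5, 90},
    	}
    	for _, c := range cases {
    		if got := Discount(c.total, c.years); got != c.want {
    			fmt.Printf("FAIL %s: got %v, want %v\n", c.name, got, c.want)
    			os.Exit(1)
    		}
    		fmt.Printf("ok   %s\n", c.name)
    	}
    }
    ```

    The value of the generated table isn't the boilerplate itself; it's that the tool proposes edge cases (the zero-year customer, the boundary at five years) you might not have listed yourself.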

    In addition to test-related activities, you can use generative AI to check for security issues. These tools don’t care about your feelings; here, the tool is calling out my terrible practices.

    Fortunately, I can also ask the tool to “fix” the code. You might find a few ways to use generative AI to help you refactor and improve the resilience and quality of the codebase.

    #4 Swap out old or unsupported components

    A big part of modernization is ensuring that a system is running fully supported components. Maybe that database, plugin, library, or entire framework is now retired, or people don’t want to work with it. AI tools can help with this conversion.

    For instance, maybe it’s time to swap out JavaScript frameworks. That app you built in 2014 with Backbone.js or jQuery is feeling creaky. You want to bring in React or Angular instead. I’ve had some luck coaxing generative AI tools into giving me working versions of just that. Even if you use AI chat tools to walk you through the steps (versus converting all the code), it’s a time-saver.

    The same may apply to upgrades from Java 8 to Java 21, or going from classic .NET Framework to modern .NET. Heck, you can even have some luck switching from COBOL to Go. I wouldn’t blindly trust these tools to convert code; audit aggressively and ensure you understand the new codebase. But these tools may jump-start your work and cut out some of the toil.

    #5 Upgrade the architecture

    Sometimes an app modernization requires some open-heart surgery. It’s not about light refactoring or swapping a frontend framework. No, there are times where you’re yanking out major pieces or making material changes.

    I’ve had some positive experiences asking generative AI tools to help me upgrade a SOAP service to REST. Or REST to gRPC. You might use these tools to switch from a stored procedure-heavy system to one that puts the logic into code components instead. Speaking of databases, you could change from MySQL to Cloud Spanner, or even change a non-relational database dependency back to a relational one. Will generative AI do all the work? Probably not, but much of what it produces is pretty good.
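
    To make the SOAP-to-REST case concrete, here's a sketch in Go of what the converted surface might look like: an old GetCustomer SOAP operation re-exposed as GET /customers/{id} returning JSON. The Customer fields and the in-memory lookup are invented for illustration, not taken from any real service.

    ```go
    // A sketch of the REST side of a SOAP-to-REST conversion.
    package main

    import (
    	"encoding/json"
    	"fmt"
    	"net/http"
    	"net/http/httptest"
    	"strings"
    )

    // Customer mirrors the payload the hypothetical SOAP GetCustomer
    // operation returned.
    type Customer struct {
    	ID   string `json:"id"`
    	Name string `json:"name"`
    }

    // lookup stands in for whatever backend the SOAP service wrapped.
    var lookup = map[string]Customer{
    	"42": {ID: "42", Name: "Ada"},
    }

    // customerHandler serves GET /customers/{id} as JSON.
    func customerHandler(w http.ResponseWriter, r *http.Request) {
    	id := strings.TrimPrefix(r.URL.Path, "/customers/")
    	c, ok := lookup[id]
    	if !ok {
    		http.NotFound(w, r)
    		return
    	}
    	w.Header().Set("Content-Type", "application/json")
    	json.NewEncoder(w).Encode(c)
    }

    func main() {
    	// Exercise the handler in-process; in production you'd register it
    	// with http.HandleFunc("/customers/", customerHandler) and listen.
    	req := httptest.NewRequest("GET", "/customers/42", nil)
    	rec := httptest.NewRecorder()
    	customerHandler(rec, req)
    	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
    }
    ```

    Even when a tool generates a skeleton like this for you, the hard part remains mapping every SOAP fault, header, and WS-* behavior onto HTTP semantics; treat the generated handler as a starting point.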

    This might be a time to make bigger changes like swapping from one cloud to another, or adding a major layer of infrastructure-as-code templates to your system. I’ve seen good results from generative AI tools here too. In some cases, a modernization project is your chance to introduce real, lasting changes to an architecture. Don’t waste the opportunity!

    Wrap Up

    Generative AI won’t eliminate the work of modernizing an app. There’s lots of work to do to understand, transform, document, and roll out code. AI tools can make a big difference, though, and you’re tying a hand behind your back if you ignore them! What other uses for app modernization come to mind?