It’s been a good week, even though I’ve been under the weather. I’ve got a chance to rest up over this weekend and most of next week. I hope you also take a chance to recharge!
[article] In The Context Of Long Context. Here’s a wonderful post from Steven (who’s part of the NotebookLM effort) about the magic and importance of long context for LLMs.
[article] Auditors blast Pentagon over insecure, “antiquated” IT systems. Even when high-profile audits result in embarrassing findings, it’s hard to modernize the offending legacy systems. I can’t imagine how hard it is when the findings aren’t on the front page.
[blog] What’s new in Spring Modulith 1.3? I’ve been keeping an eye on this “modular monolith” framework from the Spring team. It’s making good progress.
[blog] Building a User Signals Platform at Airbnb. I always enjoy these types of posts where companies go deep into their architectures. This one looks at a stream processing system.
Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:
[article] Google’s Gemini chatbot now has memory. It remembers. I mean, not in a creepy way. But if you want our Gemini agent to use past info to make future suggestions, you got it.
Had another good day in Sunnyvale today and I’m enjoying these chances to plan what, and how, we work in 2025. The industry keeps moving, even if I’m in all-day meetings, and there are some good updates below.
[blog] How we built Google Meet’s adaptive audio feature. I assumed this was just a terrible fact of life. But no, it doesn’t HAVE to sound like death when people join a conference call from the same room.
[blog] Why Spring AI: The Seamless Path to Generative AI. Don’t sleep on Spring AI as a framework that stands to get traction. There’s a lot of Spring Boot out there, and this looks like a solid option for Java devs.
[blog] Playground Wisdom: Threads Beat Async/Await. Big, interesting post about blocking code, async processing, and why threads are better than the async/await abstraction.
[blog] Meet Angular v19. Big, big release of this popular frontend framework. Dig through this to see what matters, and what’s new/improved.
[blog] Batch prediction in Gemini. You want predictions for dozens, hundreds, or thousands of items in your store, portfolio, or account? Mete has a clear post that shows how to submit batches of requests to an LLM.
[article] p-Hacking your A/B tests. You might be making a mistake when running A/B tests for your app, marketing campaigns, or whatever. Jason says you shouldn’t stop a test once it looks like you got a “conclusive” result.
[blog] Scaling Ambient In Your Sleep. Quick post, but good to read if you skipped Istio the first time around because of too much complexity or too many scaling concerns.
I’m in Sunnyvale for the week. Today was a product leadership offsite, and I had fun talking to our senior product leaders. Enjoy the reading list below.
[blog] Platform Engineering as a Service. Does platform engineering bring order to the chaos that DevOps creates? Tyler thinks so, and offers up a solution.
[report] CNCF Technology Landscape Radar. If you’re a platform builder who is looking for open tech to bake into your stack, look at this new CNCF report that explores sentiment about some leading projects.
[blog] Meta Prompt Guard. Nervous about people tricking your AI agents into doing bad things? This post explores a lightweight classifier model from Meta that looks for malicious input.
[article] Let’s End Toxic Productivity. We have more conveniences than our grandparents did, but also seem to work more hours. Our standard of living is higher, as are our expectations, but still. This deep dive looks at how to step back and not overdo everything.
[blog] Gemini in Firebase for Data Connect queries. Here’s a good use case for LLMs. Firebase is using natural language prompts to generate GraphQL instead of you needing to know the full syntax.
The ability to use your own codebase to customize the suggestions from an AI coding assistant is a big deal. This feature—available in products like Gemini Code Assist, GitHub Copilot, and Tabnine—gives developers coding standards, data objects, error messages, and method signatures that they recognize from previous projects. Data shows that the acceptance rate for AI coding assistants goes way up when devs get back trusted results that look familiar. But I don’t just want up-to-date and familiar code that *I* wrote. How can I make sure my AI coding assistant gives me the freshest and best code possible? I used code customization in Gemini Code Assist to reference Google Cloud’s official code sample repos, and now I get AI suggestions that feature the latest Cloud service updates and best practices for my preferred programming languages. Let me show you how I did it.
Last month, I showed how to use local codebase awareness in Gemini Code Assist (along with its 128,000 input token window) to “train” the model on the fly using code samples or docs that an LLM hasn’t been trained on yet. It’s a cool pattern, but also requires upfront understanding of what problem you want to solve, and work to stash examples into your code repo. Can I skip both steps?
Yes, Gemini Code Assist Enterprise is now available and I can point to existing code repos in GitHub or GitLab. When I reference a code repo, Google Cloud automatically crawls it, chunks it up, and stores it (encrypted) in a vector database within a dedicated project in my Google Cloud environment. Then, the Gemini Code Assist plugin uses that data as part of a RAG pattern when I ask for coding suggestions. By pointing at Google Cloud’s code sample repos—any best practice repo would apply here—I supercharge my recommendations with data the base LLM doesn’t have (or prioritize).
Step #0 – Prerequisites and considerations
Code customization is an “enterprise” feature of Gemini Code Assist, so it requires a subscription to that tier of service. There’s a promotional $19-per-month price until March of 2025, so tell your boss to get moving.
Also, this is currently available in US, European, and Asian regions; you may need to request feature access via a form (depending on when you read this); and today it works with GitHub.com and GitLab.com repos, although on-premises indexing is forthcoming. Good? Good. Let’s keep going.
Step #1 – Create the source repo
One wrinkle here is that you need to own the repos you ask Gemini Code Assist to index. You can’t just point it at any random repo. Deal breaker? Nope.
I can just fork an existing repo into my own account! For example, here’s the Go samples repo from Google Cloud, and the Java one. Each one is stuffed with hundreds of coding examples for interacting with most of Google Cloud’s services. These repos are updated multiple times per week to ensure they include support for all the latest Cloud service features.
I went ahead and forked each repo in GitHub. You can do it via the CLI or in the web console.
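If you go the CLI route, the fork step looks like this with GitHub’s `gh` tool (the repo names below are the public Google Cloud sample repos; swap in whichever best-practice repos you care about):

```shell
# Fork the Go and Java sample repos into your own GitHub account.
# --clone=false skips the local checkout; drop it if you want a local copy.
gh repo fork GoogleCloudPlatform/golang-samples --clone=false
gh repo fork GoogleCloudPlatform/java-docs-samples --clone=false
```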
I didn’t overthink it and kept the repository name the same.
Gemini Code Assist can index up to 950 repos (and more if really needed), so you could liberally refer to best-practice repos that will help your developers write better code.
Any time I want to refresh my fork to grab the latest code sample updates, I can do so.
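The fork refresh is a one-liner, too. A sketch with `gh` (replace `YOUR_GITHUB_USER` with your own account name):

```shell
# Pull the latest upstream commits into your fork's default branch
gh repo sync YOUR_GITHUB_USER/golang-samples --source GoogleCloudPlatform/golang-samples
```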
Step #2 – Add a reference to the source repo
Now I needed to reference these repos for later code customization. Google Cloud Developer Connect is a service that maintains connections to source code sitting outside Google Cloud.
I started by choosing GitHub.com as my source code environment.
Then I named my Developer Connect connection.
Then I installed a GitHub app into my GitHub account. This app is what enables the loading of source data into the customization service. From here, I chose the specific repos that I wanted available to Developer Connect.
When finished, I had one of my own repos, and two best practice repos all added to Developer Connect.
That’s it! Now to point these linked repos to Gemini Code Assist.
Step #3 – Add a Gemini Code Assist customization index
I had just two CLI commands to execute.
First, I created a code customization index. You’ve got one index per Cloud project (although you can request more) and you create it with one command.
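That first command looks roughly like this (a sketch; `my-index`, the project ID, and the region are placeholders, so check the current `gcloud` reference for the exact flags in your setup):

```shell
# Create the single code customization index for this Cloud project
gcloud gemini code-repository-indexes create my-index \
    --project=my-project \
    --location=us-central1
```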
Next, I created a repository group for the index. You use these to control access to repos, and could have different ones for different dev audiences. Here’s where you actually point to a given repo that has the Developer Connect app installed.
I ran this command a few times to ensure that each of my three repos was added to the repository group (and index).
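As a sketch, the repository group command takes the index name plus a reference to a Developer Connect repo link; the resource path and branch pattern below are illustrative placeholders, so verify them against the `gcloud` reference before running:

```shell
# Create a repository group on the index and attach one linked repo.
# Re-run (or extend the JSON list) for each additional repo.
gcloud gemini code-repository-indexes repository-groups create my-repo-group \
    --project=my-project \
    --location=us-central1 \
    --code-repository-index=my-index \
    --repositories='[{"resource": "projects/my-project/locations/us-central1/connections/my-connection/gitRepositoryLinks/golang-samples", "branchPattern": "main"}]'
```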
Indexing can take up to 24 hours, so here’s where you wait. After a day, I saw that all my target repos were successfully indexed.
Whenever I sync the fork with the latest updates to code samples, Gemini Code Assist will index the updated code automatically. And my IDE with Gemini Code Assist will have the freshest suggestions from our samples repo!
Step #4 – Use updated coding suggestions
Let’s prove that this worked.
I looked for a recent commit to the Go samples repo that the base Gemini Code Assist LLM wouldn’t know about yet. Here’s one that has new topic-creation parameters for our Managed Kafka service. I gave the prompt below to Gemini Code Assist. First, I used a project and account that was NOT tied to the code customization index.
//function to create a topic in Google Cloud Managed Kafka and include parameters for setting replicationfactor and partitioncount
The coding suggestion was good, but incomplete as it was missing the extra configs the service can now accept.
When I went to my Code Assist environment that did have code customization turned on, the same prompt gave me a result that mirrored the latest Go sample code.
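For context, the shape of that up-to-date sample is roughly the following. This is a sketch based on the public `cloud.google.com/go/managedkafka/apiv1` client, not a copy of the repo’s code, and the exact field names may differ from what the assistant produced:

```go
package main

import (
	"context"
	"fmt"

	managedkafka "cloud.google.com/go/managedkafka/apiv1"
	"cloud.google.com/go/managedkafka/apiv1/managedkafkapb"
)

// createTopic creates a topic on a Google Cloud Managed Kafka cluster,
// setting the newer replication factor and partition count parameters.
func createTopic(ctx context.Context, project, region, cluster, topicID string) error {
	client, err := managedkafka.NewClient(ctx)
	if err != nil {
		return fmt.Errorf("managedkafka.NewClient: %w", err)
	}
	defer client.Close()

	req := &managedkafkapb.CreateTopicRequest{
		Parent:  fmt.Sprintf("projects/%s/locations/%s/clusters/%s", project, region, cluster),
		TopicId: topicID,
		Topic: &managedkafkapb.Topic{
			PartitionCount:    10, // number of partitions for the topic
			ReplicationFactor: 3,  // copies of each partition across brokers
		},
	}
	topic, err := client.CreateTopic(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateTopic: %w", err)
	}
	fmt.Printf("created topic: %s\n", topic.GetName())
	return nil
}
```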
I tried a handful of Java and Go prompts, and I regularly (admittedly, not always) got back exactly what I wanted. Good prompt engineering might have helped me reach 100%, but I still appreciated the big increase in quality results. It was amazing to have hundreds of up-to-date Google-tested code samples to enrich my AI-provided suggestions!
AI coding assistants that offer code customization from your own repos are a difference maker. But don’t stop at your own code. Index other great code repos that represent the coding standards and fresh content your developers need!
Happy Friday. I’ve got lots of links again for you today! Enjoy your weekend and see y’all next week.
[blog] The 5 Cs: Configuring access to backing services. How do you set up configurations between app code and its database? Brian looks at what is needed, and wonders if there’s a better way.
[blog] Inference with Gemma using Dataflow and vLLM. I learned at least a half dozen things from reading this post. Cool look at what it takes to use an LLM in a streaming pipeline.
[article] “Reducing Complexity”. John makes great points here about how we use “complexity” as a shorthand for a lot of different problems.
[blog] Announcing .NET 9. I’m admittedly not doing much with C# right now, but these language updates are still a big milestone. I’m sure devs will pick up this version quickly.
[article] Why designing landing pages is hard. It is hard. Know who you’re targeting, and just accept that for pages with a wide audience, you can’t please everyone.
[blog] Generative epiphany. Analogies don’t have to be perfect to be helpful. I like how Katie used ideas from the containerization world to grok LLMs.
[blog] Spring Boot and Temporal. Sometimes we feel like pioneers as we navigate the mashup of technologies. Cornelia goes through an exploration to get this workflow engine to play with a Java Spring Boot app.
It’s probably just me, or confirmation bias, but I feel like I’ve seen an uptick in deep content lately. Each day I’m finding blog posts and articles that explore topics in fresh ways. Lovin’ it.
[blog] Netflix’s Distributed Counter Abstraction. Do you want a couple thousand words on building an accurate counting service? OF COURSE YOU DO. This is an excellent post from the Netflix team, and its level of detail is an example of why I love tech blogs.
[blog] Becoming an AI-Assisted Coding Convert. We each have different relationships with our dev tools and our preferred workflows. Aja found peace with using AI assistance in a way that worked for her.
[article] Will OpenTofu Dethrone Terraform in IaC? Not any time soon. But there’s shakiness in some foundational areas (IaC, IDEs, databases like Redis) that we haven’t seen in a long time.
[article] Build GenAI Prototypes with Streamlit. If you’re getting into building AI apps atop LLMs, you could do worse than starting with Streamlit to build prototypes.
[blog] Deployment-Driven Development. Tyler says that we should change our thinking. Instead of treating deployment as a destination, we should prioritize it from the start.
[article] Employee AI adoption cools globally. Temporarily. This survey from Slack shows that folks don’t want to be perceived by management as lazy for using AI. We can fix that with better education of the management ranks, and better sponsorship of AI initiatives from exec teams.
[article] Memory in Agent Systems. This is a good topic to learn about. With all this talk about agents, how do we think about durable storage of “memory”?
There was so much thought-provoking content today! I enjoyed the pieces on refining strategy, giving feedback, running local LLMs, shipping products, and more.
[blog] Using systems modeling to refine strategy. How do you work through an engineering strategy? Will proposes using the ideas behind systems modeling and explains how to do it.
[article] High Performers Need Feedback, Too. Can we give more constructive feedback to our best teammates? Yes. This article has some examples of how to improve the quality of your feedback.
[blog] LLM Guard and Vertex AI. Mete has been looking at evaluation frameworks and this one is a security-related one. Make sure that you’ve got frameworks like this in mind as you start legitimately using LLMs.
[blog] Generating zero-shot personalized portraits. Need a new headshot? This post from Google Research shows off a new AI model that can transform selfies into different artistic styles.
[blog] Adopting Bazel for Web at Scale. Lots of details here from Airbnb about how they moved from bespoke scripts for their builds to a single tool.
[blog] How I ship projects at big tech companies. Somewhat cynical, but entirely reasonable perspective about what it takes to get a product out the door, and have it considered “shipped.”