Author: Richard Seroter

  • More than serverless: Why Cloud Run should be your first choice for any new web app.

    I’ll admit it, I’m a PaaS guy. Platform-as-a-Service is an ideal abstraction for those who don’t get joy from fiddling with infrastructure. From Google App Engine, to Heroku, to Cloud Foundry, I’ve appreciated attempts to deliver runtimes that make it easier to ship and run code. Classic PaaS-type services were great at what they did. The problem with all of them—and this includes first-generation serverless products like AWS Lambda—was that they were limited. Some of the necessary compromises were well-meaning and even healthy: build 12-factor apps, create loose coupling, write less code and orchestrate managed services instead. But in the end, all these platforms, while successful in various ways, were too constrained to take on a majority of apps for a majority of people. Times have changed.

    Google Cloud Run started as a serverless product, but it’s more of an application platform at this point. It’s reminiscent of a PaaS, but much better. While not perfect for everything—don’t bring Windows apps, always-on background components, or giant middleware—it’s becoming my starting point for nearly every web app I build. There are ten reasons why Cloud Run isn’t limited by PaaS-t constraints, is suitable for devs at every skill level, and can run almost any web app.

    1. It’s for functions AND apps.
    2. You can run old AND new apps.
    3. Use by itself AND as part of a full cloud solution.
    4. Choose simple AND sophisticated configurations.
    5. Create public AND private services.
    6. Scale to zero AND scale to 1.
    7. Do one-off deploys AND set up continuous delivery pipelines.
    8. Own aspects of security AND offload responsibility.
    9. Treat as post-build target AND as upfront platform choice.
    10. Rely on built-in SLOs, logs, metrics AND use your own observability tools.

    Let’s get to it.

    #1. It’s for functions AND apps.

    Note that Cloud Run also has “jobs” for run-to-completion batch work. I’m focusing solely on Cloud Run web services here.

    I like “functions.” Write short code blocks that respond to events and perform an isolated piece of work. There are many great use cases for this.

    The new Cloud Run functions experience makes it easy to bang out a function in minutes. It’s baked into the CLI and UI. Once I decide to create a function, I only need to pick a service name, region, language runtime, and whether access to the function is authenticated.

    Then, I see a browser-based editor where I can write, test, and deploy my function. Simple, and something most of us equate with “serverless.”

    But there’s more. Cloud Run does apps too. That means instead of a few standalone functions to serve a rich REST endpoint, you’re deploying one Spring Boot app with all the requisite listeners. Instead of serving out a static site, you could return a full web app with server-side capabilities. You’ve got nearly endless possibilities when you can serve any container that accepts HTTP, HTTP/2, WebSockets, or gRPC traffic.

    Use either abstraction, but stay above the infrastructure and ship quickly.
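    Cloud Run’s only real contract with your container is simple: listen for HTTP on the port given in the PORT environment variable (8080 by default). Here’s a minimal sketch of a service honoring that contract with only the Python standard library; the handler and greeting text are purely illustrative.

```python
# Minimal web service following Cloud Run's container contract:
# listen on the port from the PORT env var (default 8080), speak plain HTTP.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from Cloud Run!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep per-request logging quiet in this demo

def serve():
    port = int(os.environ.get("PORT", "8080"))  # Cloud Run injects PORT
    HTTPServer(("", port), Handler).serve_forever()

if __name__ == "__main__":
    serve()
```

    Package that in any container image with a Python base, and a `gcloud run deploy` takes it from there.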

    Docs: Deploy container images, Deploy functions, Using gRPC, Invoke with an HTTPS request
    Code labs to try: Hello Cloud Run with Python, Getting Started with Cloud Run functions

    #2. You can run old AND new apps.

    This is where the power of containers shows up, and why many previous attempts at PaaS didn’t break through. It’s ok if a platform only supports new architectures and new apps. But then you’re accepting that you’ll need an additional stack for EVERYTHING ELSE.

    Cloud Run is a great choice because you don’t HAVE to start fresh to use it. Deploy from source in an existing GitHub repo or from cloned code on your machine. Maybe you’ve got an existing Next.js app sitting around that you want to deploy to Cloud Run. Run a headless CMS. Does your old app require local volume mounts for NFS file shares? Easy to do. Heck, I took a silly app I built 4 1/2 years ago, deployed it from Docker Hub, and it just worked.

    Of course, Cloud Run shines when you’re building new apps. Especially when you want fast experimentation with new paradigms. With its new GPU support, Cloud Run lets you do things like serve LLMs via tools like Ollama. Or deploy generative AI apps based on LangChain or Firebase Genkit. Build powerful web apps in Go, Java, Python, .NET, and more. Cloud Run’s clean developer experience and simple workflow makes it ideal for whatever you’re building next.

    Docs: Migrate an existing web service, Optimize Java applications for Cloud Run, Supported runtime base images, Run LLM inference on Cloud Run GPUs with Ollama
    Code labs to try: How to deploy all the JavaScript frameworks to Cloud Run, Django CMS on Cloud Run, How to run LLM inference on Cloud Run GPUs with vLLM and the OpenAI Python SDK

    #3. Use by itself AND as part of a full cloud solution.

    There aren’t many tech products that everyone seems to like. But folks seem to really like Cloud Run, and it regularly wins over the Hacker News crowd! Some classic PaaS solutions were lifestyle choices; you had to be all in. Use the platform and its whole way of working. Powerful, but limiting.

    You can choose to use Cloud Run all by itself. It’s got a generous free tier, doesn’t require complicated HTTP gateways or routers to configure, and won’t force you to use a bunch of other Google Cloud services. Call out to databases hosted elsewhere, respond to webhooks from SaaS platforms, or just serve up static sites. Use Cloud Run, and Cloud Run alone, and be happy.

    And of course, you can use it along with other great cloud services. Tack on a Firestore database for a flexible storage option. Add a Memorystore caching layer. Take advantage of our global load balancer. Call models hosted in Vertex AI. If you’re using Cloud Run as part of an event-driven architecture, you might also use built-in connections to Eventarc to trigger Cloud Run services when interesting things happen in your account—think a file uploaded to object storage, a user role deleted, or a database backup completed.

    Use it by itself or “with the cloud”, but either way, there’s value.

    Docs: Hosting webhooks targets, Connect to a Firestore database, Invoke services from Workflows
    Code labs to try: How to use Cloud Run functions and Gemini to summarize a text file uploaded to a Cloud Storage bucket

    #4. Choose simple AND sophisticated configurations.

    One reason PaaS-like services are so beloved is because they often provide a simple onramp without requiring tons of configuration. “cf push” to get an app to Cloud Foundry. Easy! Getting an app to Cloud Run is simple too. If you have a container, it’s a single command:

    rseroter$ gcloud run deploy go-app --image=gcr.io/seroter-project-base/go-restapi

    If all you have is source code, it’s also a single command:

    rseroter$ gcloud run deploy node-app --source .

    In both cases, the CLI asks me to pick a region and whether I want requests authenticated, and that’s it. Seconds later, my app is running.

    This works because Cloud Run sets a series of smart, reasonable default settings.

    But sometimes you do want more control over service configuration, and Cloud Run opens up dozens of possible settings. What kind of sophisticated settings do you have control over?

    • CPU allocation. Do you want CPU to be always on, or quit when idle?
    • Ingress controls. Do you want VPC-only access or public access?
    • Multi-container services. Add a sidecar.
    • Container port. The default is 8080, but set it to whatever you want.
    • Memory. The default value is 512 MiB per instance, but you can go up to 32 GiB.
    • CPU. It defaults to 1, but you can go below 1, or up to 8.
    • Healthchecks. Define startup or liveness checks that ping specific endpoints on a schedule.
    • Variables and secrets. Define environment variables that get injected at runtime. Same with secrets that get mounted at runtime.
    • Persistent storage volumes. There’s ephemeral scratch storage in every Cloud Run instance, but you can also mount volumes from Cloud Storage buckets or NFS shares.
    • Request timeout. The default value is 5 minutes, but you can go up to 60 minutes.
    • Max concurrency. A given service instance can handle more than one request. The default value is 80, but you can go up to 1000!
    • and much more!

    You can do something simple, you can do something sophisticated, or a bit of both.
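    To make the “variables and secrets” settings concrete, here’s a hedged sketch of how a service might read both at runtime: env vars arrive as ordinary process environment, and secrets mounted as volumes show up as files. The env var name and mount path are illustrative assumptions, not anything Cloud Run prescribes.

```python
# Sketch: consuming Cloud Run runtime configuration.
# Environment variables are ordinary process environment;
# secrets mounted as volumes appear as files on disk.
import os

def load_config(secret_path="/secrets/db-password"):  # path is illustrative
    config = {
        "greeting": os.environ.get("GREETING", "hello"),  # env var with a default
        "db_password": None,
    }
    if os.path.exists(secret_path):
        with open(secret_path) as f:
            config["db_password"] = f.read().strip()
    return config
```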

    Docs: Configure container health checks, Maximum concurrent requests per instance, CPU allocation, Configure secrets, Deploying multiple containers to a service (sidecars)
    Code labs to try: How to use Ollama as a sidecar with Cloud Run GPUs and Open WebUI as a frontend ingress container

    #5. Create public AND private services.

    One of the challenges with early PaaS services was that they just sat on the public internet. That’s no good once you get to serious, internal-facing systems.

    First off, Cloud Run services are public by default. You control the authentication level (anonymous access, or authenticated user) and need to explicitly set that. But the service itself is publicly reachable. What’s great is that this doesn’t require you to set up any weird gateways or load balancers to make it work. As soon as you deploy a service, you get a reachable address.

    Awesome! Very easy. But what if you want to lock things down? This isn’t difficult either.

    Cloud Run lets me specify that I’ll only accept traffic from my VPC networks. I can also choose to securely send messages to IPs within a VPC. This comes into play as well if you’re routing requests to a private on-premises network peered with a cloud VPC. We even just added support for adding Cloud Run services to a service mesh for more networking flexibility. All of this gives you a lot of control to create truly private services.
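    When calling a locked-down service, the caller proves its identity with a Google-signed ID token in the Authorization header. A hedged sketch, assuming the google-auth library and ambient service-account credentials; the service URL is whatever Cloud Run assigned you.

```python
# Sketch: invoking a private Cloud Run service from another workload.
# Cloud Run validates a Google-signed ID token whose audience is the
# service URL. fetch_id_token is google-auth's documented helper for this.
import urllib.request

def build_request(url, id_token):
    # Private Cloud Run services expect "Authorization: Bearer <ID_TOKEN>"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {id_token}"})

def call_private_service(url):
    # Requires the google-auth package and service-account credentials.
    import google.auth.transport.requests
    import google.oauth2.id_token
    token = google.oauth2.id_token.fetch_id_token(
        google.auth.transport.requests.Request(), url)  # audience = service URL
    return urllib.request.urlopen(build_request(url, token))
```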

    Docs: Private networking and Cloud Run, Restrict network ingress for Cloud Run, Cloud Service Mesh
    Code labs to try: How to configure a Cloud Run service to access an internal Cloud Run service using direct VPC egress, Configure a Cloud Run service to access both an internal Cloud Run service and public Internet

    #6. Scale to zero AND scale to 1.

    I don’t necessarily believe that cloud is more expensive than on-premises—regardless of some well-publicized stories—but keeping idle cloud services running isn’t helping your cost posture.

    Google Cloud Run truly scales to zero. If nothing is happening, nothing is running (or costing you anything). However, when you need to scale, Cloud Run scales quickly. Like, a-thousand-instances-in-seconds quickly. This is great for bursty workloads that don’t have a consistent usage pattern.

    But you probably also want an affordable way to keep a consistent pool of compute online to handle a steady stream of requests. No problem. Set the minimum instance count to 1 (or 2, or 10) and keep instances warm. And set concurrency high for apps that can handle it.

    If you don’t have CPU always allocated, but keep a minimum instance online, we actually charge you significantly less for that “warm” instance. And you can apply committed use discounts when you know you’ll have a service running for a while.

    Run bursty workloads or steadily-used workloads all in a single platform.

    Docs: About instance autoscaling in Cloud Run services, Set minimum instances, Load testing best practices
    Code labs to try: Cloud Run service with minimum instances

    #7. Do one-off deploys AND set up continuous delivery pipelines.

    I mentioned above that it’s easy to use a single command or single screen to get an app to Cloud Run. Go from source code or container to running app in seconds. And you don’t have to set up any other routing middleware or cloud networking to get a routable service.

    Sometimes you just want to do a one-off deploy without all the ceremony. Run the CLI, use the Console UI, and get on with life. Amazing.

    But if that was your only option, you’d feel constrained. So you can use something like GitHub Actions to deploy to Cloud Run. Most major CI/CD products support it.

    Another great option is Google Cloud Deploy. This managed service takes container artifacts and deploys them to Google Kubernetes Engine or Google Cloud Run. It offers some sophisticated controls for canary deploys, parallel deploys, post-deploy hooks, and more.

    Cloud Deploy has built-in support for Cloud Run. A basic pipeline (defined in YAML, but also configured via point-and-click in the UI if you want) might show three stages for dev, test, and prod.

    When the pipeline completes, we see three separate Cloud Run instances deployed, representing each stage of the pipeline.

    You want something more sophisticated? Ok. Cloud Deploy supports Cloud Run canary deployments. You’d use this if you want a subset of traffic to go to the new instance before deciding to cut over fully.

    This is taking advantage of Cloud Run’s built-in traffic management feature. When I check the deployed service, I see that after advancing my pipeline to 75% of production traffic for the new app version, the traffic settings are properly set in Cloud Run.

    Serving traffic in multiple regions? Cloud Deploy makes it possible to ship a release to dozens of places simultaneously. Here’s a multi-target pipeline. The production stage deploys to multiple Cloud Run regions in the US.

    When I checked Cloud Run, I saw instances in all the target regions. Very cool!

    If you want a simple deploy, do that with the CLI or UI. Nothing stops you. However, if you’re aiming for a more robust deployment strategy, Cloud Run readily handles it through services like Cloud Deploy.
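    As a rough illustration, the canary scenario above might be declared in a Cloud Deploy pipeline like this. This is a sketch based on the clouddeploy.yaml configuration schema; the pipeline and target names are invented.

```yaml
# clouddeploy.yaml (sketch): three-stage pipeline with a canary on prod
apiVersion: deploy.cloud.google.com/v1
kind: DeliveryPipeline
metadata:
  name: web-app-pipeline   # invented name
serialPipeline:
  stages:
  - targetId: dev
  - targetId: test
  - targetId: prod
    strategy:
      canary:
        runtimeConfig:
          cloudRun:
            automaticTrafficControl: true  # let Cloud Deploy shift Cloud Run traffic
        canaryDeployment:
          percentages: [25, 75]  # advance through 25%, then 75%, then 100%
```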

    Docs: Use a canary deployment strategy, Deploy to multiple targets at the same time, Deploying container images to Cloud Run
    Code labs to try: How to Deploy a Gemini-powered chat app on Cloud Run, How to automatically deploy your changes from GitHub to Cloud Run using Cloud Build

    #8. Own aspects of security AND offload responsibility.

    One reason you choose managed compute platforms is to outsource operational tasks. It doesn’t mean you’re not capable of patching infrastructure, scaling compute nodes, or securing workloads. It means you don’t want to, and there are better uses of your time.

    With Cloud Run, you can drive aspects of your security posture, and also let Cloud Run handle key aspects on your behalf.

    What are you responsible for? You choose an authentication approach, including public or private services. This includes control of how you want to authenticate developers who use Cloud Run. You can authenticate end users, internal or external ones, using a handful of supported methods.

    It’s also up to you to decide which service account the Cloud Run service instance should impersonate. This controls what a given instance has access to. If you want to ensure that only containers with verified provenance get deployed, you can also choose to turn on Binary Authorization.

    So what are you offloading to Cloud Run and Google Cloud?

    You can outsource protection from DDoS and other threats by turning on Cloud Armor. The underlying infrastructure beneath Cloud Run is completely managed, so you don’t need to worry about upgrading or patching any of that. What’s also awesome is that if you deploy Cloud Run services from source, you can sign up for automatic base image updates. This means we’ll patch the OS and runtime of your containers. Importantly, it’s still up to you to patch your app dependencies. But this is still very valuable!

    Docs: Security design overview, Introduction to service identity, Use Binary Authorization, Configure automatic base image updates
    Code labs to try: How to configure a Cloud Run service to access an internal Cloud Run service using direct VPC egress, How to connect a Node.js application on Cloud Run to a Cloud SQL for PostgreSQL database

    #9. Treat as post-build target AND as upfront platform choice.

    You might just want a compute host for your finished app. You don’t want to have to pick that host up front, and just want a way to run your app. Fair enough! There aren’t “Cloud Run apps”; they’re just containers. That said, there are general tips that make an app more suitable for Cloud Run than not. But the key is, for modern apps, you can often choose to treat Cloud Run as a post-build decision.

    Or, you can design with Cloud Run in mind. Maybe you want to trigger Cloud Run based on a specific Eventarc event. Or you want to capitalize on Cloud Run concurrency so you code accordingly. You could choose to build based on a specific integration provided by Cloud Run (e.g. Memorystore, Firestore, or Firebase Hosting).

    There are times that you build with the target platform in mind. In other cases, you want a general purpose host. Cloud Run is suitable for either situation, which makes it feel unique to me.

    Docs: Optimize Java applications for Cloud Run, Integrate with Google Cloud products in Cloud Run, Trigger with events
    Code labs to try: Trigger Cloud Run with Eventarc events

    #10. Rely on built-in SLOs, logs, metrics AND use your own observability tools.

    If you want it to be, Cloud Run can feel like an all-in-one solution. Do everything from one place. That’s how classic PaaS was, and there was value in having a tightly-integrated experience. From within Cloud Run, you have built-in access to logs, metrics, and even setting up SLOs.

    The metrics experience is powered by Cloud Monitoring. I can customize the event types, dashboards, time windows, and more. This even includes the ability to set up uptime checks that periodically ping your service and let you know if everything is ok.

    The embedded logging experience is powered by Cloud Logging and gives you a view into all your system and custom logs.

    We’ve even added an SLO capability where you can define SLIs based on availability, latency, or custom metrics. Then you set up service level objectives for service performance.

    While all these integrations are terrific, you don’t have to use only them. You can feed metrics and logs into Datadog. Same with Dynatrace. You can also write out OpenTelemetry or Prometheus metrics and consume them however you want.
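    One small, practical hook worth knowing: Cloud Run forwards whatever your container writes to stdout/stderr into Cloud Logging, and a single JSON object per line gets parsed into a structured log entry. A sketch using Cloud Logging’s documented special fields (“severity”, “message”); the extra field is arbitrary.

```python
# Sketch: structured logging on Cloud Run. Each line printed to stdout
# as one JSON object becomes a structured entry in Cloud Logging;
# "severity" and "message" are special fields it understands.
import json
import sys

def log(severity, message, **fields):
    entry = {"severity": severity, "message": message, **fields}
    print(json.dumps(entry), file=sys.stdout, flush=True)
    return entry
```

    Any extra fields you attach land in the entry’s payload, queryable from the Logs Explorer.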

    Docs: Monitor Health and Performance, Logging and viewing logs in Cloud Run, Using distributed tracing

    Kubernetes, virtual machines, and bare metal boxes all play a key role for many workloads. But you also may want to start with the highest abstraction possible so that you can focus on apps, not infrastructure. IMHO, Google Cloud Run is the best around and satisfies the needs of most any modern web app. Give it a try!

  • Daily Reading List – September 6, 2024 (#392)

    It was officially a 4-day workweek, but felt like a regular week. Lots going on, and plenty of things to do. But I greatly prefer that to the alternative! Have a great weekend, y’all.

    [blog] Serving Stable Diffusion with RayServe on GKE Autopilot. How would you make this text-to-image model available to other apps in your environment? William gives us a step by step for getting it going on Kubernetes.

    [blog] Coaching Feedback. I’m familiar with the SHARE model for giving feedback, but don’t always remember to use it. This is a good reminder to break it out more often.

    [blog] Google named a leader in the Forrester Wave: AI/ML Platforms, Q3 2024. You like us, you really like us. It’s cool to see Google as the only hyperscale cloud in the leader section.

    [blog] Getting 🍨 Ice Cream 🍦 Recommendations at Scale with Gemini, Embeddings, and Vector Search. Alok really likes ice cream. He’s also great at AI/ML and helps us understand the role of embeddings in creating a recommendation engine.

    [article] Cycle Time. Most of you are trying to shrink the time it takes to go from idea to working software in production. But what activity starts the “cycle time” clock? And when is the software considered “shipped”?

    [blog] Securing Generative AI: Defending Against Prompt Injection. I thought this was good advice for a problem most of us hadn’t even thought much about yet.

    [guide] Enterprise application with Oracle Database on Compute Engine. There’s more going on than just AI stuff. Here’s a good new guide on hosting a highly available app that depends on Oracle databases. All on VMs.

    [blog] Gemma explained: PaliGemma architecture. This is an open vision-language model that produces a text response from image or text input.

    [article] InfoQ AI, ML and Data Engineering Trends Report – September 2024. What’s “late majority” versus “early adopters” in this fast moving space? Here’s one lens on it.

    Want to get this update sent to you every day? Subscribe to my RSS feed or subscribe via email below:

  • Daily Reading List – September 5, 2024 (#391)

    It was a good day. I had productive meetings, one epiphany, and a chance to write. In the reading list below, you’ll find some tech deep dives, but also a few pieces that’ll help you with strategic thinking.

    [blog] Using Node-based pricing on GKE Autopilot. Fully managed Kubernetes is a good deal. William talks about the couple of ways (pod based, node based) to pay, and how the new Custom Compute Class gives you a very flexible way to define workload priorities.

    [article] TikTok Releases Tool to Improve Monorepo Performance. Google famously has a monorepo, but there’s work to be done to make it usable for every developer. This article explores a new tool that helps devs pull subsets of files.

    [blog] Building LLMs from the Ground Up: A 3-hour Coding Workshop. Folks are learning AI from lots of places, including YouTube. Great video here from Sebastian.

    [blog] Looks Matter (When It Comes to Software Products). Good product design isn’t a nice-to-have; it’s critical for long term success.

    [blog] Best practices for cost-efficient Kafka clusters. Lots of details here, whether you’re self-hosting or using a managed environment for event stream processing.

    [blog] AI Security frameworks in a nutshell — Part 1. Sita does a great job looking into the industry and government-led frameworks that matter for security folks. Also check out part 2.

    [blog] From Idea to Reality: Building the Instant Web with Gemini (Part 1). How can you go from wireframe to website with the help of an LLM? Thu Ya has a good example of the iterative process.

    [blog] Why “AI” projects fail. The amusing rant here claims that AI projects fail because folks “do AI” to avoid the harder work of identifying and fixing real problems.

    [blog] Telemetry in Go 1.23 and beyond. I like the transparency and insights provided by the Go team related to user-provided telemetry about Go usage.

    [blog] How to consistently output JSON with the Gemini API using controlled generation. Gemini was among the first LLMs to offer structured output in JSON, and this post explains why it matters and how to use it.

  • Daily Reading List – September 4, 2024 (#390)

    I can’t come up with any interesting intro today, so I asked Gemini for a joke about open source software. “Why did the open source software go to therapy? It had too many unresolved issues.” AI isn’t taking my job any time soon.

    [article] New LLM Pre-training and Post-training Paradigms. What sorts of pre-training and post-training is available to LLMs? And how do leading open models employ (or not employ) these approaches? Great writeup.

    [guide] Select a managed container runtime environment. Which type of managed compute service makes sense for your next app? This new architecture guide may help you decide.

    [article] What’s Behind Elastic’s Unexpected Return to Open Source? More on this somewhat-surprising move to make Elasticsearch more open again.

    [blog] What are the most common bugs in LLM-generated code? It’s good to see and digest this. And it reinforces my belief that you should know how to code before depending too heavily on these AI assisted tools.

    [article] Why We Shouldn’t Romanticize Failure. Ah, maybe we shouldn’t be so quick to crave a “fail fast” and “celebrate failure” culture? It sounds like we over-estimate our resilience.

    [blog] BigQuery and Anthropic’s Claude: A powerful combination for data-driven insights. There’s some nice integration here between a great LLM and a terrific analytics platform.

    [article] What to Do When You Know More Than Your Boss. You should know more than your boss in many areas. This is an article about knowledge sharing.

    [blog] A retryable JUnit 5 extension for flaky tests. If you’re starting to invoke LLMs in your apps, you might want to rethink your testing strategy. Guillaume wanted retry-able tests to account for non-deterministic responses.

  • Daily Reading List – September 3, 2024 (#389)

    I had a good 3-day weekend and will now struggle all week to remember what day it actually is. Today’s reading list offers some controversy (“founder mode!”), survey data (“Python users!”), and a little intrigue (“web3 heists!”).

    [blog] Measuring meaningful availability / uptime of Wise. How does availability differ from uptime, and how does this fintech company look at reporting those values? Educational read.

    [blog] Founder Mode. This one had the socials buzzing over the weekend, with lots of contrasting takes. I liked it. “Manager mode” versus “founder mode” can even apply within existing companies; you see the creators of teams or divisions choose either path.

    [blog] Mastering Controlled Generation with Gemini 1.5: Schema Adherence for Developers. You want a blueprint for how responses come back from an LLM? Gemini has offered JSON mode for a while, and this lets you define a schema for the LLM response.

    [blog] Svelte adoption guide: Overview, examples, and alternatives. I hear good things about this frontend framework, and this is a big post with tons of details.

    [blog] Your ultimate guide to the latest in generative AI on Vertex AI. You don’t need to follow every announcement in tech. I mean, you’re reading this daily post, so you’re probably fairly up to date. But, I like these sorts of recap blogs that give you a single place to catch up.

    [site] Python Developers Survey 2023 Results. Some fresh survey results here which convey insights into developer choices in frameworks, tools, clouds, and more.

    [blog] DeFied Expectations — Examining Web3 Heists. I know there are books about this topic, but wow. There’s a lot of money and a lot of creative attacks at play.

    [paper] Generative Verifiers: Reward Modeling as Next-Token Prediction. New paper from DeepMind that looks at training models as verifiers of LLM responses.

    [blog] Flutter Vs React Native : Performance Benchmarks you can’t miss ! 🔥⚡️ [Part -1]. I haven’t seen many benchmarks like this, and it’s useful to see where each framework lines up on head-to-head comparisons.

  • Daily Reading List – August 30, 2024 (#388)

    If you’re in the US, there’s a three-day weekend in front of you. Enjoy it!

    [article] How to Craft a Memorable Message, According to Science. We forget most of what we hear or come across. Can you ensure folks don’t forget what you tell them? Read this.

    [blog] Debate over “open source AI” term brings new push to formalize definition. It’s good to have an agreed-upon definition of open source AI, as folks are using that term to refer to models that don’t fit the traditional open source definition.

    [article] What a day in the life of a Technical Writer in the energy industry looks like — Guest post by Bonnie Denham. I see our tech writers in action every day, but this is a good look at the types of activities the role might entail.

    [blog] Feature Flags are more than just Toggles. There are many ways to implement feature flags, and Derek encourages us to think more broadly than just conditional statements in code.

    [blog] Google Cloud launches Memorystore for Valkey, a 100% open-source key-value service. Life finds a way. When previously open products get license changes, new options emerge. Valkey is a solid alternative to Redis, and I’m glad we’re offering it as a managed service.

    [blog] Top Five Platform Engineering Books for 2024. Getting your team to think about a platform engineering approach? These books can put you in the right frame of mind.

  • Daily Reading List – August 29, 2024 (#387)

    Big reading list today! It includes some tech dives, inspiring text-to-image AI examples, and some strong opinions about JavaScript frameworks and software estimation.

    [blog] What is the Kubernetes “Claim” model? File this under “things you don’t HAVE to know, but are useful nuggets to store away.” Brian provides context into what these “requests” into Kubernetes mean.

    [blog] Long context prompting tips. Small tips from Anthropic, but potentially very impactful ones.

    [article] How Platform Engineering Enables the 10,000-Dev Workforce. Big post, but lots of good coverage of this topic. Why do platform engineering? How do you measure impact?

    [article] Generative AI coding startup Magic lands $320M investment from Eric Schmidt, Atlassian and others. AI coding tools are so hot right now! Magic raised a ton. And announced they are training models on Google Cloud.

    [blog] A Java Language Cumulative Feature Rollup. If you haven’t checked out Java since version 8, you’ll like this recap of everything important that’s happened since then.

    [blog] Gemma explained: RecurrentGemma architecture. This isn’t the “standard” LLM architecture, and is worth reading about.

    [blog] Building Out 🍨 Ice Cream 🍦 Product Assets at Scale with Gemini. Come up with creative product descriptions and summarize reviews with AI. Fun demo.

    [article] Is Your Organizational Transformation Veering Off Course? How do you navigate those turning points to get things back on track? There’s good advice here.

    [blog] Get more photorealistic with Imagen 3. I’m still wow-ed by AI-powered image generation, especially those of living creatures. This post shows some remarkable results.

    [blog] A developer’s guide to getting started with Imagen 3 on Vertex AI. Here’s some useful advice on prompting text to image models like Imagen 3.

    [article] Developers Rail Against JavaScript ‘Merchants of Complexity’. The use of frameworks is a topic that can spark wildly different opinions. This piece shares skepticism of the value.

    [blog] Software estimates have never worked and never will. It’s always been “confident guessing”, especially for any estimate longer than 2 days. DHH says to look at “budgets” instead.

    [blog] Elasticsearch is Open Source, Again. I think this is the first of the “switched our open license to closed” vendors to actually go back to something more open. Kudos!

  • Daily Reading List – August 28, 2024 (#386)

    I paid for yesterday’s light-meeting day with a heavy-meeting day today. Well-played, calendar gods. But I also read some great content, and even had time for some quick demos about AI-generated data insights and attached volumes on a serverless app.

    [blog] New in Gemini: Custom Gems and improved image generation with Imagen 3. Here’s a good update for those that want personalized assistants, or some premium image generation.

    [blog] Gemini Chat App. Simon used Claude to write a small app that uses our latest Gemini 1.5 model versions. He also opines that people who don’t see value in using AI assistance for programming are missing something.

    [article] Why Cynics Are Less Likely to Succeed. It’s not hard to be cynical, but operating in a mode of trust and cooperation is not just good for your mental well-being, it’s better for your career.

    [blog] What is the Open Source Alternative to CockroachDB? When license changes happen, other vendors/projects jump in. Denis at Yugabyte offers up a case for using their database as a drop-in replacement.

    [blog] Building an AI-Powered CLI with Golang and Google Gemini. I’m definitely seeing more organic usage of the Gemini models. I guess that’s what happens when “quality models” meets “wildly generous free tier.”

    [blog] Managing Angular. This is a high level view from the product lead for the popular JavaScript framework. OSS management is quite the job, whether you’re solo or working inside a big tech firm.

    [article] Speak, Code, Deploy: What if voice was your primary tool for coding? I dunno. I’m hoping speech-to-text and chatbots are transient interfaces with AI. At least for the masses that don’t need that for accessibility reasons. I personally don’t want to talk to my computer, or be stuck “chatting” to get my work done.

    [blog] Get started with the new generally available features of Gemini in BigQuery. *This* is how I want my AI. Melted into the products I use. These are great features for smarter analytics.

    [article] Applying AI to the SDLC: New Ideas and Gotchas! – Leveraging AI to Improve Software Engineering. Good talk and transcript for those considering a more delivery-wide view of AI assistance in their software teams.

  • Daily Reading List – August 27, 2024 (#385)

    I had a very light meeting day today, which messed with my head. But, it was great to answer all my email, write a blog post, do some research, and work on upcoming presentations.

    [blog] Routines and habit stacking. Tom looks at incorporating goals into current routines, and piggybacking on existing success.

    [blog] Level up your codebase with Gemini’s long context window in Vertex AI. I love this example from Karl. He shows us exactly how to take a large codebase and use Gemini to send prompts like “provide a getting started guide” and “implement this feature.”

    [article] Does Market Share Still Matter? Do “market leaders” have the most efficiency, market power, and quality? Or do highly digital firms have similar profitability to the market leaders? Interesting research.

    [article] Profitable on day one! What does it even mean to be “profitable”? Jason encourages us to use the term correctly.

    [blog] A Year of Project IDX. If you haven’t checked this out, at least give it a scan. IDX is an interesting developer environment, and I’ve used it to build a few apps.

    [blog] What conditions make developers thrive most? This looks at recent research about where devs don’t just perform, but thrive. Four key dimensions come into play.

    [blog] Friction Logs. This is about the process of using products, recording the experience and papercuts that come with it, and sending that feedback to those who fix it.

    [article] Why AI can’t spell ‘strawberry’. Such an interesting problem! I tried this scenario with the latest Gemini Flash models we released today, and it did indeed answer correctly.

    [blog] How DoorDash is pushing experimentation boundaries with interleaving designs. Sophisticated stuff, but looks like a useful strategy for getting better signals earlier.

  • 4 ways to pay down tech debt by ruthlessly removing stuff from your architecture

    4 ways to pay down tech debt by ruthlessly removing stuff from your architecture

    What advice do you get if you’re lugging around a lot of financial debt? Many folks will tell you to start purging expenses. Stop eating out at restaurants, go down to one family car, cancel streaming subscriptions, and sell unnecessary luxuries. For some reason, I don’t see the same aggressive advice when it comes to technical debt. I hear soft language around “optimization” or “management” versus assertive stances that take a meat cleaver to your architectural excesses.

    What is architectural debt? I’m thinking about bloated software portfolios where you’re carrying eight products in every category. Brittle automation that only partially works and still requires manual workarounds and black magic. Unique customizations to packaged software that’s now keeping you from being able to upgrade to modern versions. Also half-finished “ivory tower” designs where the complex distributed system isn’t fully in place, and may never be. You might have too much coupling, too little coupling, unsupported frameworks, and all sorts of things that make deployments slow, maintenance expensive, and wholesale improvements impossible.

    This stuff matters. The latest Stack Overflow developer survey shows that the most common frustration is the “amount of technical debt.” It’s wasting up to eight hours a week for each developer! Numbers two and three relate to stack complexity. Your code and architectural tech debt is slowing down your release velocity, creating attrition among your best employees, and limiting how much you can invest in new tech areas. It’s well past time to simplify by purging architecture components that have built up (and calcified) over time. Let’s write bigger checks to pay down this debt faster.

    Explore these four areas, all focused on simplification. There are obvious tradeoffs and costs with each suggestion, but you’re not going to make meaningful progress by being timid. Note that there are other dimensions to fixing tech debt besides simplification, but it’s the one I see discussed least often. I’ll use Google Cloud to offer some examples of how you might specifically tackle each, given we’re the best cloud for those making a firm shift away from legacy tech debt.

    1. Stop moving so much data around.

    If you zoom out on your architecture, how many components do you have that get data from point A to point B? I’d bet that you have lots of ETL pipelines to consolidate data into a warehouse or data lake, messaging and event processing solutions to shunt data around, and even API calls that suck data from one system into another. That’s a lot of machinery you have to create, update, and manage every day.

    Can you get rid of some of this? Can you access more of the data where it rests, versus copying it all over the place? Or use software that acts on data in different ways without forcing you to migrate it for further processing? I think so.

    Let’s see some examples.

    Perform analytical queries against data sitting in different places? Google Cloud supports that with BigQuery Omni. We run BigQuery in AWS and Azure so that you can access data at rest, and not be forced to consolidate it in a single data lake. Here, I have an Excel file sitting in an Azure blob storage account. I could copy that data over to Google Cloud, but that’s more components for me to create and manage.

    Rather, I can set up a pointer to Azure from within BigQuery, and treat it like any other table. The data is processed in Azure, and only summary info travels across the wire.

    You might say “that’s cool, but I have related data in another cloud, so I’d have to move it anyway to do joins and such.” You’d think so. But we also offer cross-cloud joins with BigQuery Omni. Check this out. I’ve got that employee data in Azure, but timesheet data in Google Cloud.

    With a single SQL statement, I’m joining data across clouds. No data movement required. Less debt.
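    As a sketch of what that cross-cloud join can look like (the dataset, table, and column names here are hypothetical):

```sql
-- `azure_ds.employees` stands in for a BigQuery Omni external table over
-- Azure Blob Storage; `hr.timesheets` is a native BigQuery table.
SELECT
  e.employee_id,
  e.name,
  SUM(t.hours) AS total_hours
FROM azure_ds.employees AS e   -- data stays in Azure
JOIN hr.timesheets AS t        -- data lives in Google Cloud
  ON e.employee_id = t.employee_id
GROUP BY e.employee_id, e.name;
```

    BigQuery treats the Azure-backed table like any other, so the join is just standard SQL.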

    Enrich data in analytical queries from outside databases? You might have ETL jobs in place to bring reference data into your data warehouse to supplement what’s already there. That may be unnecessary.

    With BigQuery’s Federated Queries, I can reach live into PostgreSQL, MySQL, Cloud Spanner, and even SAP Datasphere sources. Access data where it rests. Here, I’m using the EXTERNAL_QUERY function to retrieve data from a Cloud SQL database instance.

    I could use that syntax to perform joins, and do all sorts of things without ever moving data around.
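    For instance, a federated join might look like this sketch (the connection ID and table names are made up; EXTERNAL_QUERY itself is the real BigQuery function):

```sql
-- The inner statement runs against the Cloud SQL instance identified by
-- the connection; only the result rows come back to BigQuery.
SELECT o.order_id, o.total, c.region
FROM sales.orders AS o
JOIN EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',
  'SELECT customer_id, region FROM customers;'
) AS c
  ON o.customer_id = c.customer_id;
```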

    Perform complex SQL analytics against log data? Does your architecture have data copying jobs for operational data? Maybe to get it into a system where you can perform SQL queries against logs? There’s a better way.

    Google Cloud Log Analytics lets you query, view, and analyze log data without moving it anywhere.
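    As a rough sketch (the project name is hypothetical; `_AllLogs` is the Log Analytics view over the default log bucket), a query might look like:

```sql
-- Count the last day's log entries by severity, directly over the log view.
SELECT severity, COUNT(*) AS entries
FROM `my-project.region-us._Default._AllLogs`
WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY severity
ORDER BY entries DESC;
```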

    You can’t avoid moving data around. It’s often required. But I’m fairly sure that through smart product selection and some redesign of the architecture, you could eliminate a lot of unnecessary traffic.

    2. Compress the stack by removing duplicative components.

    Break out the chainsaw. Do you have multiple products for each software category? Or too many fine-grained categories full of best-of-breed technology? It’s time to trim.

    My former colleague Josh McKenty used to say something along the lines of “if it’s emerging, buy a few; if it’s mature, no more than two.”

    You don’t need a dozen project management software products. Or more than two relational database platforms. In many cases, you can use multi-purpose services and embrace “good enough.”

    There should be a fifteen-day cooling off period before you buy a specialized vector database. Just use PostgreSQL. Or any number of existing databases that now support vector capabilities. Maybe you can even skip RAG-based solutions (and infrastructure) altogether for certain use cases and just use Gemini with its long context window.

    Do you have a half-dozen different event buses and stream processors? Maybe you don’t need all that? Composite services like Google Cloud Pub/Sub can be a publish/subscribe message broker, apply a log-like approach with a replay-able stream, and do push-based notifications.

    You could use Spanner Graph instead of a dedicated graph database, or Artifact Registry as a single place for OS and application packages.

    I’m keen on the new continuous queries for BigQuery where you can do stream analytics and processing as data comes into the warehouse. Enrich data, call AI models, and more. Instead of a separate service or component, it’s just part of the BigQuery engine. Turn off some stuff?

    I suspect that this one is among the hardest for folks to act upon. We often hold onto technology because it’s familiar, or even because of misplaced loyalty. But be bold. Simplify your stack by getting rid of technology that’s no longer differentiated. Make a goal of having 30% fewer software products or platforms in your architecture in 2025.

    3. Replace hyper-customized software and automation with managed services and vanilla infrastructure.

    Hear me out. You’re not that unique. There are a handful of things that your company does which are the “secret sauce” for your success, and the rest is the same as everyone else.

    More often than not, you should be fitting your team to the software, not your software to the team. I’ve personally configured and extended packaged software to a point that it was unrecognizable. For what? Because we thought our customer service intake process was SO MUCH different than anyone else’s? It wasn’t. So much tech debt happens because we want to shape technology to our existing requirements, or we want to avoid “lock-in” by committing to a vendor’s way of doing things. I think both are misguided.

    I read a lot of annual reports from public companies. I’ve never seen “we slayed at Kubernetes this year” called out. Nobody cares. A cleverly scripted, hyper-customized setup that looks like the CNCF landscape diagram is more boat anchor than accelerator. Consider switching to a fully automated, managed cluster like GKE Autopilot. Pay per pod, and get automatic upgrades, secure-by-default configurations, and a host of GKE Enterprise features to create sameness across clusters.

    Or thank-and-retire that customized or legacy workflow engine (code framework, or software product) that only four people actually understand. Use a nicely API-enabled managed product with useful control-flow actions, or a full-fledged cloud-hosted integration engine.

    You probably don’t need a customized database, caching solution, or even CI/CD stack. These are all super mature solution spaces, where whatever is provided out of the box is likely suitable for what you really need.

    4. Tone it down on the microservices and distributed systems.

    Look, I get excited about technology and want to use all the latest things. But it’s often overkill, especially in the early (or late) stages of a product.

    You simply don’t need a couple dozen serverless functions to serve a static web app. Simmer down. Or a big complex JavaScript framework when your site has a pair of pages. So much technical debt comes from over-engineering systems to use the latest patterns and technology, when the classic ones will do.

    Smash most of your serverless functions back into an “app” hosted in Cloud Run. Fewer moving parts, and all the agility you want. Use vanilla JavaScript where you can. Use small, geo-located databases until you MUST do cross-region or global replication. Don’t build “developer platforms” and IDPs until you actually need them.
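    To make that concrete, here’s a minimal sketch (Python standard library only; the route names and handler logic are hypothetical) of several former single-purpose functions collapsed into one HTTP app that Cloud Run could host:

```python
# Several formerly separate "serverless functions" become routes in one app.
import json
import os
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def resize(params):
    """Formerly a standalone 'resize image' function."""
    return {"op": "resize", "width": int(params.get("w", "100"))}


def receipt(params):
    """Formerly a standalone 'send receipt' function."""
    return {"op": "receipt", "order": params.get("order", "")}


# One dispatch table instead of N separately deployed functions.
ROUTES = {"/resize": resize, "/receipt": receipt}


def app(environ, start_response):
    """A plain WSGI app: Cloud Run will run any HTTP server in a container."""
    params = {k: v[0] for k, v in parse_qs(environ.get("QUERY_STRING", "")).items()}
    handler = ROUTES.get(environ.get("PATH_INFO", "/"))
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(handler(params)).encode()]


if __name__ == "__main__":
    # Cloud Run injects the PORT environment variable; default to 8080 locally.
    make_server("", int(os.environ.get("PORT", "8080")), app).serve_forever()
```

    Each former function is now just a route: one build, one deploy, one thing to monitor.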

    I’m not going all DHH on you, but most folks would be better off defaulting to more monolithic systems running on a server or two. We’ve all over-distributed too many services and created unnecessarily complex architectures that are now brittle or impossible to understand. If you need the scale and resilience of distributed systems RIGHT NOW then go build one. But most of us have gotten burned from premature optimization because we assumed that our system had to handle 100x user growth overnight.

    Wrap Up

    Every company has tech debt, whether the business is 100 years old or started last week. Google has it, big banks have it, governments have it, and YC companies have it. And “managing it” is probably a responsible thing to do. But sometimes, when you need to make a step-function improvement in how you work, incremental changes aren’t good enough. Simplify by removing the cruft, and take big cuts out of your architecture to do it!