Google Cloud Pub/Sub has three unique features that help it deliver value for app integrators

This week I’m once again speaking at INTEGRATE, a terrific Microsoft-oriented conference focused on app integration. This may be the last year I’m invited, so before I did my talk, I wanted to ensure I was up-to-speed on Google Cloud’s integration-related services. One that I was aware of, but not super familiar with, was Google Cloud Pub/Sub. My first impression was that this was a standard messaging service like Amazon SQS or Azure Service Bus. Indeed, it is a messaging service, but it does more.

Here are three unique aspects to Pub/Sub that might give you a better way to solve integration problems.

#1 – Global control plane with multi-region topics

When creating a Pub/Sub topic—an object that accepts a feed of messages—you’re only asked one major question: what’s the ID?

In the other major cloud messaging services, you select a geographic region along with the topic name. While you can, of course, interact with those messaging instances from any other region via the API, you’ll pay the latency cost. Not so with Pub/Sub. From the documentation (bolding mine):
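This no-region behavior shows up in the CLI as well. Here's a sketch with placeholder names (the topic IDs and region list are mine, not from the walkthrough): you supply an ID, and optionally restrict where message data may be stored, but you never pick a home region.

```shell
# Create a topic -- note there is no required region flag.
gcloud pubsub topics create my-topic

# Optionally restrict where message data can be stored (placeholder regions).
gcloud pubsub topics create my-restricted-topic \
  --message-storage-policy-allowed-regions=us-east1,europe-west1
```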

Cloud Pub/Sub offers global data access in that publisher and subscriber clients are not aware of the location of the servers to which they connect or how those services route the data.

Pub/Sub’s load balancing mechanisms direct publisher traffic to the nearest GCP data center where data storage is allowed, as defined in the Resource Location Restriction section of the IAM & admin console. This means that publishers in multiple regions may publish messages to a single topic with low latency. Any individual message is stored in a single region. However, a topic may have messages stored in many regions. When a subscriber client requests messages published to this topic, it connects to the nearest server which aggregates data from all messages published to the topic for delivery to the client.

That’s a fascinating architecture, and it means you get terrific publisher performance from anywhere.

#2 – Supports both pull and push subscriptions for a topic

Messaging queues typically store data until it’s retrieved by the subscriber. That’s ideal for transferring messages, or work, between systems. Let’s see an example with Pub/Sub.

I first used the Google Cloud Console to create a pull-based subscription for the topic. You can see a variety of other settings (with sensible defaults) around acknowledgement deadlines, message retention, and more.
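The same pull subscription can be sketched with the gcloud CLI instead of the Console (the names and values below are placeholders, not the ones I used):

```shell
# Create a pull subscription with an explicit ack deadline and
# message retention window (placeholder values).
gcloud pubsub subscriptions create my-sub \
  --topic=my-topic \
  --ack-deadline=10 \
  --message-retention-duration=7d
```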

I then created a pair of .NET Core applications. One pushes messages to the topic, another pulls messages from the corresponding subscription. Google created NuGet packages for each of the major Cloud services—which is better than one mega-package that talks to all services—that you can see listed here on GitHub. Here’s the Pub/Sub package that I added to both projects.

The code for the (authenticated) publisher app is straightforward, and the API was easy to understand.

using Google.Cloud.PubSub.V1;
using Google.Protobuf;
using System.Collections.Generic;

...

 public string PublishMessages()
 {
    PublisherServiceApiClient publisher = PublisherServiceApiClient.Create();

    //create messages
    PubsubMessage message1 = new PubsubMessage { Data = ByteString.CopyFromUtf8("Julie") };
    PubsubMessage message2 = new PubsubMessage { Data = ByteString.CopyFromUtf8("Hazel") };
    PubsubMessage message3 = new PubsubMessage { Data = ByteString.CopyFromUtf8("Frank") };

    //load into a collection
    IEnumerable<PubsubMessage> messages = new PubsubMessage[] {
       message1,
       message2,
       message3
    };

    //publish messages to the topic (full resource path)
    PublishResponse response = publisher.Publish("projects/seroter-anthos/topics/seroter-topic", messages);

    return "success";
 }

After I ran the publisher app, I switched to the web-based Console and saw that the subscription had three unacknowledged messages. So, it worked.

The subscribing app? Equally straightforward. Here, I asked for up to ten messages associated with the subscription, and once I processed them, I sent “acknowledgements” back to Pub/Sub. This removes the messages from the queue so that I don’t see them again.

using Google.Cloud.PubSub.V1;
using Google.Protobuf;
using System.Collections.Generic;
using System.Linq;

...

  public IEnumerable<String> ReadMessages()
  {
     List<string> names = new List<string>();

     SubscriberServiceApiClient subscriber = SubscriberServiceApiClient.Create();
     SubscriptionName subscriptionName = new SubscriptionName("seroter-anthos", "sub1");

     //synchronously pull up to 10 messages from the subscription
     PullResponse response = subscriber.Pull(subscriptionName, true, 10);
     foreach(ReceivedMessage msg in response.ReceivedMessages) {
         names.Add(msg.Message.Data.ToStringUtf8());
     }

     if(response.ReceivedMessages.Count > 0) {
         //ack the messages so we don't receive them again
         subscriber.Acknowledge(subscriptionName, response.ReceivedMessages.Select(m => m.AckId));
     }

     return names;
  }

When I start up the subscriber app, it reads the three available messages in the queue. If I pull from the queue again, I get no results (as expected).

As an aside, the Google Cloud Console is really outstanding for interacting with managed services. I built .NET Core apps to test out Pub/Sub, but I could have done everything within the Console itself. I can publish messages:

And then retrieve those messages, with an option to acknowledge them as well:

Great stuff.

But back to the point of this section, I can use Pub/Sub to create pull subscriptions and push subscriptions. We’ve been conditioned by cloud vendors to expect distinct services for each variation in functionality. One example is with messaging services, where you see unique services for queuing, event streaming, notifications, and more. Here with Pub/Sub, I’m getting a notification service and queuing service together. A “push” subscription doesn’t wait for the subscriber to request work; it pushes the message to the designated (optionally, authenticated) endpoint. You might provide the URL of an application webhook, an API, a function, or whatever should respond immediately to your message.
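As a sketch of what a push subscription looks like outside the Console (the endpoint URL and service account below are placeholders of my own), it just adds a push endpoint, and optionally a service account for authenticated delivery:

```shell
# Push subscription: Pub/Sub POSTs each message to the endpoint
# instead of waiting for a pull (placeholder endpoint and account).
gcloud pubsub subscriptions create my-push-sub \
  --topic=my-topic \
  --push-endpoint=https://example.com/pubsub/handler \
  --push-auth-service-account=pubsub-invoker@my-project.iam.gserviceaccount.com
```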

I like this capability, and it simplifies your architecture.

#3 – Supports message replay within an existing subscription and for new subscriptions

One of the things I’ve found most attractive about event processing engines is their durability and replayability. Unlike a traditional message queue, an event processor is based on a durable log where you can rewind and pull data from any point. That’s cool. Your event streaming engine isn’t a database or system of record, but a useful snapshot in time of an event stream. What if you could get the queuing semantics you want, with the durability you like from event streaming? Pub/Sub does that.

This again harkens back to point #2, where Pub/Sub absorbs functionality from other specialized services. Let me show you what I found.

When creating a subscription (or editing an existing one), you have the option to retain acknowledged messages. This keeps these messages around for whatever the duration is for the subscription (up to seven days).
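In CLI terms, that Console option maps to a pair of flags (the subscription name here is a placeholder):

```shell
# Keep acknowledged messages around for the retention window
# so they remain available for replay (placeholder name).
gcloud pubsub subscriptions update my-sub \
  --retain-acked-messages \
  --message-retention-duration=7d
```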

To try this out, I sent in four messages to the topic, with bodies of “test1”, “test2”, “test3”, and “test4.” I then viewed, and acknowledged, all of them.

If I do another “pull” there are no more messages. This is standard queuing behavior. An empty queue is a happy queue. But what if something went wrong downstream? You’d typically have to go back upstream and resubmit the message. Because I saved acknowledged messages, I can use the “seek” functionality to replay!

Hey now. That’s pretty wicked. When I pull from the subscription again, any message after the date/time specified shows up again.
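The Console’s “replay” corresponds to Pub/Sub’s seek operation. A sketch with placeholder values: seeking a subscription to a timestamp makes every retained message published after that instant deliverable again.

```shell
# Rewind the subscription to a point in time (placeholder values);
# retained messages after this instant become deliverable again.
gcloud pubsub subscriptions seek my-sub \
  --time=2020-06-08T00:00:00Z
```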

And I get unlimited bites at this apple. I can choose to replay again, and again, and again. You can imagine all sorts of scenarios where this sort of protection can come in handy.

Ok, but what about new subscribers? What about a system that comes online and wants a batch of messages that went through the system yesterday? This is where snapshots are powerful. They store the state of any unacknowledged messages in the subscription, and any new messages published after the snapshot was taken. To demonstrate this, I sent in three more messages, with bodies of “test5”, “test6”, and “test7.” Then I took a snapshot on the subscription.
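Taking that snapshot from the CLI is a single command (names below are placeholders):

```shell
# Capture the subscription's current state: all unacked messages,
# plus anything published afterward (placeholder names).
gcloud pubsub snapshots create my-snapshot \
  --subscription=my-sub
```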

I read all the new messages from the subscription, and acknowledged them. Within this Pub/Sub subscription, I chose to “replay”, load the snapshot, and saw these messages again. That could be useful if I took a snapshot pre-deploy of code changes, something went wrong, and I wanted to process everything from that snapshot. But what if I want access to this past data from another subscription?

I created a new subscription called “sub3.” This might represent a new system that just came online, or even a tap that wants to analyze the last four days of data. Initially, this subscription has no associated messages. That makes sense; it only sees messages that arrived after the subscription was created. From this additional subscription, I chose to “replay” and selected the existing snapshot.

After that, I went to my subscription to view messages, and I saw the three messages from the other subscription’s snapshot.
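For reference, the same new-subscriber replay can be sketched from the CLI (names are placeholders): create the subscription, then seek it to the existing snapshot.

```shell
# A brand-new subscription starts empty...
gcloud pubsub subscriptions create sub3 --topic=my-topic

# ...until you seek it to a snapshot taken on another subscription.
gcloud pubsub subscriptions seek sub3 --snapshot=my-snapshot
```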

Wow, that’s powerful. New subscribers don’t always have to start from scratch thanks to this feature.

It might be worth it for you to take an extended look at Google Cloud Pub/Sub. It’s got many of the features you expect from a scaled cloud service, with a few extra features that may delight you.

Author: Richard Seroter

Richard Seroter is currently the Chief Evangelist at Google Cloud and leads the Developer Relations program. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Chief Evangelist at Google Cloud, Richard leads the team of developer advocates, developer engineers, outbound product managers, and technical writers who ensure that people find, use, and enjoy Google Cloud. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.