6 Things to Know About Microsoft StreamInsight

Microsoft StreamInsight is a new product included with SQL Server 2008 R2.  It is Microsoft’s first foray into the event stream processing and complex event processing market that already has its share of mature products and thought leaders.  I’ve spent a reasonable amount of time with the product over the past 8 months and thought I’d try and give you a quick look at the things you should know about it.

  1. Event processing is about continuous intelligence.  An event can be all sorts of things ranging from a customer’s change of address to a meter read on an electrical meter.  When you have an event driven architecture, you’re dealing with asynchronous communication of data as it happens to consumers who can choose how to act upon it.  The term “complex event processing” refers to gathering knowledge from multiple (simple) business events into smaller sets of summary events.  I can join data from multiple streams and detect event patterns that may have not been visible without the collective intelligence. Unlike traditional database driven applications where you constantly submit queries against a standing set of data, an event processing solution deploys a set of compiled queries that the event data passes through.  This is a paradigm shift for many, and can be tricky to get your head around, but it’s a compelling way to compliment an enterprise business intelligence strategy and improve the availability of information to those who need it.
  2. Queries are written using LINQ.  The StreamInsight team chose LINQ as their mechanism for authoring declarative queries.  As you would hope, you can write a fairly wide set of queries that filter content, join distinct streams, perform calculations and much more.  What if I wanted to have my customer call center send out a quick event whenever a particular product was named in a customer complaint?  My query can filter out all the other products that get mentioned and amplify events about the target product:
    var filterQuery =
          from e in callCenterInputStream
          where e.Product == "Seroterum" select e;

    One huge aspect of StreamInsight queries relates to aggregation.  Individual event calculation and filtering is cool, but what if we want to know what is happening over a period of time?  This is where windows come into play.  If I want to perform a count, average, or summation of events, I need to specify a particular time window that I’m interested in.  For instance, let’s say that I wanted to know the most popular pages on a website over the past fifteen minutes, and wanted to recalculate that total every minute.  So every minute, calculate the count of hits per page over the past fifteen minutes.  This is called a Hopping Window. 

    var activeSessions = from w in websiteInputStream
                                group w by w.PageName into pageGroup
                                from x in pageGroup.HoppingWindow(
                                select new PageSummarySummary
                                    PageName = pageGroup.Key,
                                   TotalRequests = x.Count()

    I’ll have more on this topic in a subsequent blog post but for now, know that there are additional windows available in StreamInsight and I HIGHLY recommend reading this great new paper on the topic from the StreamInsight team.

  3. Queries can be reused and chained.  A very nice aspect of an event processing solution is the ability to link together queries.  Consider a scenario where the first query takes thousands of events per second and filters out the noise and leaves me only with a subset of events that I care about.  I can use the output of that query in another query which performs additional calculations or aggregation against this more targeted event stream.  Or, consider a “pub/sub” scenario where I receive a stream of events from one source but have multiple output targets.  I can take the results from one stream and leverage it in many others.
  4. StreamInsight uses an adapter model for the input and output of data.  When you build up a StreamInsight solution, you end up creating or leveraging adapters.  The product doesn’t come with any production-level adapters yet, but fortunately there are a decent number of best-practice samples available.  In my upcoming book I show you how to build an MSMQ adapter which takes data from a queue and feeds it into the StreamInsight engine.  Adapters can be written in a generic, untyped fashion and therefore support easy reuse, or, they can be written to expect a particular event payload.  As you’d expect, it’s easier to write a specific adapter, but there are obviously long term benefits to building reusable, generic adapters.
  5. There are multiple hosting options.  If you choose, you can create an in-process StreamInsight server which hosts queries and uses adapters to connect to data publishers and consumers.  This is probably the easiest option to build, and you get the most control over the engine.  There is also an option to use a central StreamInsight server which installs as a Windows Service on a machine.  Whereas the first option leverages a “Server.Create()” operation, the latter option uses a “Server.Connect()” manner for working with the Engine.  I’m writing a follow up post shortly on how to leverage the remote server option, so stay tuned.  For now, just know that you have choices for hosting.
  6. Debugging in StreamInsight is good, but overall administration is immature.   The product ships with a fairly interesting debugging tool which also acts as the only graphical UI for doing rudimentary management of a server.  For instance, when you connect to a server (in process or hosted) you can see the “applications” and queries you’ve deployed.
    When a query is running, you can choose to record the activities, and then play back the stream.  This is great for seeing how your query was processed across the various LINQ operations (e.g. joins, counts). 
    Also baked into the Debugger are some nice root cause analysis capabilities and tracing of an event through the query steps.  You also get a fair amount of server-wide diagnostics about the engine and queries.  However, there are no other graphical tools for administering the server.  You’ll find yourself writing code or using PowerShell to perform other administrative tasks.  I expect this to be an area where you see a mix of community tools and product group samples fill the void until future releases produce a more robust administration interface.

That’s StreamInsight in a nutshell.  If you want to learn more, I’ve written a chapter about StreamInsight in my upcoming book, and also maintain a StreamInsight Resources page on the book’s website.

Author: Richard Seroter

Richard Seroter is Director of Developer Relations and Outbound Product Management at Google Cloud. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Director of Developer Relations and Outbound Product Management, Richard leads an organization of Google Cloud developer advocates, engineers, platform builders, and outbound product managers that help customers find success in their cloud journey. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.

7 thoughts

  1. Nice quick article. I’m really interested to see how this product pans out. I think it has some fantastic opportunities in the BI/DW world. I would have to imagine in the not-to-distant future the days of batch processing erp transactions through ssis into a DW will be gone.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.