My company has an increasingly mature Operational Data Store (ODS) strategy where agreed-upon enterprise entities are added to shared repositories and accessible across the organization. These repositories are typically populated via batch loads from the source system. The ODS is used to keep systems which are dependent on the source system from bombarding the source with bulk data load processing requests. Independent of our ODS landscape, we have BizTalk Server doing fan-outs of real-time data from source systems to subscribers who can handle real-time data events. My smart architect buddy Ian is proposing a change to our model, and I thought I’d see what you all think.
So this is a summary of what our landscape looks like today:
Notice that in this case, our ESB (BizTalk Server) is populated independently of our ODS. What Ian wants us to do is populate all of our ODSs from our ESB (thus making it just another real-time subscriber), and then throw a “get” interface on the ODS for request/reply operations which would have previously gone against the source system (see below). Below notice that the second subscriber of data receives real-time feeds from BizTalk, but also can query the ODS for data as well.
I guess I’ve typically thought of an ODS as being for batch interactions only, and not something that should be queried via request/response operations. If I need real-time data, I’d always go to the source itself. However, there are lots of benefits to this proposed model:
- Decouple clients from source system
- Remove any downstream impact on source system maintenance schedules
- Don’t require additional load, interfaces or adapters on source system (although most enterprise systems should be able to handle the incremental load of request/response queries).
- All the returned entities are already in a flattened, enterprise format as opposed to how they are represented in the source system
Now, before declaring that we never go to source systems and always hit an ODS (which would be insane), here are some considerations we’ve thought about:
- This should only be for shared enterprise entities that are distributed around the organization.
- There is only a “get by ID” operation on the ODS vs. any sort of “update” or “delete” operations. Clearly having any sort of “change” operations would be nuts and cause a data consistency nightmare.
- The availability/reliability of the platform hosting the ODS must meet or exceed that of the source system.
- We must be assured that the source system can publish a real-time event/data message. No “quasi-real-time” where updates are pushed out every hour.
- This model should not be used for entities where we need to be 110% sure that the data is current (e.g. financial data, extremely volatile data).
Now, there still may be a reliance by clients on the source system if the shared entity doesn’t contain every property required by the client. And in a truly event-driven model, maybe the non-ODS subscribers should only get event notifications and be expected to ping the ODS (which receives the full data message) if they want more data than what exists in the event message. But other than that, are there other things I’ve missed or considerations I should weigh more heavily one way or the other? Share with me.
Technorati Tags: BizTalk, SOA, architecture
We are implementing a similar approach righ now. We have publishers (making information available) and subscribers (getting information). The publishers get the data (mediation) from the systems, apply cleansing rules and store the data in the ODS. Once the data is stored, an event (simple xml message) is placed on the bus (message box). The subscribers wait for one or more events that they’re interested in and then will use ‘services’ (see below for our definition of SOA) to get the data and feed their systems.
In our mind SOA is not web services, so for us we have some web services (SOAP and/or REST) were it makes sense. But in many cases we have SQL services. They are just views and/or sp abstracting access to the ODS and providing a contract (flat view) that is governed and supported. It seems to be working so far but we do not have real time needs in the pure sense. But we support intra-day transactions without problems. There might be a where we’ll need to pass data on the bus and not only events, but for now we don’t.
The implementation is still new (couple of months) and we are still learing stuff (we’re using OWL to model our data).