- Bill Inmon
- DM Review Magazine, July 1998
Recently there has been controversy over the validity and makeup of the architectural structure known as the operational data store (ODS). Some skeptics question the existence of the ODS. This argument is quite strange because the ODS is one of the most pervasive architectural structures found in information systems today. The notion that the ODS is not a legitimate structure is news to SAP, Oracle Financials and PeopleSoft–three of the most widely implemented pieces of software in the 1990s which happen to contain major components which are decidedly ODSs. While some aspects of these software packages are beyond the bounds of an ODS, many of the features of these software packages squarely fit the paradigm of an ODS.As further evidence of the health of the ODS, in a recent private conference, seven information systems directors of large, well-known companies spent time describing their environment. The ODS was a prominent feature of each of these companies’ information systems architecture. So it is peculiar that industry experts are questioning the validity of the ODS. Perhaps these experts simply do not understand what an ODS is and what functions it performs.
The Architectural Positioning
In order to have a discussion about ODSs, the conversation best begins with a schematic that shows how an ODS is architecturally positioned. Figure 1 shows the classical positioning of the ODS.
In Figure 1 the ODS is seen to be an architectural structure that is fed by integration and transformation (i/t) programs. These i/t programs can be the same programs as the ones that feed the data warehouse or they can be separate programs. The ODS, in turn, feeds data to the data warehouse.
Some operational data traverses directly into the data warehouse through the i/t layer while other operational data passes from the operational foundation into the i/t layer, then into the ODS and on into the data warehouse.
An ODS is an integrated, subject- oriented, volatile (including update), current-valued structure designed to serve operational users as they do high performance integrated processing. (Note: For a comprehensive discussion of the subject of operational data stores, refer to the book, Building the Operational Data Store, by W. H. Inmon, Claudia Imhoff and Greg Battas, published by John Wiley & Sons. This article will not try to restate concepts and descriptions that have been in the public domain for quite a while.)
The essence of an ODS is the enablement of integrated, collective on-line processing. An ODS delivers consistent high transaction performance–two to three seconds. An ODS supports on-line update. An ODS is integrated across many applications. An ODS provides a foundation for collective, up-to- the-second views of the enterprise. And, at the same time, the ODS supports decision support processing.
Because of the many roles that an ODS fulfills, it is a complex structure. Its underlying technology is complex. Its design is complex. Monitoring and maintaining the ODS is complex.
The ODS takes a long time to implement (e.g., SAP). The ODS requires changing or replacing old legacy systems that are unintegrated.
The Dual Role of the ODS
There is a very dual role played by the ODS. On the one hand, the ODS is decidedly operational. The ODS provides high response time and high availability and is certainly qualified to act as the basis of mission-critical systems. On the other hand, the ODS has some very clear DSS characteristics. The ODS is integrated, subject oriented and supports some important kinds of decision support.
The Users–Farmers and Explorers
This article will focus on one of the more misunderstood aspects of the ODS–the foundation of the design. In order to understand the foundation of the design of the ODS, you first need to understand that two very different types of users are attracted to the ODS–farmers and explorers.
The first user of the ODS is a user who can be called a “farmer.” Farmers are those people who do the same task repetitively. Farmers know what they want when they set out to search for something. Farmers look at small amounts of data with each transaction. Farmers almost always find what they want. Farmers usually find small flakes of gold, not huge nuggets, at the completion of their transaction. Farmers operate in a world of structure–structured data, structured processing, structured procedures and so forth.
The other type of user that is served by the ODS is the quot;explorer.” The explorer is the antithesis of the farmer. The explorer operates in a random manner. The explorer does not know what he/she is looking for at the outset of the analysis. Explorers operate in a heuristic mode. Explorers look at very large sets of data. Explorers look for associations between types of data, patterns that are useful and relationships that have heretofore never been discovered. The explorer often finds nothing as a result of an analysis, but occasionally the explorer finds huge nuggets of gold. Explorers operate in a pattern that defies prediction. The explorer operates in an almost completely unstructured manner.
The ODS and Explorers and Farmers
The ODS must satisfy the needs of both the farmer and the explorer; and because of this paradox, the design of the ODS is a difficult task in the best of circumstances.
The Basis of Design in DSS
The classical design of the structures found in the DSS environment begins with a data model, which reflects the informational needs of the corporation. Figure 2 shows the steps leading to a DSS design.
Normalized tables are generated from the data model. These tables constitute what can be described as a logical design. The many normalized tables are combined into a form of physical design that can be described as lightly normalized design. In a lightly normalized design, tables are combined on the basis of containing common keys and general common usage.
The design technique of creating normalized/lightly normalized structures based on a data model that has been described here fits many instances of DSS design. But there is a fly in the ointment of this approach. When the issues of performance where many tables must be joined, performance where there are many occurrences of data that will populate the design, and simplicity where users find it unnatural to join many tables together to represent data in a form comprehensible to the end user each time the end user does a transaction are considered, the design technique of light normalization yields marginal results.
An alternate design approach is to take into consideration the volume and usage of the data. When the volume and usage of the data are factored into the design, a mutant form of normalization is achieved. The light normalization turns into heavy normalization, and a structure known as the “star join” is created. (See Figure 3.)
There are two essential parts to a star join– fact tables and dimension tables. (Note: For an in-depth discussion of the subject of multidimensional design, refer to Ralph Kimball’s book, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses, published by John Wiley & Sons. This book is the definitive source for the subject of multidimensional database design.) The fact table represents the structure that holds the majority of the occurrences of the data. Fact tables typically combine data and cross reference keys from a variety of other tables.
The other type of table that participates in a star join is the dimension table. Dimension tables contain data which is not terribly voluminous. Dimension tables are related to fact tables by means of a foreign key relationship.
Fact tables are efficient to access because data has been prejoined into the table at the moment of loading. The end user is able to access fact tables efficiently because the fact tables are extremely streamlined in their design. In addition, the fact table is familiar to the end user, in terms of the day-to- day structuring of data that the end user is accustomed to seeing.
By building star joins, the designer has created a structure for efficient access, large volumes of data and natural end-user viewing. However, there is a problem with star joins. In order to know how to create the star join, the designer must make assumptions about the usage of the data. Stated differently, without knowing the predominant pattern of access and usage of the data, you cannot create a star join. At the heart of the design of any star join is the implicit understanding of how the data in the star join is to be used. Unfortunately, one department will look at data very differently from another department. The star join for finance will be very different than the star join for production, for example.
There is a second problem with star join structures, and that problem is that on-line update plays havoc with the underlying data management required to make the star join complete. In a DSS world where there is no update, this is not a problem. But in an ODS world where on-line update is a normal event, the inability of the star join to gracefully handle updates presents a special challenge.
Thus, the ODS designer has a dilemma. On the one hand, the designer wishes to have efficiency of access and the ability to handle large amounts of data. On the other hand, the ODS designer must design the system to be able to accommodate a wide variety of users. The following table illustrates the dilemma of the ODS database designer:
The designer in the ODS environment faces Hobson’s choice. Neither design approach–normalized or star join–is optimal for the ODS. Both approaches have their strengths and weaknesses.
The way the sophisticated designer goes about solving this apparent contradiction is to go back to the users of the system. For those parts of the system used primarily by explorers, a normalized design is optimal. Explorers do not know how they are going to use the system, so normalization suits them just fine. For those parts of the system used primarily by farmers, a star join approach is optimal. Since farmers have a predictable and repetitive usage pattern, a star join can be created to allow them optimal access. Figure 4 shows this dual design approach for the ODS.
The next factor that must be accounted for is the issue of update or pure DSS processing. Some farmers do no update. They are the “pure” DSS processors. Other farmers do update as a regular part of their ODS processing.
Explorers, however, seldom do on-line update. If explorers do update at all, it is by creating sweeping batch programs that march across entire tables and make massive changes. But explorers are not known for making changes, certainly not on-line updates. Figure 5 shows that the proper basis of design for an ODS is entirely dependent on who is using the ODS and what kind of work they are doing.
If the ODS is used only by farmers doing DSS processing, then an exclusive star join approach is in order for the entire ODS. But if update processing is being done by farmers or if there is usage of the ODS by explorers to any extent, then one or the other form of normalization is in order. If the ODS is used only by explorers, then a normalized approach is in order for the entire ODS.
This article has addressed the architectural structure of an ODS and how it is architecturally positioned. The ODS has a dual design objective, which is quite different from other database structures found in the world of DSS and operational systems.
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 35 years of database technology management experience and data warehouse design expertise. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for many major computing associations. For more information, visit www.inmongif.com and www.inmoncif.com. Inmon may be reached at (303) 681-6772.
For more information on related topics, visit the following channels: