Architect's examination of form and function: the dimensional model



A detailed assessment and evaluation of data warehouse system functionality and how it applies to the dimensional data model using tools that the architect works with.

White Paper

An Architect's Evaluation of Form and Function - the Dimensional Data Model

Donavon Gooldy, Senior Principal
Tuesday, May 27, 2014

Proprietary and Confidential - ©2014 Clarity Solution Group, Inc.

Table of contents

1 Introduction
2 Model Characteristics
3 Dimensional Model Architectural Origins
3.1 The Entity Relationship Model Form
3.2 An Organized Performance Architecture Response
4 The Dimension Model Form
5 The Dimensional Model Function
6 The Limits of Single Form Design
6.1 Function Limiting Characteristics of the Dimensional Form
6.1.1 The Dimensional Form Does Not Extend Well
6.1.2 The Dimensional Form Is Not Flexible
6.1.3 The Form Does Not Describe the Business
7 Applying the Dimensional Form without Requirements
7.1 Client A
7.2 Client B
7.3 Common Characteristics
7.4 Bottom-Up Warehouse Design
8 System Architecture Form to Fulfill Multiple Functions
8.1 Combining Model Forms
8.2 Integrating Model Form with Technology Form
9 Conclusion

1 Introduction

"It is the pervading law of all things organic and inorganic, of all things physical and metaphysical, of all things human and all things superhuman, of all true manifestations of the head, of the heart, of the soul, that the life is recognizable in its expression, that form ever follows function. This is the law."
Louis Sullivan

“Form follows function - that has been misunderstood. Form and function should be one, joined in a spiritual union.”
Frank Lloyd Wright

To be an architect of information solutions is to understand the concept of form following function intuitively, as a matter of nature, because design (creation of form) is about enabling informational function. Taking the title "architect" affirms one's conscious, method-based design decision process for aligning form with functional needs.
As one examines form's relationship to function within the dimensional model, the evaluation of the model form must be based not solely on Sullivan's statement but on Wright's: form not only follows function, but function follows form. The concept of form and function unity highlights that form is not only based on function, but also limits it, many times strictly. Form and function are bound together in a cause and effect relationship; function is the cause of the form, while form both facilitates function and limits it.

When considering the data warehouse function, one considers the overall goal to deliver information, allowing the business to measure its activity and understand the impacts of its actions in the marketplace. This high-level statement of function, though, is far too general for the evaluation of model form. As will be demonstrated, a more detailed understanding of system functionality is needed before determining model form application.

The function-limiting impact of form is often overlooked in design, particularly data model design. By implementing a specific design form, are the broader limits on function considered? What system design steps are needed to mitigate those limitations? Too often data practitioners apply the form they know best, the latest form they've come to appreciate or a form that is deemed a "best practice" in their circles.

True architects are not practitioners of "best practices". They practice the application of forms to function based on principles derived from cause and effect analysis. The architect studies the relationship of form and function, of cause and effect, and then applies forms specific to the required functions. The architect deals with the complexity of the client's multi-functional needs and devises multi-component solution forms to deliver functionality that cannot be delivered in single form solutions.
2 Model Characteristics

One generally thinks about a model form in terms of certain characteristics. Through the evaluation of these characteristics and examination of model form, it becomes evident how they align with, support and limit function in relationship to data and information delivery.

o The model's ability to extend
    - to extend a data model for new content/capability without disruption and redesign of processes
o The model's ability to be flexible
    - to support multiple purposes or functions
o The model's ability to describe the business and subjects within the corporate structure
    - to document the business using data
o The model's ability to support any valid business question
    - to answer business questions without specific design structuring
    - not a matter of ease or performance but a matter of ability
o The model's ability to efficiently and quickly answer business questions (report query performance)
    - to provide acceptable query performance for corporate decision support and analysis
o The model's ability to demonstrate business performance
    - to measure business performance

The critical examination of the limiting aspects of the dimensional model gives the architect the foundational principles necessary to understand the application of dimensional form in Information Architecture solutions.

3 Dimensional Model Architectural Origins

The dimensional model form is designed to greatly simplify database optimization for queries that would otherwise be applied against an Entity Relationship (ER) model. Because the dimensional model is a design response used to overcome ER form limits, the ER form and its characteristics must first be examined as a basis for comparison.

3.1 The Entity Relationship Model Form

1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

E.F. Codd, "Further Normalization of the Data Base Relational Model"

Each of Codd's goals not only provides insight into ER model function, but is also instructive as to the reasons for the dimensional model form.

The Data Architect produces an ER model that describes the business through "Entities" representing each of the objects, actors, organizational fictions, contracts, business activities and others in the business landscape. If it can be named as a subject, it must be represented as an entity within the model. Each entity is given an identifier known as the primary key. Additional attributes are added to describe only the primary key. Foreign key relationships document each business relationship existing between entities. These relationships are instilled in the model logically rather than by direct data association. This distinction is fundamental to the examination of the ER and Dimensional Model form characteristics and their ability to deliver specific functionality.

This examination won't delve into the application of normalization rules, except to state that many modelers deal with normalization intuitively, as a matter of entity definition and evaluation of attributes when creating the ER model. Normalization rules represent a method of thinking regarding the evaluation of data content in model development. Normalization ensures all entities are defined purely and that all business relationships within the model are defined logically rather than by physical association.
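The ER form described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The entity and column names (customer, product, sales_order, order_line) are hypothetical and do not come from the paper; the point is only that each entity has a primary key, its attributes describe only that key, and every business relationship is carried logically by a foreign key.

```python
import sqlite3

# Hypothetical entities for illustration only; none of these names come from the paper.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES sales_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
"""


def build_er_model() -> sqlite3.Connection:
    """Create the normalized tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def foreign_keys(conn: sqlite3.Connection, table: str) -> list:
    """Return (column, referenced_table) pairs declared on a table."""
    # PRAGMA foreign_key_list rows are: id, seq, table, from, to, ...
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return sorted((row[3], row[2]) for row in rows)


conn = build_er_model()
print(foreign_keys(conn, "order_line"))
```

Nothing in this structure places a customer's attributes on the same row as a product's; the association exists only through the chain of keys, which is what the paper means by relationships defined logically rather than by physical association.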
As one examines Codd's goals it is obvious that they align with some of the model characteristics previously discussed. Those characteristics are:

o extensibility
o flexibility
o ability to describe the subject
o ability to support any valid business question

Codd's fourth goal may appear somewhat cryptic, but it is central to an architect's understanding of both model forms and to the support of Codd's preceding goals. In a fully normalized model there is no statistical data relationship bias that emphasizes one relationship or eliminates another, because relationships are implemented logically. Data that is not normalized associates data physically on the same row, creating a bias. When data is organized this way, certain questions can be answered, while others cannot. Applying rules of normalization ensures no bias exists for one type of business question or another.

One can ask any valid business question of a normalized model. Based on the model's logically implemented (foreign key) relationships, one will always get the answer. There is no need to know future questions. It will always work if each entity germane to the question is represented within the model and each relationship between the entities is documented logically. As long as one is willing to write the necessary queries and wait, the model will answer.

Therefore, the normalized entity relationship model form is designed for flexibility, to answer any business question. It eliminates relationship bias by describing each entity purely and documenting all business relationships logically, providing data relationship neutrality. Extensibility is another outcome of eliminating relational bias, as will be seen later.

The normalized form that gives us this functionality also limits function. To answer more than simple business questions, complex queries need to be written, with many joins that follow relational paths and correlated sub-queries that identify specific content within data sets.
The query may need to mix aggregations to common GROUP BY levels and use outer joins, complicating query optimization. Temp tables and multiple query steps may need to be used in some cases. In data warehousing, all of this complex query optimization results in issues of access and join serialization in relationship to lots of I/O from large data reading, buffering and sorting. No one wants to wait hours for BI report results. In the early days of data warehousing, on at least one RDBMS, the longer the query ran, the more likely it would end in error due to the database's concurrency architecture.

3.2 An Organized Performance Architecture Response

At the time of Ralph Kimball's first edition release of The Data Warehouse Toolkit, most data warehouses were hosted on SMP database servers. These types of servers do not scale parallel processing linearly as MPP clusters do, and often led to a variety of very limiting data forms that were intended to improve query performance. The introduction of the dimensional model provided an organized, systematic design basis for a performance architecture form leading to predictable query optimization.

It also addressed another issue at the time: a dimensional model is much simpler to write queries against. Hand coding queries against an ER model for any sort of complicated reporting requires a good deal of skill, experience and time. While users still need to write manual queries, Business Intelligence software has diminished that need by supporting metadata-driven abstraction that interprets the physical data model for the user. When dimensional models are designed properly for reporting, they require only selection of the attributes and measures required, direct joins to the dimensions needed, application of WHERE or JOIN filters, appropriate aggregate functions and GROUP BY clauses (and perhaps a HAVING clause).
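The query pattern just listed can be sketched against a tiny star schema, again using Python's sqlite3 module. The tables (fact_sales, dim_date, dim_product), their columns and the sample rows are assumptions made up for this illustration; the paper does not specify a schema.

```python
import sqlite3


def build_star(conn: sqlite3.Connection) -> None:
    """Create and load a tiny, hypothetical star schema (names are illustrative)."""
    conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        sales_amount REAL
    );
    """)
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?)",
        [(20140501, "2014-05"), (20140502, "2014-05"), (20140601, "2014-06")],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "Bikes"), (2, "Helmets")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(20140501, 1, 500.0), (20140502, 1, 300.0),
         (20140502, 2, 40.0), (20140601, 1, 900.0)],
    )


# Select attributes and measures, join directly to dimensions, filter, aggregate.
REPORT_SQL = """
SELECT p.category, SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_date AS d ON d.date_key = f.date_key
JOIN dim_product AS p ON p.product_key = f.product_key
WHERE d.calendar_month = ?
GROUP BY p.category
HAVING SUM(f.sales_amount) >= ?
ORDER BY p.category
"""


def monthly_sales_by_category(conn, month, min_total):
    """Total sales per category for one month, keeping totals >= min_total."""
    return conn.execute(REPORT_SQL, (month, min_total)).fetchall()


conn = sqlite3.connect(":memory:")
build_star(conn)
print(monthly_sales_by_category(conn, "2014-05", 100))
```

Every join in the report goes straight from the fact table to a dimension, so the query never has to walk an entity-to-entity path or use a correlated sub-query.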
4 The Dimension Model Form

Dimensional modeling achieves its performance advantage by designing denormalizations into data organizations specific to answering a limited range of business questions. These denormalizations take the form of placing data in physical relationships and eliminating the logical business-based relationships that follow an entity-to-entity-to-entity form, in favor of a more direct report-grouping reference relationship to business metrics. In other words, the dimensional model form creates explicit relationship biases to simplify queries, reduce I/O and eliminate query optimization complexity, which delivers answers to business questions efficiently and quickly.

The pattern of denormalization follows the form of a central table called a fact table containing one or more business measurements called facts. The facts may be sourced from a variety of transactional and reference sources, all of which may be used in combination to answer certain classes of business questions. The fact table row always has the context of a time period, either a date or a date and time together. The time period may be either a date or a higher-level time period, such as week, month, quarter or year. Facts may be transactional, a point-in-time snapshot state of metrics or a period-based aggregate.

The fact table also has foreign key relationship attributes relating the fact rows to reference tables called dimensions. Dimensions may represent a single entity, but typically contain attributes from, or derived from, multiple entities describing a subject. Typically there is at least one dimension associated with the fact table that has its basis in an entity with a natural business-based relationship to the business activity represented in the facts of the fact table. There are usually other dimension relationships that are one or two entities removed from the business activity documented in the fact table.
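As a rough illustration of a dimension built from multiple entities, the sketch below flattens a hypothetical product, category and department chain into one wide dimension row per product and pairs it with a small fact table. All names and values are invented for the example and are not taken from the paper.

```python
# Hypothetical normalized entities (product -> category -> department).
# All names and values are illustrative assumptions, not from the paper.
departments = {10: {"department_name": "Sporting Goods"}}
categories = {
    100: {"category_name": "Bikes", "department_id": 10},
    101: {"category_name": "Helmets", "department_id": 10},
}
products = {
    1: {"product_name": "Road Bike", "category_id": 100},
    2: {"product_name": "Kids Helmet", "category_id": 101},
}


def build_product_dimension(products, categories, departments):
    """Flatten the entity-to-entity-to-entity chain into one row per product."""
    rows = []
    for product_id, product in sorted(products.items()):
        category = categories[product["category_id"]]
        department = departments[category["department_id"]]
        rows.append({
            "product_key": product_id,
            "product_name": product["product_name"],
            "category_name": category["category_name"],
            "department_name": department["department_name"],
        })
    return rows


product_dimension = build_product_dimension(products, categories, departments)

# Fact rows carry measures plus keys to their dimensions, always with a time context.
fact_sales = [
    {"date_key": 20140527, "product_key": 1, "quantity": 2, "sales_amount": 1800.0},
    {"date_key": 20140527, "product_key": 2, "quantity": 1, "sales_amount": 35.0},
]

print(product_dimension[0])
```

After flattening, the category-to-department relationship survives only as values that happen to sit on the same row, which is the kind of physical association, and the resulting bias, that the section describes.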
There may also be additional dimensions related to the facts that must be derived by processing other business activity. Keep in mind that if a source does not actually document all of the data relationships, for example the customer's origination sales channels, then these relationships must be derived from processing business activity records, such as sales or service orders. One must also build into the process and structure of the star schema all of the complex processing that would be needed against the entity relationship model to bring data up to a common simplified form, fit for answering functionally similar business questions.

The philosophy of the dimensional model is to do all of the processing once to form a common basis for a class of business questions or analysis, storing the results of that process in the star schema so that BI queries avoid that complex process at report runtime. It is a 'process once, use it many times' approach. The end result should be a star schema capable of delivering measurements based on simple SELECT, JOIN, WHERE and GROUP BY statements.

5 The Dimensional Model Function

One concludes that the dimensional form is a performance architecture intended to improve report query performance. So far, however, a full understanding of why dimensional models perform so well and what limits them has yet to be exposed.

The star schema design is created to measure business. It is created with a business function orientation, as opposed to the subject area orientation of the ER model. The form is one of centralization of a series of measures (facts) surrounded by attributes that give business context to those measurements. While some consumers may refer to the content as subjects, the real orientation is focused on business reporting and analysis.
It may be Sales Analysis or Risk Analysis, but these are organized to support specific business functions and not to provide general data as a subject. Instead of presenting data as it exists in an ER model, or in the source, data is organized to make decisions.

Some of Webster's definitions of the word "Information" are:

1. "knowledge obtained from investigation, study, or instruction"
2. "INTELLIGENCE, NEWS"
3. "FACTS, DATA"

Architects do not design dimensional models that deliver measurements (facts) randomly as data. The purpose is to deliver organized information to the business clients that supports the client's business decision making function. To be "information," measures have to be organized and presented with functional context; without that, it is simply data. Providing data is what an ER model does. It delivers it without bias. It's up to the consumer to discern how to make it provide information. In a dimensional model, much of that work of organizing data as information is performed in advance of the report execution.

Therefore, a primary function for which the dimensional form is employed is that of a performance architecture built upon the direct structuring of information for a specific business function. It is important to make this distinction because there are other means of implementing performance architectures for delivering information that do not rely on data denormalizations in a database. This is also not to say that dimensional model content is the final state of the information organization. In systems that employ the dimensional form, it represents the foundational state of information that is further organized into reporting to deliver KPIs, comparisons, trends, graphics and other business oriented presentations of information [...]...
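The idea that the work of organizing data as information is performed in advance of report execution can be sketched with the paper's own example of a customer's origination sales channel derived from order records. The rule used here (the channel of the customer's earliest order) and all field names are assumptions for illustration; the paper does not define how the channel is derived.

```python
from datetime import date

# Hypothetical order records (business activity); field names are illustrative.
orders = [
    {"customer_id": 1, "order_date": date(2014, 1, 5), "channel": "Web"},
    {"customer_id": 1, "order_date": date(2014, 3, 9), "channel": "Store"},
    {"customer_id": 2, "order_date": date(2014, 2, 1), "channel": "Phone"},
    {"customer_id": 2, "order_date": date(2014, 1, 20), "channel": "Store"},
]


def derive_origination_channel(orders):
    """Assumed rule: origination channel = channel of the customer's earliest order."""
    first_order = {}
    for order in orders:
        current = first_order.get(order["customer_id"])
        if current is None or order["order_date"] < current["order_date"]:
            first_order[order["customer_id"]] = order
    return {cid: order["channel"] for cid, order in first_order.items()}


# Processed once at load time and stored on the (hypothetical) customer dimension,
# so that report queries read the attribute instead of re-deriving it.
customer_dimension = [
    {"customer_key": cid, "origination_channel": channel}
    for cid, channel in sorted(derive_origination_channel(orders).items())
]

print(customer_dimension)
```

A report that groups by origination_channel then needs only a direct join to this dimension, while the derivation logic runs once in the load process rather than in every query.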
...on of the relationship that model form and function have with one another, the same principle of form's support for function and form's limiting effect on function has broad applications in all architectural applications and disciplines, whether it is the evaluation of model form, technology form or even methodology. To be an architect is to be a student of form and function and apply form based on these...

...examination, but also a wide variety of technology based design forms as well.

6.1 Function Limiting Characteristics of the Dimensional Form

The dimensional model is a powerful performance architecture form for the delivery of information to businesses when properly applied. Like the ER form, the dimensional form has limitations in its recognized function.

6.1.1 The Dimensional Form Does Not Extend Well

Ability...

...eliminate the business rule based relationships between them, create a "fact" table, and associate all the entity based dimensions with one another by way of the fact, thus obscuring an understanding of the natural relationship of the entities to one another. By creating a data bias not informed by requirements, the modeler has no idea whether the bias will actually be useful to the function for the star. The...

...not just variations of old ones, will likely require further star schema development.

6.1.3 The Form Does Not Describe the Business

The fact that the dimensional form does not describe subject content is at the heart of the form's inflexibility. Yes, the form has attribution describing certain dimensions, but one cannot look at a dimensional model and derive an understanding of how the business works in...
...description, the denormalization of data and relationships eliminates the ability to understand the basis of the denormalizations and how they were applied in the first place. Only a reporting relationship can be determined rather than a business data relationship. There may be a model in the architect's head as to how the entities represented in the dimension are actually related, but there is nothing in the dimensional...

...delivering industry-based ER models. They use the power of the database engine to sidestep much of the need for delivering an information organized performance architecture that the dimensional model form represents. Instead, the implementations virtualize as much of the physical information organization as possible with database views and create lightly processed materializations of information organization...

...which defines the function. In both of these examples the model practitioners did not understand that the dimensional form only gains its performance architecture status when it is functionally aligned to deliver actual reporting and analysis capabilities (usable information).

The goal of most modelers who attempt to model dimensionally...

...based on these principles of form's effects. The architect's role is to recognize the clients' needs and apply form based on those needs. There is always a balancing of the application of form, usually constrained by the client's tolerances for cost. The architect has to ensure the client fully understands the impact of compromises in form's application. The architect that can engage the client in a fact-based...
...to happen and the dimension useless for reporting purposes.

The consequence of misapplying the dimensional form is the creation of a good deal of unnecessary dysfunction for the client later when they actually need to use the warehouse for business. These implementations occur whenever the modeler has to provision data in the warehouse...

...artificial dimensional form applied to the warehouse, guided not by reporting or analytics requirements, but by a modeler's imagination of how data could be "made dimensional" or how the client "might" use the data. Since it has been established that the dimensional form needs to be aligned for business function, the use of the term "artificial dimensional form" is warranted anytime dimensional model design

Date posted: 03/07/2014, 08:17
