Microsoft SQL Server 2008 Study Guide, Part 9 (pptx)


Nielsen c03.tex V4 - 07/21/2009 12:07pm Page 43 www.getcoolebook.com

Relational Database Design

IN THIS CHAPTER
■ Introducing entities, tuples, and attributes
■ Conceptual diagramming vs. SQL DDL
■ Avoiding normalization over-complexity
■ Choosing the right database design pattern
■ Ensuring data integrity
■ Exploring alternative patterns
■ Normal forms

I play jazz guitar; well, I used to play before life became so busy. (You can listen to some of my MP3s on my "about" page on www.sqlserverbible.com.) There are some musicians who can hear a song and then play it; I'm not one of those. I can feel the rhythm, but I have to work through the chords and figure them out almost mathematically before I can play anything but a simple piece. To me, building chords and chord progressions is like drawing geometric patterns on the guitar neck using the frets and strings.

Music theory encompasses the scales, chords, and progressions used to make music. Every melody, harmony, rhythm, and song draws from music theory. For some musicians there's just a feeling that the song sounds right. Those who make music their profession understand the theory behind why a song feels right. Great musicians have both the feel and the theory in their music.

Designing databases is similar to playing music. Databases are designed by combining the right patterns to correctly model a specific solution to a problem. Normalization is the theory that shapes the design. There's both the mathematical theory of relational algebra and the intuitive feel of an elegant database. Designing databases is both science and art.

Database Basics

The purpose of a database is to store the information required by an organization. Any means of collecting and organizing data is a database. Prior to the Information Age, information was primarily stored on cards, in file folders, or in ledger books.
Part I Laying the Foundation

Author's Note

Welcome to the second of five chapters that deal with database design. Although they're spread out in the table of contents, they weave a consistent theme that good design yields great performance:
■ Chapter 2, "Data Architecture," provides an overview of data architecture.
■ This chapter details relational database theory.
■ Chapter 20, "Creating the Physical Database Schema," discusses the DDL layer of database design and development.
■ Partitioning the physical layer is covered in Chapter 68, "Partitioning."
■ Designing data warehouses for business intelligence is covered in Chapter 70, "BI Design."

There's more to this chapter than the standard "Intro to Normalization." This chapter draws on the lessons I've learned over the years and has a few original ideas. This chapter covers a book's worth of material (which is why I rewrote it three times), but I tried to concisely summarize the main ideas.

The chapter opens with an introduction to database design terms and concepts. Then I present the same concept from three perspectives: first with the common patterns, then with my custom Layered Design concept, and lastly with the normal forms. I've tried to make the chapter flow, but each of these ideas is easier to comprehend after you understand the other two, so if you have the time, read the chapter twice to get the most out of it.

Before the adding machine, offices employed dozens of workers who spent all day adding columns of numbers and double-checking the math of others. The job title of those who had that exciting career was computer.

As the number crunching began to be handled by digital machines, human labor, rather than being eliminated, shifted to other tasks. Analysts, programmers, managers, and IT staff have replaced the human "computers" of days gone by.
Speaking of old computers, I collect abacuses, and I know how to use them too; it keeps me in touch with the roots of computing. On my office wall is a very cool nineteenth-century Russian abacus.

Benefits of a digital database

The Information Age and the relational database brought several measurable benefits to organizations:
■ Increased data consistency and better enforcement of business rules
■ Improved sharing of data, especially across distances
■ Improved ability to search for and retrieve information
■ Improved generation of comprehensive reports
■ Improved ability to analyze data trends

The general theme is that a computer database originally didn't save time in the entry of data, but rather in the retrieval of data and in the quality of the data retrieved. However, with automated data collection in manufacturing, bar codes in retailing, databases sharing more data, and consumers placing their own orders on the Internet, the effort required to enter the data has also decreased.

The previous chapter's sidebar titled "Planning Data Stores" discusses different types or styles of databases. This chapter presents the relational database design principles and patterns used to develop operational, or OLTP (online transaction processing), databases. Some of the relational principles and patterns may apply to other types of databases, but databases that are not used for first-generation data (such as most BI, reporting databases, data warehouses, or reference data stores) do not necessarily benefit from normalization. In this chapter, when I use the term "database," I'm referring exclusively to a relational, OLTP-style database.

Tables, rows, columns

A relational database collects related, or common, data in a single list. For example, all the product information may be listed in one table and all the customers in another table.
A table appears similar to a spreadsheet and is constructed of columns and rows. The appeal (and the curse) of the spreadsheet is its informal development style, which makes it easy to modify and add to as the design matures. In fact, managers tend to store critical information in spreadsheets, and many databases started as informal spreadsheets.

In both a spreadsheet and a database table, each row is an item in the list and each column is a specific piece of data concerning that item, so each cell should contain a single piece of data about a single item. Whereas a spreadsheet tends to be free-flowing and loose in its design, database tables should be very consistent in terms of the meaning of the data in a column. Because row and column consistency is so important to a database table, the design of the table is critical.

Over the years, different development styles have referred to these concepts with various terms, listed in Table 3-1.

TABLE 3-1 Comparing Database Terms

Development Style                 | The List of Common Items          | An Item in the List         | A Piece of Information in the List
Legacy software                   | File                              | Record                      | Field
Spreadsheet                       | Spreadsheet/worksheet/named range | Row                         | Column/cell
Relational algebra/logical design | Entity, or relation               | Tuple (rhymes with couple)  | Attribute
SQL DDL design                    | Table                             | Row                         | Column
Object-oriented design            | Class                             | Object instance             | Property

SQL Server developers generally refer to database elements as tables, rows, and columns when discussing the SQL DDL layer or physical schema, and sometimes use the terms entity, tuple, and attribute when discussing the logical design. The rest of this book uses the SQL terms (table, row, column), but this chapter is devoted to the theory behind the design, so I also use the relational algebra terms (entity, tuple, and attribute).
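As a rough illustration of how the vocabularies in Table 3-1 line up, the sketch below builds one small list and views it through both the SQL terms and the object-oriented terms. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the Product table, its columns, and its sample rows are hypothetical names invented for this example.

```python
import sqlite3
from dataclasses import dataclass

# Assumption: SQLite (via Python's stdlib) stands in for SQL Server here.
# The Product table and its columns are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")

# SQL DDL terms: a table (entity) made of columns (attributes).
conn.execute(
    "CREATE TABLE Product ("
    " ProductID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Price REAL NOT NULL)"
)

# Each inserted row is one item in the list (a tuple, in relational terms).
conn.executemany(
    "INSERT INTO Product (ProductID, Name, Price) VALUES (?, ?, ?)",
    [(1, "Kite", 19.95), (2, "Spool", 4.50)],
)


# Object-oriented terms: class = table, object instance = row, property = column.
@dataclass
class Product:
    product_id: int
    name: str
    price: float


# One object instance per row.
products = [
    Product(*row)
    for row in conn.execute(
        "SELECT ProductID, Name, Price FROM Product ORDER BY ProductID"
    )
]

# The column names, read back from SQLite's catalog (field 1 of table_info).
column_names = [col[1] for col in conn.execute("PRAGMA table_info(Product)")]
```

The same list of two products can be described as a table with two rows and three columns, or as a class with two object instances and three properties; only the vocabulary changes.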
Database design phases

Traditionally, data modeling has been split into two phases, the logical design and the physical design; but Louis Davidson and I have been co-presenting at conferences on the topic of database design and I've become convinced that Louis is right when he defines three phases to database design. To avoid confusion with the traditional terms, I'm defining them as follows:
■ Conceptual Model: The first phase digests the organizational requirements and identifies the entities, their attributes, and their relationships. The conceptual diagram model is great for understanding, communicating, and verifying the organization's requirements. The diagramming method should be easily understood by all the stakeholders: the subject-matter experts, the development team, and management. At this layer, the design is implementation independent: It could end up on Oracle, SQL Server, or even Access. Some designers refer to this as the "logical model."
■ SQL DDL Layer: This phase concentrates on performance without losing the fidelity of the logical model as it applies the design to a specific version of a database engine (SQL Server 2008, for example), generating the DDL for the actual tables, keys, and attributes. Typically, the SQL DDL layer generalizes some entities, and replaces some natural keys with surrogate computer-generated keys. The SQL DDL layer might look very different from the conceptual model.
■ Physical Layer: The implementation phase considers how the data will be physically stored on the disk subsystems using indexes, partitioning, and materialized views. Changes made to this layer won't affect how the data is accessed, only how it's stored on the disk. The physical layer ranges from simple, for small databases (under 20GB), to complex, with multiple filegroups, indexed views, and data routing partitions.
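Two of the ideas in the list above can be sketched in a few lines: the SQL DDL layer swapping a natural key for a surrogate key, and a physical-layer change (an index) leaving query results untouched. This is only a sketch under stated assumptions: SQLite through Python's sqlite3 module stands in for SQL Server (where a surrogate key would more typically be an IDENTITY column), and the Customer table, its columns, and the index name are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the Customer table, its
# columns, and the index name are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,   -- surrogate, engine-generated key
    Email      TEXT NOT NULL UNIQUE,  -- natural key, kept as a unique constraint
    FullName   TEXT NOT NULL
);
""")

# The surrogate key is never supplied by the caller; the engine assigns it.
conn.executemany(
    "INSERT INTO Customer (Email, FullName) VALUES (?, ?)",
    [("ann@example.com", "Ann Lee"), ("bo@example.com", "Bo Chan")],
)

query = "SELECT CustomerID, FullName FROM Customer WHERE FullName = ?"

# Result before any physical-layer tuning.
before = conn.execute(query, ("Ann Lee",)).fetchall()

# Physical-layer change: add an index. The query text does not change.
conn.execute("CREATE INDEX IX_Customer_FullName ON Customer (FullName)")

# Same query, same answer; only the storage and access path differ.
after = conn.execute(query, ("Ann Lee",)).fetchall()
```

Keeping the unique constraint on the natural key is a design choice in this sketch: the surrogate key simplifies joins, while the constraint still prevents the same real-world customer from being entered twice.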
This chapter focuses on designing the conceptual model, with a brief look at normalization followed by a repertoire of database patterns.

Implementing a database without working through the SQL DDL Layer design phase is a certain path to a poorly performing database. I've seen far too many database purists who didn't care to learn SQL Server implement conceptual designs only to blame SQL Server for the horrible performance.

The SQL DDL Layer is covered in Chapter 20, "Creating the Physical Database Schema." Tuning the physical layer is discussed in Chapters 64, "Indexing Strategies," and 68, "Partitioning."

Normalization

In 1970, Dr. Edgar F. Codd published "A Relational Model of Data for Large Shared Data Banks" and became the father of the relational database. During the 1970s Codd wrote a series of papers that defined the concept of database normalization. He wrote his famous "Codd's 12 Rules" in 1985 to define what constitutes a relational database and to defend the relational database from software vendors who were falsely claiming to be relational. Since that time, others have amended and refined the concept of normalization.

The primary purpose of normalization is to improve the data integrity of the database by reducing or eliminating modification anomalies that can occur when the same fact is stored in multiple locations within the database.

Duplicate data raises all sorts of interesting problems for inserts, updates, and deletes. For example, if the product name is stored in the order detail table, and the product name is edited, should every order detail row be updated? If so, is there a mechanism to ensure that the edit to the product name propagates down to every duplicate entry of the product name? If data is stored in multiple locations, is it safe to read just one of those locations without double-checking other locations?
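The product-name scenario above can be made concrete with a small sketch. It assumes SQLite via Python's sqlite3 module in place of SQL Server, and the table names (Product, OrderDetailDenorm, OrderDetail), columns, and sample data are all hypothetical. One order detail table copies the product name; the other stores only the product key.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; all table and column names
# below are hypothetical and exist only to illustrate an update anomaly.
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT NOT NULL
);

-- Denormalized: the product name is copied into every order detail row.
CREATE TABLE OrderDetailDenorm (
    OrderDetailID INTEGER PRIMARY KEY,
    ProductID     INTEGER NOT NULL REFERENCES Product (ProductID),
    ProductName   TEXT NOT NULL,
    Quantity      INTEGER NOT NULL
);

-- Normalized: the name is stored once, in Product, and reached by a join.
CREATE TABLE OrderDetail (
    OrderDetailID INTEGER PRIMARY KEY,
    ProductID     INTEGER NOT NULL REFERENCES Product (ProductID),
    Quantity      INTEGER NOT NULL
);

INSERT INTO Product VALUES (1, 'Basic Kite');
INSERT INTO OrderDetailDenorm VALUES (1, 1, 'Basic Kite', 2), (2, 1, 'Basic Kite', 5);
INSERT INTO OrderDetail VALUES (1, 1, 2), (2, 1, 5);
""")

# The product is renamed in the one place that "owns" the fact.
conn.execute("UPDATE Product SET ProductName = 'Classic Kite' WHERE ProductID = 1")

# Denormalized copies were not touched, so they now disagree with Product.
stale_names = {
    row[0] for row in conn.execute("SELECT ProductName FROM OrderDetailDenorm")
}

# The normalized design reads the single stored fact through the join.
current_names = {
    row[0]
    for row in conn.execute(
        "SELECT p.ProductName FROM OrderDetail AS od "
        "JOIN Product AS p ON p.ProductID = od.ProductID"
    )
}
```

After the rename, the denormalized table still reports the old name unless extra code propagates the change, while the normalized table has nothing to propagate.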
Normalization prevents these kinds of modification anomalies.

Besides the primary goal of consistency and data integrity, there are several other very good reasons to normalize an OLTP relational database:
■ Performance: Duplicate data requires extra code to perform extra writes, maintain consistency, and manipulate data into a set when reading data. On my last large production contract (several terabytes, OLTP, 35K transactions per second), I tested a normalized version of the database vs. a denormalized version. The normalized version was 15% faster. I've found similar results in other databases over the years. Normalization also reduces locking contention and improves multiple-user concurrency.
■ Development costs: While it may take longer to design a normalized database, it's easier to work with a normalized database and it reduces development costs.
■ Usability: By placing columns in the correct table, it's easier to understand the database and easier to write correct queries.
■ Extensibility: A non-normalized database is often more complex and therefore more difficult to modify.

The three "Rules of One"

Normalization is formally defined as a set of normal forms, each addressing a specific potential error in the design (there's a whole section on normal forms later in this chapter). But I don't design a database with errors and then normalize the errors away; I follow normalization from the beginning to the conclusion of the design process. That's why I prefer to think of normalization as positively stated principles.

When I teach normalization I open with the three "Rules of One," which summarize normalization from a positive point of view. One type of item is represented by one entity (table).

The key to designing a schema that avoids update anomalies is to ensure that each single fact in real life is modeled by a single data point in the database.
Three principles define a single data point:
■ One group of similar things is represented by one entity (table).
■ One thing is represented by one tuple (row).
■ One descriptive fact about the thing is represented by one attribute (column).

Grok these three simple rules and you'll be a long way toward designing a properly normalized database.

Normalization As Story

The Time Traveler's Wife, by Audrey Niffenegger, is one of my favorite books. Without giving away the plot or any spoilers, it's an amazing sci-fi romance story. She moves through time conventionally, while he bounces uncontrollably through time and space. Even though the plot is more complex than the average novel, I love how Ms. Niffenegger weaves every detail together into an intricate flow. Every detail fits and builds the characters and the story.

In some ways, a database is like a good story. The plot of the story is in the data model, and the data represents the characters and the details. Normalization is the grammar of the database.

When two writers tell the same story, each crafts the story differently. There's no single correct way to tell a story. Likewise, there may be multiple ways to model the database. There's no single correct way to model a database: as long as the database contains all the information needed to extract the story and it follows the normalized grammar rules, the database will work. (Don't take this to mean that any design might be a correct design. While there may be multiple correct designs, there are many more incorrect designs.)

A corollary is that just as some books read better than others, some database schemas flow well, while other database designs are difficult to query.

As with writing a novel, the foundation of data modeling is careful observation, an understanding of reality, and clear thinking. Based on those insights, the data modeler constructs a logical system, a new virtual world, that models a slice of reality.
Therefore, how the designer views reality and identifies entities and their interactions will influence the design of the virtual world. As with postmodernism, there's no single perfect correct representation, only the viewpoint of the author/designer.

Identifying entities

The first step to designing a database conceptual diagram is to identify the entities (tables). Because any entity represents only one type of thing, it takes several entities together to represent an entire process or organization.

Entities are usually discovered from several sources:
■ Examining existing documents (order forms, registration forms, patient files, reports)
■ Interviews with subject-matter experts
■ Diagramming the process flow

At this early stage the goal is to simply collect a list of possible entities and their facts. Some of the entities will be obvious nouns, such as customers, products, flights, materials, and machines.

Other entities will be verbs: shipping, processing, assembling parts to build a product. Verbs may be entities, or they may indicate a relationship between two entities.

The goal is to simply collect all the possible entities and their attributes. At this early stage, it's also useful to document as many known relationships as possible, even if those relationships will be edited several times.

Generalization

Normalization has a reputation of creating databases that are complex and unwieldy. It's true that some database schemas are far too complex, but I don't believe normalization, by itself, is the root cause. I've found that the difference between elegant databases that are a joy to query and overly complex designs that make you want to polish your resume is the data modeler's view of entities.

When identifying entities, there's a continuum, illustrated in Figure 3-1, ranging from a broad all-inclusive view to a very specific narrow definition of the entity.
FIGURE 3-1 Entities can be identified along a continuum, from overly generalized with a single table, to overly specific with too many tables.

[Figure 3-1 labels: a continuum from "Overly Simple: One Table" to "Overly Complex: Specific Tables." Result: data-driven design, fewer tables, easier to extend.]

The overly simple view groups together entities that are in fact different types of things, e.g., storing machines, products, and processes in a single entity. This approach might risk data integrity for two reasons. First, it's difficult to enforce referential integrity (foreign key constraints) because the primary key attempts to represent multiple types of items. Second, these designs tend to merge entities with different attributes, which means that many of the attributes (columns) won't apply to various rows and will simply be left null. Many nullable columns means the data will probably be sparsely filled and inconsistent.

At the other extreme, the overly specific view segments entities that could be represented by a single entity into multiple entities, e.g., splitting different types of subassemblies and finished products into multiple different entities. This type of design risks flexibility and usability:
■ The additional tables create additional work at every layer of the software.
■ Database relationships become more complex because what could have been a single relationship is now multiple relationships. For example, instead of relating an assembly process to any part, the assembly relationship must now relate to multiple types of parts.
■ The database has now hard-coded the specific types of similar entities, making it very difficult to add another similar type of entity. Using the manufacturing example again, if there's an entity for every type of subassembly, then adding another type of subassembly means changes at every level of the software.
The sweet spot in the middle generalizes, or combines, similar entities into single entities. This approach creates a more flexible and elegant database design that is easier to query and extend:
■ Look for entities with similar attributes, or entities that share some attributes.
■ Look for types of entities that might have an additional similar entity added in the future.
■ Look for entities that might be summarized together in reports.

When designing a generalized entity, two techniques are essential:
■ Use a lookup entity to organize the types of entities. For the manufacturing example, a subassemblytype attribute would serve the purpose of organizing the parts by subassembly type. Typically, this would be a foreign key to a subassemblytype entity.
■ Typically, the different entity types that could be generalized together do have some differences (which is why a purist view would want to segment them). Employing the supertype/subtype pattern (discussed in the "Data Design Patterns" section) solves this dilemma perfectly.

I've heard from some that generalization sounds like denormalization; it's not. When generalizing, it's critical that the entities comply with all the rules of normalization.

Generalized databases tend to be data-driven, have fewer tables, and are easier to extend. I was once asked to optimize a database design that was modeled by a very specific-style data modeler. His design had 78 entities; mine had 18 and covered more features. For which would you rather write stored procedures?

On the other hand, be careful to merge entities only because they actually do share a root meaning in the data. Don't merge unlike entities just to save programming. The result will be more complex programming.
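The lookup-entity technique described above might look roughly like the following. This is a minimal sketch, assuming SQLite via Python's sqlite3 module rather than SQL Server; the Part table, the SubassemblyType columns, and the sample parts are hypothetical, and the supertype/subtype pattern is deliberately left out because it is covered elsewhere in the chapter.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server. Table names, columns, and
# sample data are hypothetical, loosely following the manufacturing example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
-- Lookup entity: one row per type of subassembly.
CREATE TABLE SubassemblyType (
    SubassemblyTypeID INTEGER PRIMARY KEY,
    TypeName          TEXT NOT NULL UNIQUE
);

-- Generalized entity: every part lives here, organized by its type.
CREATE TABLE Part (
    PartID            INTEGER PRIMARY KEY,
    PartName          TEXT NOT NULL,
    SubassemblyTypeID INTEGER NOT NULL
        REFERENCES SubassemblyType (SubassemblyTypeID)
);

INSERT INTO SubassemblyType VALUES (1, 'Frame'), (2, 'Finished product');
INSERT INTO Part VALUES (1, 'Left strut', 1), (2, 'Box kite', 2);
""")

# Adding a new kind of subassembly is a data change, not a schema change.
conn.execute("INSERT INTO SubassemblyType VALUES (3, 'Tail')")
conn.execute("INSERT INTO Part VALUES (3, 'Ribbon tail', 3)")

# One query summarizes all part types together.
parts_by_type = dict(
    conn.execute(
        "SELECT t.TypeName, COUNT(*) FROM Part AS p "
        "JOIN SubassemblyType AS t ON t.SubassemblyTypeID = p.SubassemblyTypeID "
        "GROUP BY t.TypeName"
    )
)

# Referential integrity still holds: a part cannot claim an unknown type.
try:
    conn.execute("INSERT INTO Part VALUES (4, 'Mystery part', 99)")
    unknown_type_rejected = False
except sqlite3.IntegrityError:
    unknown_type_rejected = True
```

In the overly specific style, the new "Tail" type would have meant a new table and matching changes in every layer of code; here it is two inserts, and the foreign key keeps the generalized table from drifting into the overly simple, anything-goes extreme.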
Best Practice

Granted, knowing when to generalize and when to segment can be an art form and requires a repertoire of database experience, but generalization is the buffer against database over-complexity; and consciously working at understanding generalization is the key to becoming an excellent data modeler.

In my seminars I use an extreme example of specific vs. generalized design, asking groups of three to four attendees to model the database in two ways: first using an overly specific data modeling technique, and then modeling the database trying to hit the generalization sweet spot.

Assume your team has been contracted to develop a database for a cruise ship's activity director (think Julie McCoy, the cruise director on The Love Boat).

The cruise offers a lot of activities: tango dance lessons, tweetups, theater, scuba lessons, hang-gliding, off-boat excursions, authentic Hawaiian luau, hula-dancing lessons, swimming lessons, Captain's dinners, aerobics, and the ever-popular shark-feeding scuba trips. These various activities have differing requirements, are offered multiple times throughout the cruise, and some are held at different locations. A passenger entity already exists; you're expected to extend the database with new entities to handle activities but still use the existing passenger entity.

In the seminars, the specialized designs often have an entity for every activity, every time an activity is offered, activities at different locations, and even activity requirements. I believe the most entities any seminar group has produced is 36. Admittedly, it's an extreme example for illustration purposes, but I've seen database designs in production using this style.

Each group's generalized design tends to be similar to the one shown in Figure 3-2.
A generalized activity entity stores all activities and descriptions of their requirements organized by activity type. The ActivityTime entity has one tuple (row) for every instance or offering of an activity, so if hula-dance lessons are offered three times, there will be three tuples in this entity.

FIGURE 3-2 A generalized cruise activity design can easily accommodate new activities and locations.

[Figure 3-2 diagram, "Generalized Design": entities ActivityType, Activity, ActivityTime, Location, Passenger, and SignUp.]

Primary keys

Perhaps the most important concept of an entity (table) is that it has a primary key: an attribute or set of attributes that can be used to uniquely identify the tuple (row). Every entity must have a primary key; without a primary key, it's not a valid entity.

By definition, a primary key must be unique and must have a value (not null).
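To tie the generalized cruise design and the primary key rule together, here is one possible sketch of the six entities named in Figure 3-2. It assumes SQLite through Python's sqlite3 module instead of SQL Server. The entity names come from the figure, but every column, the sample data, and the choice of a two-column primary key on SignUp are assumptions made for illustration, not the book's actual schema.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server. Entity names follow Figure 3-2;
# all columns, keys, and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE ActivityType (
    ActivityTypeID INTEGER PRIMARY KEY,
    TypeName       TEXT NOT NULL UNIQUE
);
CREATE TABLE Activity (
    ActivityID     INTEGER PRIMARY KEY,
    ActivityTypeID INTEGER NOT NULL REFERENCES ActivityType (ActivityTypeID),
    ActivityName   TEXT NOT NULL,
    Requirements   TEXT
);
CREATE TABLE Location (
    LocationID   INTEGER PRIMARY KEY,
    LocationName TEXT NOT NULL
);
-- One row per offering of an activity.
CREATE TABLE ActivityTime (
    ActivityTimeID INTEGER PRIMARY KEY,
    ActivityID     INTEGER NOT NULL REFERENCES Activity (ActivityID),
    LocationID     INTEGER NOT NULL REFERENCES Location (LocationID),
    StartsAt       TEXT NOT NULL
);
-- Stand-in for the passenger entity that the text says already exists.
CREATE TABLE Passenger (
    PassengerID INTEGER PRIMARY KEY,
    FullName    TEXT NOT NULL
);
-- Primary key made of a set of attributes: one sign-up per passenger per offering.
CREATE TABLE SignUp (
    PassengerID    INTEGER NOT NULL REFERENCES Passenger (PassengerID),
    ActivityTimeID INTEGER NOT NULL REFERENCES ActivityTime (ActivityTimeID),
    PRIMARY KEY (PassengerID, ActivityTimeID)
);

INSERT INTO ActivityType VALUES (1, 'Lesson');
INSERT INTO Activity VALUES (1, 1, 'Hula-dancing lessons', NULL);
INSERT INTO Location VALUES (1, 'Lido deck');
INSERT INTO ActivityTime VALUES
    (1, 1, 1, 'Day 1 10:00'),
    (2, 1, 1, 'Day 2 10:00'),
    (3, 1, 1, 'Day 3 10:00');
INSERT INTO Passenger VALUES (1, 'Pat Example');
INSERT INTO SignUp VALUES (1, 1);
""")

# Three offerings of the same activity are three rows, not three tables.
offerings = conn.execute(
    "SELECT COUNT(*) FROM ActivityTime WHERE ActivityID = 1"
).fetchone()[0]

# The primary key must be unique: the same sign-up cannot be stored twice.
try:
    conn.execute("INSERT INTO SignUp VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

A new activity, a new location, or another offering is simply a new row in the matching table, and the primary key on each table guarantees that every row can be identified unambiguously.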

Posted: 04/07/2014, 09:20
