Document: Windows Server 2008 Inside Out - P28 pptx


INSIDE OUT: Use monitoring to ensure availability
A well-run and well-maintained network should have 99.99 percent availability. There should be less than 1 percent packet loss and packet turnaround of 80 milliseconds or less. To achieve this level of availability and performance, the network must be monitored. Any time business systems extend to the Internet or to wide area networks (WANs), internal network monitoring must be supplemented with outside-in monitoring that checks the availability of the network and business systems.

Resources, training, and documentation are essential to ensuring that you can manage and maintain mission-critical systems. Many organizations cripple the operations team by staffing minimally. Minimally manned teams will have marginal response times and nominal effectiveness. The organization must take the following steps:
• Staff for success.
• Conduct training before deploying new technologies.
• Keep the training up to date with what’s deployed.
• Document essential operations procedures.

Every change to hardware, software, and the network must be planned and executed deliberately. To do this, you must have established change control procedures and well-documented execution plans. Change control procedures should be designed to ensure that everyone knows what changes have been made. Execution plans should be designed to ensure that everyone knows the exact steps that were or should be performed to make a change.

Change logs are a key part of change control. Each piece of physical hardware deployed in the operational environment should have a change log. The change log should be stored in a text document or spreadsheet that is readily accessible to support personnel. The change log should show the following information:
• Who changed the hardware
• What change was made
• When the change was made
• Why the change was made

Establish and Follow Change Control Procedures
Change control procedures must take into account the need for both planned changes and emergency changes. All team members involved in a planned change should meet regularly and follow a specific implementation schedule. No one should make changes that aren’t discussed with the entire implementation team.

You should have well-defined backup and recovery plans. The backup plan should specifically state the following information:
• When full, incremental, differential, and log backups are used
• How often and at what time backups are performed
• Whether the backups must be conducted online or offline
• The amount of data being backed up as well as how critical the data is
• The tools used to perform the backups
• The maximum time allowed for backup and restore
• How backup media is labeled, recorded, and rotated

Backups should be monitored daily to ensure that they are running correctly and that the media is good. Any problems with backups should be corrected immediately. Multiple media sets should be used for backups, and these media sets should be rotated on a specific schedule. With a four-set rotation, there is one set each for daily, weekly, monthly, and quarterly backups. By rotating one media set offsite, support staff can help ensure that the organization is protected in case of a disaster.
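As a rough illustration of the four-set rotation described above, the following Python sketch picks a media set for a given backup date. The calendar rule it uses (first day of a quarter, first day of a month, Sunday, otherwise daily) is an assumption made for this example and is not specified in the text. The function name `media_set_for` is likewise hypothetical.

```python
from datetime import date


def media_set_for(day: date) -> str:
    """Pick a media set for a backup run.

    Assumed policy (not from the text): the first day of a quarter uses
    the quarterly set, the first day of any other month uses the monthly
    set, Sundays use the weekly set, and all other days use the daily set.
    """
    if day.day == 1 and day.month in (1, 4, 7, 10):
        return "quarterly"
    if day.day == 1:
        return "monthly"
    if day.weekday() == 6:  # Monday is 0, so Sunday is 6
        return "weekly"
    return "daily"


if __name__ == "__main__":
    for d in (date(2008, 1, 1), date(2008, 2, 1), date(2008, 2, 3), date(2008, 2, 4)):
        print(d.isoformat(), media_set_for(d))
```

A real schedule would also record which set is currently offsite, which this sketch leaves out.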
The recovery plan should provide detailed step-by-step procedures for recovering the system under various conditions, such as procedures for recovering from hard disk drive failure or troubleshooting problems with connectivity to the back-end database. The recovery plan should also include system design and architecture documentation that details the configuration of physical hardware, application logic components, and back-end data. Along with this information, support staff should provide a media set containing all software, drivers, and operating system files needed to recover the system.

Note: One thing administrators often forget about is spare parts. Spare parts for key components, such as processors, drives, and memory, should also be maintained as part of the recovery plan.

You should practice restoring critical business systems using the recovery plan. Practice shouldn’t be conducted on the production servers. Instead, the team should practice on test equipment with a configuration similar to the real production servers. Practicing once a quarter or semiannually is highly recommended.

You should have well-defined problem escalation procedures that document how to handle problems and emergency changes that might be needed.
Many organizations use a three-tiered help desk structure for handling problems:
• Level 1 support staff forms the front line for handling basic problems. They typically have hands-on access to the hardware, software, and network components they manage. Their main job is to clarify and prioritize a problem. If the problem has occurred before and there is a documented resolution procedure, they can resolve the problem without escalation. If the problem is new or not recognized, they must understand how, when, and to whom to escalate it.
• Level 2 support staff includes more specialized personnel, such as system administrators and network engineers, who can diagnose a particular type of problem and work with others to resolve it. They usually have remote access to the hardware, software, and network components they manage. This allows them to troubleshoot problems remotely and to send out technicians after they’ve pinpointed the problem.
• Level 3 support staff includes highly technical personnel who are subject matter experts, team leaders, or team supervisors. The level 3 team can include support personnel from vendors as well as representatives from the user community. Together, they form the emergency response or crisis resolution team that is responsible for resolving crisis situations and planning emergency changes.

All crisis situations and emergencies should be responded to decisively and resolved methodically. A single person on the emergency response team should be responsible for coordinating all changes and executing the recovery plan. This same person should be responsible for writing an after-action report that details the emergency response and resolution process used. The after-action report should analyze how the emergency was resolved and what the root cause of the problem was.

In addition, you should establish procedures for auditing system usage and detecting intrusion.
In Windows Server 2008, auditing policies are used to track the successful or failed execution of the following activities:
• Account logon events: Tracks events related to user logon and logoff
• Account management: Tracks those tasks involved with handling user accounts, such as creating or deleting accounts and resetting passwords
• Directory service access: Tracks access to Active Directory Domain Services (AD DS)
• Object access: Tracks system resource usage for files, directories, and objects
• Policy change: Tracks changes to user rights, auditing, and trust relationships
• Privilege use: Tracks the use of user rights and privileges
• Process tracking: Tracks system processes and resource usage
• System events: Tracks system startup, shutdown, restart, and actions that affect system security or the security log

You should have an incident response plan that includes priority escalation of suspected intrusion to senior team members and provides step-by-step details on how to handle the intrusion. The incident response team should gather information from all network systems that might be affected. The information should include event logs, application logs, database logs, and any other pertinent files and data. The incident response team should take immediate action to lock out accounts, change passwords, and physically disconnect the system if necessary.

All team members participating in the response should write a postmortem that details the following information:
• What date and time they were notified and what immediate actions they took
• Who they notified and what the response was from the notified individual
• What their assessment of the issue is and the actions necessary to resolve and prevent similar incidents

The team leader should write an executive summary of the incident and forward this to senior management.
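To make the audit categories listed earlier in this section more concrete, the following Python sketch builds `auditpol /set` command lines for them. The mapping from the policy names used in the text to `auditpol` category names is an assumption of this example and should be checked against the output of `auditpol /list /category` on the target server. The helper name `auditpol_command` is hypothetical, and the sketch only builds strings; it does not run anything.

```python
# Assumed mapping from the policy names in the text to auditpol category
# names. Verify with "auditpol /list /category" before relying on it.
AUDIT_CATEGORIES = {
    "Account logon events": "Account Logon",
    "Account management": "Account Management",
    "Directory service access": "DS Access",
    "Object access": "Object Access",
    "Policy change": "Policy Change",
    "Privilege use": "Privilege Use",
    "Process tracking": "Detailed Tracking",
    "System events": "System",
}


def auditpol_command(policy: str, success: bool = True, failure: bool = True) -> str:
    """Build (but do not execute) an auditpol command line for one policy."""
    category = AUDIT_CATEGORIES[policy]  # raises KeyError for unknown names
    success_flag = "enable" if success else "disable"
    failure_flag = "enable" if failure else "disable"
    return (
        f'auditpol /set /category:"{category}" '
        f"/success:{success_flag} /failure:{failure_flag}"
    )


if __name__ == "__main__":
    for name in AUDIT_CATEGORIES:
        print(auditpol_command(name))
```

Generating the commands as text first makes it easy to review them under change control before anyone applies them to a production server.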
The following checklist summarizes the recommendations for operational support of high-availability systems:
• Monitor hardware, software, and network components 24/7.
• Ensure that monitoring doesn’t interfere with normal systems operations.
• Gather only the data required for meaningful analysis.
• Establish procedures that let personnel know what to look for in the data.
• Use outside-in monitoring any time systems are externally accessible.
• Provide adequate resources, training, and documentation.
• Establish change control procedures that include change logs.
• Establish execution plans that detail the change implementation.
• Create a solid backup plan that includes onsite and offsite tape rotation.
• Monitor backups and test backup media.
• Create a recovery plan for all critical systems.
• Test the recovery plan on a routine basis.
• Document how to handle problems and make emergency changes.
• Use a three-tier support structure to coordinate problem escalation.
• Form an emergency response or crisis resolution team.
• Write after-action reports that detail the process used.
• Establish procedures for auditing system usage and detecting intrusion.
• Create an intrusion response plan with priority escalation.
• Take immediate action to handle suspected or actual intrusion.
• Write postmortem reports detailing team reactions to the intrusion.

Planning for Deploying Highly Available Servers
You should always create a plan before deploying a business system. The plan should show everything that must be done before the system is transitioned into the production environment. After a system is in the production environment, the system is deemed operational and should be handled as outlined in “Planning for Day-to-Day Operations” on page 1316.
The deployment plan should include the following items:
• Checklists
• Contact lists
• Test plans
• Deployment schedules

Checklists are a key part of the deployment plan. The purpose of a checklist is to ensure that the entire deployment team understands the steps they need to perform. Checklists should list the tasks that must be performed and designate individuals to handle the tasks during each phase of the deployment, from planning to testing to installation. Prior to executing a checklist, the deployment team should meet to ensure that all items are covered and that the necessary interactions among team members are clearly understood. After deployment, the preliminary checklists should become a part of the system documentation and new checklists should be created any time the system is updated.

The deployment plan should include a contact list. The contact list should provide the name, role, telephone number, and e-mail address of all team members, vendors, and solution provider representatives. Alternative numbers for cell phones and pagers should be provided as well.

The deployment plan should include a test plan. An ideal test plan has several phases. In Phase I, the deployment team builds the business system and support structures in a test lab. Building the system means accomplishing the following tasks:
• Creating a test network on which to run the system
• Putting together the hardware and storage components
• Installing the operating system and application software
• Adjusting basic system settings to suit the test environment
• Configuring clustering or network load balancing as appropriate

The deployment team can conduct any necessary testing and troubleshooting in the isolated lab environment. The entire system should undergo burn-in testing to guard against faulty components.
If a component is flawed, it usually fails in the first few days of operation. Testing doesn’t stop with burn-in. Web and application servers should be stress tested. Database servers should be load tested. The results of the stress and load tests should be analyzed to ensure that the system meets the performance requirements and expectations of the customer. Adjustments to the configuration should be made to improve performance and optimize for the expected load.

In Phase II, the deployment team tests the business system and support equipment in the deployment location. They conduct tests similar to those in Phase I, but in the real-world environment. Again, the results of these tests should be analyzed to ensure that the system meets the performance requirements and expectations of the customer. Afterward, adjustments should be made to improve performance and optimize as necessary.

The team can then deploy the business system. After deployment, in Phase III, the team should perform limited, nonintrusive testing to ensure that the system is operating normally. After Phase III testing is completed, the team can use the operational plans for monitoring and maintenance.

The following checklist summarizes the recommendations for predeployment planning of mission-critical systems:
• Create a plan that covers the entire testing-to-operations cycle.
• Use checklists to ensure that the deployment team understands the procedures.
• Provide a contact list for the team, vendors, and solution providers.
• Conduct burn-in testing in the lab.
• Conduct stress and load testing in the lab.
• Use the test data to optimize and adjust the configuration.
• Provide follow-on testing in the deployment location.
• Follow a specific deployment schedule.
• Use operational plans once final tests are completed.
Clustering technologies allow servers to be connected into multiple-server units called server clusters. Each computer connected in a server cluster is referred to as a node. Nodes work together, acting as a single unit, to provide high availability for business applications and other critical resources, such as Microsoft Internet Information Services (IIS), Microsoft SQL Server, or Microsoft Exchange Server. Clustering allows administrators to manage the cluster nodes as a single system rather than as individual systems. Clustering allows users to access cluster resources as a single system as well. In most cases, the user doesn’t even know the resources are clustered.

The main cluster technologies that Windows Server 2008 supports are:
• Failover clustering: Provides improved availability for applications and services that require high availability, scalability, and reliability. By using server clustering, organizations can make applications and data available on multiple servers linked together in a cluster configuration. The clustered servers (called nodes) are connected by physical cables and by software. If one of the nodes fails, another node begins to provide service. This process, known as failover, ensures that users experience a minimum of disruptions in service. Back-end applications and services, such as those provided by database servers, are ideal candidates for failover clustering.
• Network Load Balancing (NLB): Provides failover support for Internet Protocol (IP)-based applications and services that require high scalability and availability. By using Network Load Balancing, organizations can build groups of clustered computers to support load balancing of Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Generic Routing Encapsulation (GRE) traffic requests. Front-end Web servers are ideal candidates for Network Load Balancing.
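The failover behavior described above, where another node begins to provide service when one node fails, can be sketched as a toy model in Python. This is only an illustration of the idea under simplifying assumptions (a single owned workload and a fixed node order). It is not how the Windows Cluster service is implemented, and the class name `FailoverCluster` and the node names are hypothetical.

```python
class FailoverCluster:
    """Toy model of failover: one workload owned by one healthy node."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a cluster needs at least one node")
        # Dictionaries keep insertion order, which serves as the assumed
        # failover preference order in this sketch.
        self.healthy = {node: True for node in nodes}
        self.owner = nodes[0]

    def mark_failed(self, node):
        """Record a node failure and fail over if it owned the workload."""
        if node not in self.healthy:
            raise KeyError(f"unknown node: {node}")
        self.healthy[node] = False
        if node == self.owner:
            self.owner = self._next_healthy()

    def _next_healthy(self):
        for node, is_healthy in self.healthy.items():
            if is_healthy:
                return node
        raise RuntimeError("no healthy nodes remain in the cluster")


if __name__ == "__main__":
    cluster = FailoverCluster(["NodeA", "NodeB"])
    cluster.mark_failed("NodeA")
    print("workload now owned by", cluster.owner)
```

Note that the model moves ownership only. As the chapter points out later, recovering a user's in-flight work is the application's responsibility.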
These cluster technologies are discussed in this chapter so that you can plan for and implement your organization’s high-availability needs.

CHAPTER 39 Preparing and Deploying Server Clusters
• Introducing Server Clustering
• Using Network Load Balancing
• Managing Network Load Balancing Clusters
• Using Failover Clustering
• Running Failover Clusters
• Creating Failover Clusters
• Managing Failover Clusters and Their Resources

Introducing Server Clustering
A server cluster is a group of two or more servers functioning together to provide essential applications or services seamlessly to enterprise clients. The servers are physically connected together by a network and might share storage devices. Server clusters are designed to protect against application and service failure, which could be caused by application software or essential services becoming unavailable; system and hardware failure, which could be caused by problems with hardware components such as central processing units (CPUs), drives, memory, network adapters, and power supplies; and site failure, which could be caused by natural disaster, power outages, or connectivity outages.

You can use cluster technologies to increase overall availability while minimizing single points of failure and reducing costs by using industry-standard hardware and software. Each cluster technology has a specific purpose and is designed to meet different requirements. Network Load Balancing is designed to address bottlenecks caused by Web services. Failover clustering is designed to maintain data integrity and allow a node to provide service if another node fails.
The clustering technologies can be and often are combined to architect a comprehensive service offering. The most common scenario in which both solutions are combined is a commercial Web site where the site’s Web servers use Network Load Balancing and back-end database servers use failover clustering.

Benefits and Limitations of Clustering
A server cluster provides high availability by making application software and data available on several servers linked together in a cluster configuration. If a server stops functioning, a failover process can automatically shift the workload of the failed server to another server in the cluster. The failover process is designed to ensure continuous availability for critical applications and data.

Although clusters can be designed to handle failure, they are not fault tolerant with regard to user data. The cluster by itself doesn’t guard against loss of a user’s work. Typically, the recovery of lost work is handled by the application software, meaning the application software must be designed to recover the user’s work or it must be designed in such a way that the user session state can be maintained in the event of failure.

Clusters help to resolve the need for high availability, high reliability, and high scalability. High availability refers to the ability to provide user access to an application or a service a high percentage of scheduled times while attempting to reduce unscheduled outages. A cluster implementation is highly available if it meets the organization’s scheduled uptime goals. Availability goals are achieved by reducing unplanned downtime and then working to improve total hours of operation for the related applications and services. High reliability refers to the ability to reduce the frequency of system failure while attempting to provide fault tolerance in case of failure.
A cluster implementation is highly reliable if it minimizes the number of single points of failure and reduces the risk that failure of a single component or system will result in the outage of all applications and services offered. Reliability goals are achieved by using redundant, fault-tolerant hardware components, application software, and systems. High scalability refers to the ability to add resources and computers while attempting to improve performance. A cluster implementation is highly scalable if it can be scaled up and out. Individual systems can be scaled up by adding more resources such as CPUs, memory, and disks. The cluster implementation can be scaled out by adding more computers.

Design for Availability
A well-designed cluster implementation uses redundant systems and components so that the failure of an individual server doesn’t affect the availability of the related applications and services. Although a well-designed solution can guard against application failure, system failure, and site failure, cluster technologies do have limitations. Cluster technologies depend on compatible applications and services to operate properly. The software must respond appropriately when failure occurs. Cluster technology cannot protect against failures caused by viruses, software corruption, or human error. To protect against these types of problems, organizations need solid data protection and recovery plans.

Cluster Organization
Clusters are organized in loosely coupled groups often referred to as farms or packs. A farm is a group of servers that run similar services but don’t typically share data. They are called a farm because they handle whatever requests are passed out to them using identical copies of data that are stored locally.
Because they use identical copies of data rather than sharing data, members of a farm operate autonomously and are also referred to as clones. A pack is a group of servers that operate together and share partitioned data. They are called a pack because they work together to manage and maintain services. Because members of a pack share access to partitioned data, they have unique operations modes and usually access the shared data on disk drives to which all members of the pack are connected.

In most cases, Web and application services are organized as farms, while back-end databases and critical support services are organized as packs. Web servers running IIS and using Network Load Balancing are an example of a farm. In a Web farm, identical data is replicated to all servers in the farm and each server can handle any request that comes to it by using local copies of data. For example, you might have a group of five Web servers using Network Load Balancing, each with its own local copy of the Web site data.

Database servers running SQL Server and failover clustering with partitioned database views are an example of a pack. Here, members of the pack share access to the data and have a unique portion of data or logic that they handle rather than handling all data requests.
For example, in a two-node SQL Server cluster, one database server might handle accounts that begin with the letters A through M and another database server might handle accounts that begin with the letters N through Z.

Servers that use clustering technologies are often organized using a three-tier structure. The tiers in the architecture are composed as follows:
• Tier 1 includes the Web servers, which are also called front-end Web servers. Front-end Web servers typically use Network Load Balancing.
• Tier 2 includes the application servers, which are often referred to as the middle-tier servers. Middle-tier servers typically use Windows Communication Foundation (WCF) or other Web Services technologies to implement load balancing for application components that use COM+. Using a WCF-based load balancer, COM+ components can be load balanced over multiple nodes to enhance the availability and scalability of software applications.
• Tier 3 includes the database servers, file servers, and other critical support servers, which are often called back-end servers. Back-end servers typically use failover clustering.

As you set out to architect your cluster solution, you should try to organize servers according to the way they will be used and the applications they will be running. In most cases, Web servers, application servers, and database servers are all organized in different ways.

By using proper architecture, the servers in a particular tier can be scaled out or up as necessary to meet growing performance and throughput needs. When you are looking to scale out by adding servers to the cluster, the clustering technology and the server operating system used are both important:
• All editions of Windows Server 2008 support up to 32-node Network Load Balancing clusters.
• Windows Server Enterprise and Windows Server Datacenter support failover clustering, allowing up to 8-node clusters.
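Returning to the two-node SQL Server example earlier in this section, the idea of a pack in which each member handles a unique portion of the data can be sketched in Python as a simple routing function. The server names `DBSERVER1` and `DBSERVER2` and the function name `route_account` are hypothetical, and real partitioned views are defined in the database rather than in client code. The sketch only shows how a request maps to the member that owns that range.

```python
def route_account(account_name: str) -> str:
    """Return the (hypothetical) database server that owns an account.

    Accounts starting with A through M go to the first server and
    accounts starting with N through Z go to the second, mirroring the
    example in the text.
    """
    first = account_name.strip()[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"account name must start with a letter A-Z: {account_name!r}")
    return "DBSERVER1" if first <= "M" else "DBSERVER2"


if __name__ == "__main__":
    for name in ("Adams", "Miller", "Nguyen", "zimmer"):
        print(name, "->", route_account(name))
```

Because each range has exactly one owner, a failed member's range is unavailable until another node takes it over, which is why packs are typically paired with failover clustering.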
When looking to scale up by adding CPUs and random access memory (RAM), the edition of the server operating system used is extremely important. In terms of both processor and memory capacity, Windows Server Datacenter is much more expandable than either Windows Server Standard or Windows Server Enterprise.

As you look at scalability requirements, keep in mind the real business needs of the organization. The goal should be to select the right edition of the Windows operating system to meet current and future needs. The number of servers needed depends on the anticipated server load as well as the size and types of requests the servers will handle. Processors and memory should be sized appropriately for the applications and services the servers will be running as well as the number of simultaneous user connections.

[...]

... clusters have many specific hardware requirements as well. In order to implement failover clustering with Windows Server 2008, all hardware components must be compatible with Windows Server 2008. Hardware components certified as compatible with Windows Server 2008 have the Designed For Windows Server 2008 logo. To help validate a configuration prior to deploying failover clustering, the cluster administration...

... cluster-aware include the following:
• Distributed File System (DFS) Namespace Server
• DHCP Server
• Exchange Server
• File Server
• Internet Storage Name Service (iSNS) Server
• Microsoft Distributed Transaction Coordinator (MS DTC)
• Microsoft Message Queuing (MSMQ)
• Print Server
• SQL Server
• Windows Internet Naming Service (WINS) Server
Generic applications and services can also be cluster-aware. Check with...
... containing Server A, Server B, and Server C. The first incoming request is handled by Server A, the second by Server B, the third by Server C, and then the cycle is repeated in that order (A, B, C, A, B, C, ...). Unfortunately, if one of the servers fails, there is no way to notify the group of the failure. As a result, the round robin strategy continues to send requests to the failed server. Windows Network...

... to the IP stack. For Windows Server 2008, IP address handling has been extended to accommodate IP version 6 (IPv6) as well as multiple dedicated IP addresses for IPv4 and IPv6. This allows you to configure NLB clusters with one or more dedicated IPv4 and IPv6 addresses. Network Load Balancing can be used with Microsoft Internet Security and Acceleration (ISA) Server. For Windows Server 2008, both NLB and...

... with one set of cluster servers must be isolated from all other servers using LUN masking or zoning. When planning failover clusters, you should keep in mind these additional requirements:
• Servers in the same cluster must all run the same hardware architecture version of the Windows Server 2008 operating system. For example, they should all use either the x64 or the Itanium version.
• Servers in the same cluster...

... Cluster service uses the concept of virtual servers to specify groups of resources that fail over together. Thus, when a server fails, the group of resources configured on that server for clustering fail over to another server. The server that handles the failover should be configured for the extra capacity needed to handle the additional workload. When the failed server comes back online, the Cluster service...
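The weakness of plain round robin noted in the first excerpt above (requests keep going to a failed server) can be illustrated with a short Python sketch. The function name `round_robin` and the simulation approach are assumptions of this example. It models only the rotation order and is not an implementation of round robin DNS.

```python
from itertools import cycle


def round_robin(servers, failed, requests):
    """Simulate plain round robin with no failure notification.

    Returns a list of (server, served) pairs, one per request. A request
    sent to a server in `failed` is recorded as not served, because the
    rotation has no way to learn about the failure.
    """
    rotation = cycle(servers)
    results = []
    for _ in range(requests):
        target = next(rotation)
        results.append((target, target not in failed))
    return results


if __name__ == "__main__":
    for server, served in round_robin(["A", "B", "C"], failed={"B"}, requests=6):
        print(server, "served" if served else "LOST")
```

With three servers and one failure, every third request is lost in this model, which is the gap that heartbeat-based failure detection is meant to close.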
... networks (SANs) or using direct-attached storage (DAS), allow different servers to share the same data and thus, by reading this data, provide failover for resources.

[Figure 39-11 (caption truncated). Labels: Accounting file and print virtual server; Engineering file and print virtual server; Web server file and print virtual server; Node; Storage device(s) (not clustered)]

... failed servers, Network Load Balancing sends heartbeats to participating servers. These heartbeats are similar to those used by the Cluster service. The purpose of the heartbeat is to track the condition of each participant in the group. If a server in the group fails to send heartbeat messages to other servers in the group for a specified interval, the server is assumed to have failed. The remaining servers...

... transparent failover and failback. When a load-balanced resource fails on one server, the remaining servers in the group take over the workload of the failed server. When the failed server comes back online, the server can automatically rejoin the cluster group, and Network Load Balancing starts to distribute the load to the server automatically. Failover takes less than 10 seconds in most cases. Network...

... additional servers are needed. The servers should be able to meet demands of the stress testing with 70 percent or less server load with all servers running. During failure testing, the peak load shouldn’t rise above 80 percent. If either of these thresholds is reached, the cluster size might need to be increased. Servers that use Network Load Balancing can benefit from optimization as well. Servers should ...
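The 70 percent and 80 percent load thresholds in the last excerpt can be turned into a quick sizing check. The Python sketch below is a rough estimate under an assumption that is not in the text: load is spread evenly, so when servers fail their share is divided equally among the survivors. The function names are hypothetical.

```python
def load_after_failure(load_pct: float, servers: int, failed: int = 1) -> float:
    """Estimate per-server load after `failed` servers drop out.

    Assumes load is evenly distributed and fully redistributed to the
    surviving servers (a simplification made for this sketch).
    """
    survivors = servers - failed
    if survivors <= 0:
        raise ValueError("at least one server must survive")
    return load_pct * servers / survivors


def needs_more_servers(normal_load_pct: float, failure_load_pct: float) -> bool:
    """Apply the thresholds from the text: more than 70 percent load with
    all servers running, or more than 80 percent during failure testing,
    suggests the cluster size might need to be increased."""
    return normal_load_pct > 70 or failure_load_pct > 80


if __name__ == "__main__":
    normal = 60.0
    after = load_after_failure(normal, servers=4)
    print(f"normal {normal:.0f}%, after one failure {after:.0f}%,",
          "add servers" if needs_more_servers(normal, after) else "size is adequate")
```

Measured results from stress and failure testing should always take precedence over an estimate like this one.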

Date posted: 24/12/2013, 03:16
