DISTRIBUTED SYSTEMS: Principles and Paradigms, Second Edition (Part 5)


A Ring Algorithm

Another election algorithm is based on the use of a ring. Unlike some ring algorithms, this one does not use a token. We assume that the processes are physically or logically ordered, so that each process knows who its successor is. When any process notices that the coordinator is not functioning, it builds an ELECTION message containing its own process number and sends the message to its successor. If the successor is down, the sender skips over it and goes to the next member along the ring, or the one after that, until a running process is located. At each step along the way, the sender adds its own process number to the list in the message, effectively making itself a candidate to be elected as coordinator.

Eventually, the message gets back to the process that started it all. That process recognizes this event when it receives an incoming message containing its own process number. At that point, the message type is changed to COORDINATOR and circulated once again, this time to inform everyone else who the coordinator is (the list member with the highest number) and who the members of the new ring are. When this message has circulated once, it is removed and everyone goes back to work.

Figure 6-21. Election algorithm using a ring.

In Fig. 6-21 we see what happens if two processes, 2 and 5, discover simultaneously that the previous coordinator, process 7, has crashed. Each of them builds an ELECTION message and starts circulating it, independently of the other one. Eventually, both messages will go all the way around, and both 2 and 5 will convert them into COORDINATOR messages, with exactly the same members and in the same order. When both have gone around again, both will be removed. It does no harm to have extra messages circulating; at worst this consumes a little bandwidth, but it is not considered wasteful.
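To make the ring algorithm concrete, the sketch below simulates a single election round in Python. It is a minimal illustration rather than the book's code: the process numbers, the alive set, and the synchronous hand-off of the message from one process to the next are assumptions made purely for the example.

    # Minimal simulation of the ring-based election described above.
    # Process ids, the `alive` set, and synchronous hand-offs are
    # illustrative assumptions, not part of the original algorithm text.

    def next_alive(procs, alive, i):
        """Return the index of the first running successor of process i on the ring."""
        n = len(procs)
        j = (i + 1) % n
        while procs[j] not in alive:          # skip crashed members
            j = (j + 1) % n
        return j

    def ring_election(procs, alive, initiator):
        """procs: ring order of process numbers; initiator: index of the process that noticed the crash."""
        # Phase 1: circulate ELECTION; every live process appends its own number.
        election = [procs[initiator]]
        i = next_alive(procs, alive, initiator)
        while procs[i] != procs[initiator]:
            election.append(procs[i])
            i = next_alive(procs, alive, i)
        # Back at the initiator: it sees its own number and switches to COORDINATOR.
        coordinator = max(election)           # highest number on the list wins
        # Phase 2 (not simulated): circulate COORDINATOR once so every member learns the result.
        return coordinator, election          # the list doubles as the new ring membership

    procs = [0, 1, 2, 3, 4, 5, 6, 7]
    alive = {0, 1, 2, 3, 4, 5, 6}             # process 7, the old coordinator, has crashed
    print(ring_election(procs, alive, 2))     # -> (6, [2, 3, 4, 5, 6, 0, 1])

Running the same function with initiator 5 instead of 2 produces the same coordinator and the same membership, only rotated, which is exactly why the two concurrent elections in Fig. 6-21 do no harm.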
6.5.2 Elections in Wireless Environments

Traditional election algorithms are generally based on assumptions that are not realistic in wireless environments. For example, they assume that message passing is reliable and that the topology of the network does not change. These assumptions are false in most wireless environments, especially those for mobile ad hoc networks.

Only a few election protocols have been developed that work in ad hoc networks. Vasudevan et al. (2004) propose a solution that can handle failing nodes and partitioning networks. An important property of their solution is that the best leader can be elected, rather than just a random one, as was more or less the case in the previously discussed solutions. Their protocol works as follows. To simplify our discussion, we concentrate only on ad hoc networks and ignore the fact that nodes can move.

Consider a wireless ad hoc network. To elect a leader, any node in the network, called the source, can initiate an election by sending an ELECTION message to its immediate neighbors (i.e., the nodes in its range). When a node receives an ELECTION for the first time, it designates the sender as its parent, and subsequently sends out an ELECTION message to all its immediate neighbors, except for the parent. When a node receives an ELECTION message from a node other than its parent, it merely acknowledges the receipt.

When node R has designated node Q as its parent, it forwards the ELECTION message to its immediate neighbors (excluding Q) and waits for acknowledgments to come in before acknowledging the ELECTION message from Q. This waiting has an important consequence. First, note that neighbors that have already selected a parent will immediately respond to R. More specifically, if all neighbors already have a parent, R is a leaf node and will be able to report back to Q quickly. In doing so, it will also report information such as its battery lifetime and other resource capacities. This information will later allow Q to compare R's capacities to those of other downstream nodes, and select the best eligible node for leadership. Of course, Q had sent an ELECTION message only because its own parent P had done so as well. In turn, when Q eventually acknowledges the ELECTION message previously sent by P, it will pass the most eligible node to P as well. In this way, the source will eventually get to know which node is best to be selected as leader, after which it will broadcast this information to all other nodes.

This process is illustrated in Fig. 6-22. Nodes have been labeled a to j, along with their capacity. Node a initiates an election by broadcasting an ELECTION message to nodes b and j, as shown in Fig. 6-22(b). After that step, ELECTION messages are propagated to all nodes, ending with the situation shown in Fig. 6-22(e), where we have omitted the last broadcast by nodes f and i. From there on, each node reports to its parent the node with the best capacity, as shown in Fig. 6-22(f). For example, when node g receives the acknowledgments from its children e and h, it will notice that h is the best node, propagating [h, 8] to its own parent, node b. In the end, the source will note that h is the best leader and will broadcast this information to all other nodes.

Figure 6-22. Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)-(e) The build-tree phase (last broadcast step by nodes f and i not shown). (f) Reporting of best node to source.

When multiple elections are initiated, each node will decide to join only one election. To this end, each source tags its ELECTION message with a unique identifier. Nodes will participate only in the election with the highest identifier, stopping any running participation in other elections.

With some minor adjustments, the protocol can be shown to operate also when the network partitions, and when nodes join and leave. The details can be found in Vasudevan et al. (2004).
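The essence of the protocol, aggregating the best-capacity node up the spanning tree toward the source, can be sketched as follows. The topology, the capacity values, and the recursive traversal (which hides the asynchronous flooding and acknowledgments of the real protocol) are simplifying assumptions for illustration only; the topology loosely follows Fig. 6-22, but the exact edges and most capacities are invented.

    # Sketch of the tree-based election: the source floods ELECTION messages,
    # a spanning tree emerges from the "first ELECTION received" rule, and each
    # node reports the best-capacity node in its subtree when it acknowledges
    # its parent. Graph and capacities below are illustrative assumptions.

    def elect(neighbors, capacity, source):
        visited = {source}

        def report(node):
            """Return (capacity, node) of the best node in node's subtree."""
            best = (capacity[node], node)
            for nb in neighbors[node]:
                if nb not in visited:          # nb adopts `node` as its parent
                    visited.add(nb)
                    best = max(best, report(nb))
            return best

        cap, leader = report(source)
        return leader, cap                     # the source would now broadcast the result

    neighbors = {
        'a': ['b', 'j'], 'b': ['a', 'c', 'g'], 'c': ['b', 'd'], 'd': ['c', 'e'],
        'e': ['d', 'g', 'f'], 'f': ['e', 'i'], 'g': ['b', 'e', 'h'],
        'h': ['g', 'i'], 'i': ['h', 'f', 'j'], 'j': ['a', 'i'],
    }
    capacity = {'a': 4, 'b': 2, 'c': 1, 'd': 5, 'e': 3, 'f': 6, 'g': 4, 'h': 8, 'i': 7, 'j': 1}
    print(elect(neighbors, capacity, 'a'))     # -> ('h', 8) with these example values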
6.5.3 Elections in Large-Scale Systems

The algorithms we have been discussing so far generally apply to relatively small distributed systems. Moreover, the algorithms concentrate on the selection of only a single node. There are situations when several nodes should actually be selected, such as in the case of superpeers in peer-to-peer networks, which we discussed in Chap. 2. In this section, we concentrate specifically on the problem of selecting superpeers.

Lo et al. (2005) identified the following requirements that need to be met for superpeer selection:

1. Normal nodes should have low-latency access to superpeers.
2. Superpeers should be evenly distributed across the overlay network.
3. There should be a predefined portion of superpeers relative to the total number of nodes in the overlay network.
4. Each superpeer should not need to serve more than a fixed number of normal nodes.

Fortunately, these requirements are relatively easy to meet in most peer-to-peer systems, given the fact that the overlay network is either structured (as in DHT-based systems) or randomly unstructured (as, for example, can be realized with gossip-based solutions). Let us take a look at the solutions proposed by Lo et al. (2005).

In the case of DHT-based systems, the basic idea is to reserve a fraction of the identifier space for superpeers. Recall that in DHT-based systems each node receives a random and uniformly assigned m-bit identifier. Now suppose we reserve the first (i.e., leftmost) k bits to identify superpeers. For example, if we need N superpeers, then the first ⌈log₂(N)⌉ bits of any key can be used to identify these nodes.

To explain, assume we have a (small) Chord system with m = 8 and k = 3. When looking up the node responsible for a specific key p, we can first decide to route the lookup request to the node responsible for the pattern p AND 11100000, which is then treated as the superpeer. Note that each node with identifier id can check whether it is a superpeer by looking up id AND 11100000 to see if this request is routed back to itself. Provided node identifiers are uniformly assigned to nodes, it can be seen that with a total of N nodes the number of superpeers is, on average, equal to 2^(k-m) N.
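The reservation scheme boils down to a bit mask. The sketch below uses the m = 8, k = 3 Chord parameters from the example in the text; the node identifiers, the lookup callable, and the function names are illustrative assumptions, not part of the original scheme's description.

    # Superpeer selection by reserving the top k bits of the m-bit identifier
    # space, following the m = 8, k = 3 example in the text.

    M, K = 8, 3
    MASK = ((1 << K) - 1) << (M - K)          # 11100000 for m = 8, k = 3

    def superpeer_pattern(key):
        """Pattern whose responsible node acts as superpeer for this key."""
        return key & MASK                      # key AND 11100000

    def is_superpeer(node_id, lookup):
        """A node is a superpeer if looking up id AND MASK routes back to itself."""
        return lookup(superpeer_pattern(node_id)) == node_id

    # With uniformly assigned identifiers, a fraction 2**(K - M) of all nodes is
    # expected to be a superpeer: for N = 1000 nodes, about 2**(3 - 8) * 1000 ≈ 31.
    print(bin(MASK), bin(superpeer_pattern(0b10110101)))   # -> 0b11100000 0b10100000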
A completely different approach is based on positioning nodes in an m-dimensional geometric space, as we discussed above. In this case, assume we need to place N superpeers evenly throughout the overlay. The basic idea is simple: a total of N tokens are spread across N randomly chosen nodes. No node can hold more than one token. Each token represents a repelling force by which another token is inclined to move away. The net effect is that if all tokens exert the same repulsion force, they will move away from each other and spread themselves evenly in the geometric space.

This approach requires that nodes holding a token learn about other tokens. To this end, Lo et al. propose to use a gossiping protocol by which a token's force is disseminated throughout the network. If a node discovers that the total forces acting on it exceed a threshold, it will move the token in the direction of the combined forces, as shown in Fig. 6-23.

Figure 6-23. Moving tokens in a two-dimensional space using repulsion forces.

When a token is held by a node for a given amount of time, that node will promote itself to superpeer.

6.6 SUMMARY

Strongly related to communication between processes is the issue of how processes in distributed systems synchronize. Synchronization is all about doing the right thing at the right time. A problem in distributed systems, and computer networks in general, is that there is no notion of a globally shared clock. In other words, processes on different machines have their own idea of what time it is.

There are various ways to synchronize clocks in a distributed system, but all methods are essentially based on exchanging clock values, while taking into account the time it takes to send and receive messages. Variations in communication delays, and the way those variations are dealt with, largely determine the accuracy of clock synchronization algorithms.

Related to these synchronization problems is positioning nodes in a geometric overlay. The basic idea is to assign each node coordinates from an m-dimensional space such that the geometric distance can be used as an accurate measure for the latency between two nodes. The method of assigning coordinates strongly resembles the one applied in determining location and time in GPS.

In many cases, knowing the absolute time is not necessary. What counts is that related events at different processes happen in the correct order. Lamport showed that by introducing a notion of logical clocks, it is possible for a collection of processes to reach global agreement on the correct ordering of events. In essence, each event e, such as sending or receiving a message, is assigned a globally unique logical timestamp C(e) such that when event a happened before b, C(a) < C(b). Lamport timestamps can be extended to vector timestamps: if C(a) < C(b), we even know that event a causally preceded b.

An important class of synchronization algorithms is that of distributed mutual exclusion. These algorithms ensure that in a distributed collection of processes, at most one process at a time has access to a shared resource. Distributed mutual exclusion can easily be achieved if we make use of a coordinator that keeps track of whose turn it is. Fully distributed algorithms also exist, but have the drawback that they are generally more susceptible to communication and process failures.

Synchronization between processes often requires that one process acts as a coordinator. In those cases where the coordinator is not fixed, it is necessary that processes in a distributed computation decide on who is going to be that coordinator. Such a decision is taken by means of election algorithms. Election algorithms are primarily used in cases where the coordinator can crash. However, they can also be applied for the selection of superpeers in peer-to-peer systems.
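As a small illustration of the logical clocks recalled in the summary above, the following sketch writes down the two Lamport rules: increment the clock on every local event, and take the maximum of the local and received values on message receipt, which together guarantee C(a) < C(b) whenever a happened before b. The class and its method names are not from the book; they are just one way of expressing the rules.

    # Minimal Lamport clock: increment before every event, and on receipt
    # adjust to max(local, received) before incrementing. Names are illustrative.

    class LamportClock:
        def __init__(self):
            self.time = 0

        def tick(self):                # local event (including sending)
            self.time += 1
            return self.time

        def send(self):
            return self.tick()         # timestamp carried by the outgoing message

        def receive(self, msg_time):   # adjust on an incoming message
            self.time = max(self.time, msg_time)
            return self.tick()

    p, q = LamportClock(), LamportClock()
    t_send = p.send()                  # event a on P, C(a) = 1
    t_recv = q.receive(t_send)         # event b on Q, C(b) = 2 > C(a)
    print(t_send, t_recv)              # -> 1 2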
PROBLEMS

1. Name at least three sources of delay that can be introduced between WWV broadcasting the time and the processors in a distributed system setting their internal clocks.

2. Consider the behavior of two machines in a distributed system. Both have clocks that are supposed to tick 1000 times per millisecond. One of them actually does, but the other ticks only 990 times per millisecond. If UTC updates come in once a minute, what is the maximum clock skew that will occur?

3. One of the modern devices that have (silently) crept into distributed systems are GPS receivers. Give examples of distributed applications that can use GPS information.

4. When a node synchronizes its clock to that of another node, it is generally a good idea to take previous measurements into account as well. Why? Also, give an example of how such past readings could be taken into account.

5. Add a new message to Fig. 6-9 that is concurrent with message A, that is, it neither happens before A nor happens after A.

6. To achieve totally-ordered multicasting with Lamport timestamps, is it strictly necessary that each message is acknowledged?

7. Consider a communication layer in which messages are delivered only in the order that they were sent. Give an example in which even this ordering is unnecessarily restrictive.

8. Many distributed algorithms require the use of a coordinating process. To what extent can such algorithms actually be considered distributed? Discuss.

9. In the centralized approach to mutual exclusion (Fig. 6-14), upon receiving a message from a process releasing its exclusive access to the resources it was using, the coordinator normally grants permission to the first process on the queue. Give another possible algorithm for the coordinator.

10. Consider Fig. 6-14 again. Suppose that the coordinator crashes. Does this always bring the system down? If not, under what circumstances does this happen? Is there any way to avoid the problem and make the system able to tolerate coordinator crashes?

11. Ricart and Agrawala's algorithm has the problem that if a process has crashed and does not reply to a request from another process to access a resource, the lack of response will be interpreted as denial of permission. We suggested that all requests be answered immediately to make it easy to detect crashed processes. Are there any circumstances where even this method is insufficient? Discuss.

12. How do the entries in Fig. 6-17 change if we assume that the algorithms can be implemented on a LAN that supports hardware broadcasts?

13. A distributed system may have multiple, independent resources. Imagine that process 0 wants to access resource A and process 1 wants to access resource B. Can Ricart and Agrawala's algorithm lead to deadlocks? Explain your answer.

14. Suppose that two processes detect the demise of the coordinator simultaneously and both decide to hold an election using the bully algorithm. What happens?

15. In Fig. 6-21 we have two ELECTION messages circulating simultaneously. While it does no harm to have two of them, it would be more elegant if one could be killed off. Devise an algorithm for doing this without affecting the operation of the basic election algorithm.

16. (Lab assignment) UNIX systems provide many facilities to keep computers in sync; notably, the combination of the crontab tool (which allows operations to be scheduled automatically) and various synchronization commands is powerful. Configure a UNIX system that keeps the local time accurate to within a single second. Likewise, configure an automatic backup facility by which a number of crucial files are automatically transferred to a remote machine once every 5 minutes. Your solution should be efficient when it comes to bandwidth usage.

7 CONSISTENCY AND REPLICATION

An important issue in distributed systems is the replication of data. Data are generally replicated to enhance reliability or improve performance. One of the major problems is keeping replicas consistent. Informally, this means that when one copy is updated we need to ensure that the other copies are updated as well; otherwise the replicas will no longer be the same. In this chapter, we take a detailed look at what consistency of replicated data actually means and the various ways that consistency can be achieved.

We start with a general introduction discussing why replication is useful and how it relates to scalability. We then continue by focusing on what consistency actually means. An important class of what are known as consistency models assumes that multiple processes simultaneously access shared data. Consistency for these situations can be formulated with respect to what processes can expect when reading and updating the shared data, knowing that others are accessing that data as well.

Consistency models for shared data are often hard to implement efficiently in large-scale distributed systems. Moreover, in many cases simpler models can be used, which are also often easier to implement. One specific class is formed by client-centric consistency models, which concentrate on consistency from the perspective of a single (possibly mobile) client. Client-centric consistency models are discussed in a separate section.

Consistency is only half of the story.
We also need to consider how consistency is actually implemented. There are essentially two, more or less independent, issues we need to consider. First of all, we concentrate on managing replicas, which takes into account not only the placement of replica servers, but also how content is distributed to these servers.

The second issue is how replicas are kept consistent. In most cases, applications require a strong form of consistency. Informally, this means that updates are to be propagated more or less immediately between replicas. There are various alternatives for implementing strong consistency, which are discussed in a separate section. Also, attention is paid to caching protocols, which form a special case of consistency protocols.

7.1 INTRODUCTION

In this section, we start with discussing the important reasons for wanting to replicate data in the first place. We concentrate on replication as a technique for achieving scalability, and motivate why reasoning about consistency is so important.

7.1.1 Reasons for Replication

There are two primary reasons for replicating data: reliability and performance. First, data are replicated to increase the reliability of a system. If a file system has been replicated, it may be possible to continue working after one replica crashes by simply switching to one of the other replicas. Also, by maintaining multiple copies, it becomes possible to provide better protection against corrupted data. For example, imagine there are three copies of a file and every read and write operation is performed on each copy. We can safeguard ourselves against a single, failing write operation by considering the value that is returned by at least two copies as being the correct one.
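The reliability argument just given, read all three copies and trust the value that at least two of them agree on, can be written down in a few lines. The sketch below models replica access as plain Python callables; it only illustrates the voting idea and is not a real replication protocol.

    # Read-with-voting over three replicas: the value returned by at least two
    # copies masks a single corrupted or failed write. Replica access is
    # modeled as plain functions purely for illustration.

    from collections import Counter

    def voted_read(replicas):
        values = [read() for read in replicas]
        value, count = Counter(values).most_common(1)[0]
        if count >= 2:                 # a majority of the three copies agree
            return value
        raise RuntimeError("no majority: more than one copy has failed")

    replicas = [lambda: 42, lambda: 42, lambda: 7]   # one copy holds a corrupted value
    print(voted_read(replicas))        # -> 42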
The other reason for replicating data is performance. Replication for performance is important when the distributed system needs to scale in numbers and geographical area. Scaling in numbers occurs, for example, when an increasing number of processes needs to access data that are managed by a single server. In that case, performance can be improved by replicating the server and subsequently dividing the work.

Scaling with respect to the size of a geographical area may also require replication. The basic idea is that by placing a copy of data in the proximity of the process using them, the time to access the data decreases. As a consequence, the performance as perceived by that process increases. This example also illustrates that the benefits of replication for performance may be hard to evaluate. Although a client process may perceive better performance, it may also be the case that more network bandwidth is now consumed keeping all replicas up to date.

If replication helps to improve reliability and performance, who could be against it? Unfortunately, there is a price to be paid when data are replicated. The problem with replication is that having multiple copies may lead to consistency problems. Whenever a copy is modified, that copy becomes different from the rest. Consequently, modifications have to be carried out on all copies to ensure consistency. Exactly when and how those modifications need to be carried out determines the price of replication.

To understand the problem, consider improving access times to Web pages. If no special measures are taken, fetching a page from a remote Web server may sometimes take seconds to complete. To improve performance, Web browsers often locally store a copy of a previously fetched Web page (i.e., they cache a Web page). If a user requires that page again, the browser automatically returns the local copy. The access time as perceived by the user is excellent. However, if the user always wants to have the latest version of a page, he may be in for bad luck. The problem is that if the page has been modified in the meantime, modifications will not have been propagated to cached copies, making those copies out-of-date.

One solution to the problem of returning a stale copy to the user is to forbid the browser to keep local copies in the first place, effectively letting the server be fully in charge of replication. However, this solution may still lead to poor access times if no replica is placed near the user. Another solution is to let the Web server invalidate or update each cached copy, but this requires that the server keep track of all caches and send them messages. This, in turn, may degrade the overall performance of the server. We return to performance versus scalability issues below.

7.1.2 Replication as Scaling Technique

Replication and caching for performance are widely applied as scaling techniques. Scalability issues generally appear in the form of performance problems. Placing copies of data close to the processes using them can improve performance through reduction of access time and thus solve scalability problems.

A possible trade-off that needs to be made is that keeping copies up to date may require more network bandwidth. Consider a process P that accesses a local replica N times per second, whereas the replica itself is updated M times per second. Assume that an update completely refreshes the previous version of the local replica. If N << M, that is, the access-to-update ratio is very low, we have the situation where many updated versions of the local replica will never be accessed by P, rendering the network communication for those versions useless. In this case, it may have been better not to install a local replica close to P, or to apply a different strategy for updating the replica. We return to these issues below.

A more serious problem, however, is that keeping multiple copies consistent may itself be subject to serious scalability problems. Intuitively, a collection of [...]
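As a closing back-of-the-envelope illustration of the access-to-update ratio discussed in Sec. 7.1.2 above, the snippet below applies the rule of thumb that a local replica pays off only when the client's read rate N is at least as high as the origin's update rate M. The rates and the "wasted updates" measure are invented for illustration.

    # Back-of-the-envelope check of the access-to-update ratio from Sec. 7.1.2.
    # The rates and the cost measure below are illustrative assumptions.

    def replica_worthwhile(reads_per_s, updates_per_s):
        """Keep a local copy only if it is read at least as often as it is refreshed."""
        return reads_per_s >= updates_per_s

    N, M = 2, 50                       # client reads 2 times/s, origin updates 50 times/s
    wasted = max(0, M - N)             # updates propagated but, on average, never read
    print(replica_worthwhile(N, M), wasted)   # -> False 48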
