THE FRACTAL STRUCTURE OF DATA REFERENCE

… the average residency time. The adjusted hit ratios are obtained by applying the proportionality relationship expressed by (1.19), with θ = .25; for example, the miss ratio for the ERP application is projected as .36 × (262/197)^(−.25) = .34. Based upon the projected residency times and hit ratios, we may then compute the cache storage requirements, as already discussed for the previous table. In this way, we obtain an objective of 2256 megabytes for the cache size of the target system. If these requirements could be met exactly, then we would project an aggregate hit ratio, for the three applications, of 72 percent.

As a final step, we must also consider how to round off the computed cache memory requirement of 2256 megabytes. Since this requirement is very close to 2 gigabytes, we might choose, in this case, to round down rather than up. Alternatively, it would also be reasonable to round up to, say, 3 gigabytes, on the grounds that the additional cache can be used for growth in the workload. To account for the rounding off of cache memory, we can apply the proportionality expressed by (1.23). Thus, after rounding down to 2 gigabytes, we would expect the aggregate miss ratio of the target system to be .28 × (2048/2256)^(−.25/.75) = .29, identical to the current aggregate miss ratio of the three applications.

If the available performance reporting tools are sufficiently complete, it is possible to refine the methods presented in the preceding example. In the figures of the example, the stage size was assumed to be equal to .04 megabytes (a reasonable approximation for most OS/390 workloads). The capability to support direct measurements of this quantity has recently been incorporated into some storage controls; if such measurements are supported, they can be found in the System Measurement Facility (SMF) record type 74, subtype 5.

Also, in the example, we used the total miss ratio as our measure of the percentage of I/O's that require more cache memory to be allocated. A loophole exists in this technique, however, due to the capability of most current storage controls to accept write requests without needing to wait for a stage to occur. In a storage control of this type, virtually all write requests will typically be reported as "hits," even though some of them may require allocation of memory. For database I/O, this potential source of error is usually not important, since write requests tend to be updates of data already in cache. If, however, it is desired to account for any write "hits" that may nevertheless require allocation of cache memory, counts of these can also be found in SMF record type 74, subtype 5 (they are called write promotions). Finally, we assumed the guesstimate θ = .25. If measurements of the single-reference residency time are available, then θ can be quantified more precisely using (1.16).
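
Before turning to the analysis of the working hypothesis, the arithmetic of the two adjustments above is easy to reproduce. The short Python sketch below is illustrative only: it assumes that (1.19) scales the miss ratio as the −θ power of the average residency time, and that (1.23) scales it as the −θ/(1 − θ) power of the cache size, which is how the quoted figures appear to have been obtained; the variable names are not from the text.

    # Sketch of the two miss-ratio adjustments quoted in the example,
    # assuming m is proportional to T**(-theta)          (cf. 1.19)
    # and to   s**(-theta / (1 - theta))                 (cf. 1.23).
    theta = 0.25

    # ERP miss ratio, adjusted for the longer projected residency time
    # of the target system (197 -> 262, in the same time units).
    m_erp = 0.36 * (262.0 / 197.0) ** (-theta)
    print(round(m_erp, 2))                    # 0.34

    # Aggregate miss ratio, adjusted for rounding the cache down from
    # the computed 2256 MB requirement to 2 GB (2048 MB).
    m_agg = 0.28 * (2048.0 / 2256.0) ** (-theta / (1.0 - theta))
    print(round(m_agg, 2))                    # 0.29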

2. ANALYSIS OF THE WORKING HYPOTHESIS

It is beyond the scope of the present chapter to analyze rigorously every potential source of error in a capacity planning exercise of the type just presented in the previous section, nor does a "back of the envelope" approximation method require this. Instead, we now focus on the following claim, central to applications of the working hypothesis: that it makes very little difference in the estimated hit ratio of the cache as a whole whether the individual workloads within the cache are modeled with their correct average residency times, or whether they are all modeled assuming a common average residency time reflecting the conditions for the cache as a whole.

Obviously, such a statement cannot hold in all cases. Instead, it is a statement about the realistic impact of typical variations between workloads. As the data presented in Chapter 1 suggests, the values of the parameter θ, for distinct workloads within a given environment, often vary over a fairly narrow range. This gives the proposed hypothesis an important head start, since the hypothesis would be exactly correct for a cache in which several workloads share the same value of the parameter θ. In that case, the common value of θ, together with the fact that all the workloads must share a common single-reference residency time τ, would then imply, by (1.12), that the workloads must also share the same average residency time.

Consider, now, a cache whose activity can be described by the multiple-workload hierarchical reuse model; that is, the cache provides service to n individual workloads, i = 1, 2, …, n, each of which can be described by the hierarchical reuse model. The true miss ratio of the cache as a whole is the weighted average of the individual workload miss ratios, weighted by I/O rate:

    m = Σ_i (r_i / r) m_i,    (3.2)

where r_i and m_i are the I/O rate and miss ratio of workload i, and r = Σ_i r_i. We must now consider the error that results from replacing the correct miss ratio of each workload by the corresponding estimate m̂_i, calculated using the average residency time of the cache as a whole. Using the proportionality relationship expressed by (1.19), the values m̂_i can be written as

    m̂_i = m_i (T / T_i)^(−θ_i),    (3.3)

where T_i is the average residency time of workload i and T is that of the cache as a whole. Thus, the working hypothesis implies an overall miss ratio of

    m̂ = Σ_i (r_i / r) m_i (T / T_i)^(−θ_i).    (3.4)

To investigate the errors implied by this calculation, we write it in an alternative form, in terms of quantities ζ_i defined by (3.5). This expression for m̂ can be expanded by applying the binomial theorem, giving (3.6), where the "little-o" notation indicates terms of higher than second order. Using (1.16), we define θ to be the aggregate value of this parameter for the cache as a whole. Note, as a result, that in addition to the definition already given, ζ_i also has the equivalent definition (3.7), where we have applied (1.12) and taken advantage of the fact that each workload must share the same, common value of τ. By applying (1.16), we may rewrite the first-order terms of (3.6) in the form (3.8). But since each miss corresponds to a cache visit, the aggregate residency time is computed over misses; that is,

    T = Σ_i (r_i m_i / (r m)) T_i,    (3.9)

so the first-order contribution vanishes, and (3.8) reduces to (3.10). Combining (3.2), (3.6), and (3.10), we now have (3.11). Thus, m̂ = m except for second-order and higher terms.

In a region sufficiently close to θ_1 = θ_2 = … = θ_n = θ (or equivalently, T_1 = T_2 = … = T_n = T), the second-order and higher terms of (3.11) can be approximated as uniformly zero. The region where these second-order terms have at most a minor impact is that in which |ζ_i| << 1 for i = 1, 2, …, n. This requirement permits wide variations in the workloads sharing the cache. For example, suppose that there are two workloads i = 1, 2, with values θ_i equal to .1 and .3 respectively; and suppose that these workloads share a cache in which, overall, we have θ = .2.
Then the absolute value of ζ_i is no greater than .1/.7 = .14 for either workload. As a result, the absolute value of either of the second-order summation terms of (3.11), calculated without the summation weights r_i m_i / (r m), does not exceed .02. But the summation of these terms, multiplied by the weights r_i m_i / (r m), is merely a weighted average; so in the case of the example, the quantity just stated is the largest relative error, in either direction, that can be made by neglecting the second-order terms (i.e., the error can be no larger than 2 percent of m). Since the second-order terms are so relatively insignificant, we may conclude that the third-order and higher terms, shown as o(ζ_i²), must be vanishingly small.

This chapter's working hypothesis has also proved itself in actual empirical use, without recourse to formal error analysis [22]. Its practical success confirms that the first-order approximation just obtained remains accurate within a wide enough range of conditions to make it an important practical tool.
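
The figures quoted in this two-workload example can be confirmed with a few lines of Python. The sketch below is only a numerical check under stated assumptions: it takes ζ_i = (θ_i − θ)/(1 − θ_i), a form consistent with the .1/.7 bound above (equation (3.7) itself is not reproduced here, and only |ζ_i| matters), and it uses ζ_i² as the bound on each unweighted second-order term.

    # Numerical check of the two-workload example.  The definition of
    # zeta_i used here, (theta_i - theta) / (1 - theta_i), is an assumed
    # reconstruction consistent with the .1/.7 figure in the text.
    theta = 0.20                       # aggregate value for the whole cache
    thetas = [0.10, 0.30]              # the two workloads of the example

    zetas = [(t - theta) / (1.0 - t) for t in thetas]
    print([round(abs(z), 2) for z in zetas])    # [0.11, 0.14]

    # Each second-order summation term, taken without its weight
    # r_i * m_i / (r * m), is bounded in magnitude by zeta_i squared.
    print(round(max(z * z for z in zetas), 2))  # 0.02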
