Neural Networks: Algorithms, Applications, and Programming Techniques (Part 6)
[Figure: the ten slots of the CO-OCCURRENCE array, A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, and D-E, laid out from low memory to high memory in (a) and mapped onto the corresponding network connections in (b).]

Figure 5.5 The CO-OCCURRENCE array is shown (a) depicted as a sequential memory array, and (b) with its mapping to the connections in the Boltzmann network.

… as shown in Figure 5.5(a). The first four entries in the CO-OCCURRENCE array for this network would be mapped to the connections between units A-B, A-C, A-D, and A-E, as shown in Figure 5.5(b). Likewise, the next three slots would be mapped to the connections between units B-C, B-D, and B-E; the next two to C-D and C-E; and the last one to D-E. By using the arrays in this manner, we can collect co-occurrence statistics about the network by starting at the first input unit and sequentially scanning all other units in the network. After completing this initial pass, we can complete the network scan by merely incrementing our array pointer to access the second unit, then the third, fourth, ..., nth units.

We can now specify the remaining data structures needed to implement the Boltzmann network simulator. We begin by defining the top-level record structure used to describe the Boltzmann network:

record BOLTZMANN =
    UNITS       : integer;       {number of units in network}
    CLAMPED     : boolean;       {true=clamped; false=unclamped}
    INPUTS      : DEDICATED;     {locate and size network input}
    OUTPUTS     : DEDICATED;     {locate and size network output}
    NODES       : ^LAYER;        {pointer to layer structure}
    TEMPERATURE : float;         {current network temperature}
    CURRENT     : integer;       {step in annealing schedule}
    ANNEALING   : ^SCHEDULE[];   {pointer to user-defined schedule}
    STATISTICS  : COOCCURRENCE;  {pointers to statistics arrays}
end record;

Figure 5.6 provides an illustration of how the values in the BOLTZMANN structure interact to specify a Boltzmann network. Here, as in other network models, the layer structure is the gateway to the network-specific data structures. All that is needed to gain access to the layer-specific data are pointers to the appropriate arrays. Thus, the structure for the layer record is given by

record LAYER =
    outs    : ^float[];    {pointer to unit outputs array}
    weights : ^^float[];   {pointers in weight_ptr array}
end record;

where outs is a pointer used to locate the beginning of the unit-outputs array in memory, and weights is a pointer to the intermediate weight_ptr array, which is used in turn to locate each of the input connection arrays in the system. Since the Boltzmann network requires only one layer of PEs, we will need only one layer pointer in the BOLTZMANN record. All these low-level data structures are exactly the same as those specified in the generic simulator discussed in Chapter 1.

[Figure: the BOLTZMANN record, with NODES resolving to the layer's outs and weights arrays, ANNEALING resolving to the user-defined schedule, and CURRENT indicating the third schedule step.]

Figure 5.6 Organization of the Boltzmann network using the defined data structure is shown. In this example, the input and output units are the same, and the network is in the third step of its annealing schedule.
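For readers who want something concrete to compile, the records above map directly onto C structures. The sketch below is our own illustration rather than code from the text; the field names mirror the pseudocode, and the pair_index helper shows how a flat CO-OCCURRENCE array of the kind pictured in Figure 5.5 can be indexed for a pair of units (i, j), with i < j and 1-based indices.

#include <stdio.h>

/* Hypothetical C analogs of the pseudocode records; names are ours. */
typedef struct {
    int first, length;              /* locate and size a block of units   */
} DEDICATED;

typedef struct {
    float  *outs;                   /* unit outputs array                 */
    float **weights;                /* weight_ptr array, one row per unit */
} LAYER;

typedef struct {
    float temperature;              /* temperature at this schedule step  */
    int   passes;                   /* propagation passes at this step    */
} SCHEDULE_STEP;

typedef struct {
    int            units;           /* number of units in network         */
    int            clamped;         /* nonzero = clamped                  */
    DEDICATED      inputs, outputs; /* locate and size visible units      */
    LAYER         *nodes;           /* the single layer of PEs            */
    float          temperature;     /* current network temperature        */
    int            current;         /* step in annealing schedule         */
    SCHEDULE_STEP *annealing;       /* user-defined schedule              */
    float         *stats_clamped;   /* co-occurrence statistics arrays    */
    float         *stats_unclamped;
} BOLTZMANN;

/* Slot of the pair (i, j), i < j, both 1-based, in the flat CO-OCCURRENCE
   array: the n-1 pairs for unit 1 come first, then n-2 pairs for unit 2... */
static int pair_index(int i, int j, int n)
{
    return (i - 1) * n - ((i - 1) * i) / 2 + (j - i);
}

int main(void)
{
    /* For the five-unit network of Figure 5.5: A-B, B-C, and D-E. */
    printf("%d %d %d\n", pair_index(1, 2, 5), pair_index(2, 3, 5),
           pair_index(4, 5, 5));    /* prints: 1 5 10 */
    return 0;
}

For the five-unit network of Figure 5.5, pair_index places A-B in slot 1, B-C in slot 5, and D-E in slot 10, matching the layout described above.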
5.3.3 Boltzmann Algorithms

Let us assume, for now, that our network data structures contain valid weights for all the connections, and that the user has initialized the annealing schedule to contain the information given in Table 5.1; in other words, the network data structures represent a trained network. We must now create the programs that the host computer will execute to simulate the network in production mode. We shall start by developing the information-recall routines.

Boltzmann Production Algorithms. Remember that information recall in the Boltzmann network consists of a sequence of steps: we first apply an input to the network, raise the temperature to some predefined level, and then anneal the network while slowly lowering the temperature. In this example, we would initially raise the temperature to 5 and perform four stochastic signal propagations; we would then lower the temperature to 4 and perform six signal propagations, and so on. After completing the four required signal propagations at a temperature of 1, we can consider the network annealed. At this point, we simply read the output values from the visible units.

    Temperature    Passes
         5            4
         4            6
         3            7
         2            6
         1            4

Table 5.1 The annealing schedule for the simulator example.

If we now think about the process just described, we can decompose the information-recall problem into three lower-level subroutines:

apply_input A routine used to take a user-provided or training input and apply it to the network, and to initialize the output of every unknown unit to a random state.

anneal A routine used to stimulate the Boltzmann network according to the previously initialized annealing schedule.

get_output A function used to locate the start of the output array in the computer memory, so that the network response can be accessed.

Since the anneal routine is the place where most of the processing is accomplished, we shall concentrate on the development of just that routine, leaving the design of the other two algorithms to you. The mathematics of the Boltzmann network tells us that the annealing process, in production mode, consists of two major functions that are repeated until the network has stabilized at a low temperature. These functions, described next, can each be implemented as subroutines called by the parent anneal process; a sketch in C of how anneal might drive them follows their descriptions.

set_temp A procedure used to set the current network temperature and annealing-schedule pass count to the values specified in the overall annealing schedule.

propagate A function used to perform one signal propagation through the entire network, using the current temperature and probabilistic unit selection. This routine should be capable of performing the signal propagation regardless of the network state (clamped or unclamped).
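Here is that sketch. It is one plausible shape for anneal, not code from the book: the schedule of Table 5.1 is stored as an array, set_temp collapses to an assignment, and propagate is stubbed out because it is developed in the next subsection.

#include <stdio.h>

typedef struct { float temperature; int passes; } SCHEDULE_STEP;

/* Stub: one stochastic signal-propagation pass; developed below. */
static void propagate(float temperature) { (void)temperature; }

int main(void)
{
    /* The annealing schedule of Table 5.1. */
    SCHEDULE_STEP schedule[] = { {5, 4}, {4, 6}, {3, 7}, {2, 6}, {1, 4} };
    int steps = (int)(sizeof schedule / sizeof schedule[0]);

    for (int s = 0; s < steps; s++) {                 /* set_temp(NET, s) */
        float temperature = schedule[s].temperature;
        for (int p = 0; p < schedule[s].passes; p++)  /* passes at this T */
            propagate(temperature);
        printf("completed %d passes at T = %.0f\n",
               schedule[s].passes, temperature);
    }
    /* the network is now annealed; read the visible units for the result */
    return 0;
}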
Signal Propagation in the Boltzmann Network. We shall now define the most basic of the needed subroutines, the propagate procedure. The algorithm for this procedure, which follows, presumes that the user-provided apply_input and not-yet-defined set_temp routines have been executed to initialize the outputs of the network's units and the temperature parameter to the desired states.

procedure propagate (NET:BOLTZMANN)
{perform one signal propagation pass through network}
var unit      : integer;    {randomly selected unit}
    p         : float;      {probability of unit being on}
    neti      : float;      {net input to unit}
    threshold : integer;    {point at which unit turns on}
    i, j      : integer;    {iteration counters}
    inputs    : ^float[];   {pointer to unit outputs array}
    connects  : ^float[];   {pointer to unit weights array}
    unclamped : integer;    {index to first unclamped unit}
    firstc    : integer;    {index to first connection}
begin
    {locate the first nonvisible unit, assuming first index = 1}
    unclamped = NET.OUTPUTS.FIRST + NET.OUTPUTS.LENGTH - 1;

    if (NET.INPUTS.FIRST = NET.OUTPUTS.FIRST)
    then firstc = NET.INPUTS.FIRST       {Boltzmann completion}
    else firstc = NET.INPUTS.LENGTH + 1; {Boltzmann input-output}
    end if;

    for i = 1 to NET.UNITS               {for as many units in network}
    do
        if (NET.CLAMPED)                 {if network is clamped}
        then                             {select an unclamped unit}
            unit = random(NET.UNITS - unclamped) + unclamped;
        else                             {select any unit}
            unit = random(NET.UNITS);
        end if;

        neti = 0;                              {initialize input}
        inputs = NET.NODES^.OUTS;              {locate inputs}
        connects = NET.NODES^.WEIGHTS[unit]^;  {and connections}

        for j = firstc to NET.UNITS      {all connections to unit}
        do                               {compute sum of products}
            neti = neti + inputs[j] * connects[j];
        end do;

        {this next statement is used to improve performance,
         as described in the text}
        if (NET.INPUTS.FIRST = NET.OUTPUTS.FIRST) or (unit >= firstc)
        then
            neti = neti - inputs[unit] * connects[unit]; {no connection}
        end if;

        p = 1.0 / (1.0 + exp(-neti / NET.TEMPERATURE));
        threshold = round(p * 10000);    {convert to integer}

        if (random(10000) <= threshold)  {should unit be on?}
        then inputs[unit] = 1;           {if so, set to 1}
        else inputs[unit] = 0;           {otherwise, set to 0}
        end if;
    end do;
end procedure;

Before we move on to the next routine, there are three aspects of the propagate procedure that bear further discussion: the selection mechanism for unit update, the computation of the neti term, and the method we have chosen for determining when a unit is or is not active.

In the first case, the Boltzmann network must be able to run with its inputs either clamped or free-running. So that we do not need different propagate routines for each mode, we simply use a Boolean variable in the network record to indicate the current mode of operation, and enable the propagate routine to select a unit for update accordingly. If the network is clamped, we cannot select an input or output unit for update. We account for these differences by assuming that the visible units of the network are the first N units in the layer. We thus can be assured that the visible units will not change if we simply select a random unit from the set that excludes the first N units. We accomplish this selection by decreasing the range of the random-number generator to the number of network units minus N, and then adding N to the result. Since we have decided that all our arrays will use the first N indices to locate the visible units, generating a random index greater than N will always select a unit beyond the range of the visible units. However, if the network is unclamped, any unit must be available for update. Inspection of the algorithm for propagate will reveal that these two cases are handled by the if-then-else clause at the beginning of the routine.
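A compact C rendering of this selection rule may make the index arithmetic clearer. It is our own sketch, using 0-based indices (the pseudocode is 1-based) and the standard-library rand in place of the simulator's generator.

#include <stdio.h>
#include <stdlib.h>

/* Pick the index of the unit to update. When the network is clamped, only
   hidden units (indices n_visible..n_units-1) are eligible; otherwise any
   unit may be chosen. */
static int pick_unit(int n_units, int n_visible, int clamped)
{
    if (clamped)
        return n_visible + rand() % (n_units - n_visible);
    return rand() % n_units;
}

int main(void)
{
    srand(1);
    printf("clamped picks:");
    for (int t = 0; t < 8; t++)
        printf(" %d", pick_unit(10, 4, 1));   /* always in 4..9 */
    printf("\n");
    return 0;
}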
Second, there are two salient points regarding the computation of the neti term. The first is that connections between input units are processed only when the network is configured as a Boltzmann completion network; in the Boltzmann input-output mode, connections between input units do not exist. This structure conforms to the mathematical model described earlier. The second point is that we have obviously wasted computer time by processing a connection from each unit to itself twice: once as part of the summation loop during the calculation of the neti value, and once to subtract it out after the total neti has been calculated.

The reason we have chosen to implement the algorithm in this manner is, again, to improve performance. Even though we have consumed computer time by processing a nonexistent connection for every unit in the network, we have used far less time than would be required to disallow the computation of the missing connection selectively during every iteration of the summation loop. Furthermore, we can easily eliminate the error introduced into the input summation by processing the nonexistent connection: we simply subtract out just that term after completing the loop, prior to updating the output of the unit. You might also observe that we have wasted memory by allocating space for the connections between each unit and itself. We have chosen to implement the network in this fashion to simplify processing, and thus to improve performance as described.

As an example of why it is desirable to optimize the code at the expense of wasted memory, consider the alternative case where only valid connections are modeled. Since no unit has a connection to itself, but all units have outputs maintained in the same array, the code to process all input connections to a unit would have to be written as two different loops: one for those input PEs that precede the current unit, where the array indices for outputs and connections correspond one-to-one, and one loop for inputs from units that follow, where unit outputs are displaced by one array entry from the corresponding connection. This situation occurs because we have organized the unit outputs and connections as linearly sequential arrays in memory. Such a situation is illustrated in Figure 5.7.

[Figure: in (a), the outputs array aligns one-to-one with a weights array that reserves a slot for the unit's own connection; in (b), with no intra-unit slot, entries such as w(i-1)j align directly while entries such as w(i+2)j are displaced by one position from the corresponding outputs.]

Figure 5.7 The illustration shows array processing (a) when memory is allocated for all possible connections, and (b) when memory is not allocated for intra-unit connections. In (a), the code necessary to perform this input summation simply computes the input value for all connections, then eliminates the error introduced by processing the nonexistent connection to itself. In (b), the code must be more selective about accessing connections, since the one-to-one mapping of connections to units is lost. Obviously, approach (a) is our preferred method, since it will execute much faster than approach (b).
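The subtract-after-summing trick translates directly into C. In this sketch (ours, with made-up weights), the diagonal slots deliberately hold junk values to emphasize that whatever the unused self-connection contains is summed in and then removed.

#include <stdio.h>

#define N 4

/* Sum the products over every slot, including the unit's own (nonexistent)
   connection, then subtract that single term back out - cheaper than
   testing j != unit on every iteration of the loop. */
static float net_input(int unit, const float outs[N], float w[N][N])
{
    float neti = 0.0f;
    for (int j = 0; j < N; j++)
        neti += outs[j] * w[unit][j];
    return neti - outs[unit] * w[unit][unit];
}

int main(void)
{
    float outs[N] = {1.0f, 0.0f, 1.0f, 1.0f};
    float w[N][N] = {                 /* diagonal slots hold junk on purpose */
        { 9.9f,  0.2f, -0.5f,  0.1f},
        { 0.2f,  9.9f,  0.3f, -0.4f},
        {-0.5f,  0.3f,  9.9f,  0.7f},
        { 0.1f, -0.4f,  0.7f,  9.9f},
    };
    printf("net input to unit 2: %.2f\n", net_input(2, outs, w)); /* 0.20 */
    return 0;
}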
Finally, with respect to deciding when to activate the output of a unit, recall that the Boltzmann network differs from the other networks that we have studied in that PEs are activated stochastically, rather than deterministically. Recall that the equation

    p_k = 1 / (1 + e^(-net_k / T))

defines how we calculate the probability that a unit x_k is active with respect to its input stimulation, net_k. However, simply knowing the probability that a unit will generate an output does not guarantee that the unit will generate an output. We must therefore implement a mechanism that allows the computer to translate the calculated probability into a unit output that occurs with the same probability; in effect, we must let the computer roll the dice to determine when an output is active and when it is not.

One method for doing this is to make use of the pseudorandom-number generator available in most high-level computer languages. Here, we take advantage of the fact that the computed probability, p_k, will always be a fractional number ranging between zero and one, as illustrated by the graph depicted in Figure 5.8. We can map p_k to an integer threshold value between zero and some arbitrarily large ceiling by simply multiplying the ceiling value by the computed probability and rounding the result to an integer. We then generate a random number between zero and the selected ceiling, and, if the random number does not exceed the threshold value just computed, the output of the unit is set to one. Assuming that the pseudorandom-number generator has a uniform probability distribution across the interval of interest, the random number produced will not exceed the threshold value with a probability equal to the specified value, p_k. Thus, we now have a means of stochastically activating unit outputs in the network.

[Figure: p_k plotted against net input, one sigmoidal curve for each of five temperatures.]

Figure 5.8 Shown here is a graph of the probability, p_k, that the kth unit is on at five different temperatures, T.
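Rendered in C, the dice-rolling scheme just described might look like the following sketch. It is ours, not the book's; it uses the standard-library rand for brevity, and the empirical check in main confirms that units fire at roughly the computed probability.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Turn a unit on with probability p = 1 / (1 + exp(-neti / T)), using the
   integer-threshold trick described in the text (ceiling of 10,000). */
static int stochastic_output(float neti, float temperature)
{
    float p = 1.0f / (1.0f + expf(-neti / temperature));
    int threshold = (int)(p * 10000.0f + 0.5f);   /* round to an integer */
    return (rand() % 10000) <= threshold;         /* roll the dice       */
}

int main(void)
{
    srand(42);
    int on = 0, trials = 100000;
    for (int t = 0; t < trials; t++)
        on += stochastic_output(1.0f, 2.0f);      /* p should be ~0.622  */
    printf("empirical firing rate: %.3f\n", (float)on / trials);
    return 0;
}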
Boltzmann Learning Algorithms. There are five additional functions that must be defined to train the Boltzmann network:

set_temp A function used to update the parameters in the BOLTZMANN record to reflect the network temperature at the current step, as specified in the annealing schedule.

sum_cooccurrence A routine used to collect co-occurrence statistics for the network in either the clamped or unclamped mode of operation.

pplus A function used to compute and average the co-occurrence probabilities for a network with clamped inputs after it has reached equilibrium at the minimum temperature.

pminus A function similar to pplus, but used when the network is free-running.

update_connections The procedure that modifies the connection weights in the network to train the Boltzmann simulator.

The implementation of the set_temp function is straightforward, as defined here:

procedure set_temp (NET:BOLTZMANN; N:integer)
{set the temperature and schedule step}
begin
    NET.CURRENT = N;    {set current step}
    NET.TEMPERATURE = NET.ANNEALING^.STEP[N].TEMPERATURE;
end procedure;

On the other hand, the estimation of the p+ and p- terms is complex, and each must be accomplished in two steps: in the first, statistics about the co-occurrence between network units must be gathered and averaged for each training pattern; in the second, the statistics across all training patterns are collected. This separation provides a natural breakpoint for algorithm development. We can therefore define two algorithms, sum_cooccurrence and pplus, that respectively address the two steps identified.

We shall now turn our attention to the computation of the co-occurrence probability, p+, when the input to the network is clamped to an arbitrary input vector, V_a. As we did with propagate, we will assume that the input pattern has been placed on the input units by an earlier call to set_inputs. Furthermore, we shall assume that the statistics arrays have been initialized by an earlier call to a user-supplied routine that we refer to as zero_statistics.

procedure sum_cooccurrence (NET:BOLTZMANN)
{accumulate co-occurrence statistics for the specified network}
var i, j, k : integer;   {loop counters}
    connect : integer;   {co-occurrence index}
    stats   : ^float[];  {pointer to statistics array}
begin
    if (NET.CLAMPED)     {if network is clamped}
    then stats = NET.STATISTICS.CLAMPED
    else stats = NET.STATISTICS.UNCLAMPED;
    end if;

    for i = 1 to 5       {arbitrary number of cycles}
    do
        propagate(NET);  {run the network once}
        connect = 1;     {start at first pair}

        for j = 1 to NET.UNITS             {for all units in network}
        do
            if (NET.NODES^.OUTS[j] = 1)    {if unit is on}
            then
                for k = j + 1 to NET.UNITS {for rest of units}
                do
                    if (NET.NODES^.OUTS[k] = 1)
                    then
                        stats^[connect] = stats^[connect] + 1;
                    end if;
                    connect = next(connect);
                end do;
            else         {skip to next unit's connections}
                connect = connect + (NET.UNITS - j);
            end if;
        end do;
    end do;
end procedure;

Notice that the sum_cooccurrence routine does not average the accumulated results after completing the examination. We delay this computation to the pplus routine so that we can continue to use the clamped array to collect statistics across all patterns. If we averaged the results after each cycle, we would be forced to maintain different arrays for each pattern, thus increasing the need for storage at a near-exponential rate. In addition, note that, by using a pointer to the appropriate statistics array, we have generalized the routine so that it may be used to collect statistics for the network in either the clamped or unclamped modes of operation.

Before we define the algorithm needed to estimate the p+ term for the Boltzmann network, we will make a few assumptions. Since the total number of training patterns that the network must learn will depend on the application, we […]

…

The Counterpropagation Network

… nonzero. Instead of Eq. (6.8), we can use as the learning law

    ẇ = (-cw + dI) U(net)    (6.10)

where

    U(net) = 1 if net > 0;  0 if net = 0

[Figure: the weight components plotted against the time step t, with w1 starting at 0.5 and w2 at 0.866.]

Figure 6.7 Given an input vector I = (0,1) and an initial weight vector, w(0) = (0.5, 0.866), the components, w1 and w2, of the weight vector evolve in time according to Eq. (6.9), as shown in the …
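The trajectory described in the Figure 6.7 caption is easy to reproduce numerically. The sketch below is our own: it Euler-integrates Eq. (6.9), ẇ = -cw + dI(w · I), quoted in full below, with the initial conditions from the caption; the step size and the choice c = d = 1 are illustrative assumptions, not values from the text. With c = d and I = (0,1), the component w1 decays toward zero while w2 holds steady, so the weight vector swings into alignment with the input direction.

#include <stdio.h>

int main(void)
{
    double w1 = 0.5, w2 = 0.866;        /* w(0) from the Figure 6.7 caption */
    double I1 = 0.0, I2 = 1.0;          /* input vector I = (0,1)           */
    double c = 1.0, d = 1.0, dt = 0.01; /* illustrative constants, ours     */

    for (int t = 0; t <= 800; t++) {
        double net = w1 * I1 + w2 * I2; /* net = w . I                      */
        if (t % 200 == 0)
            printf("t = %4.2f   w = (%.4f, %.4f)\n", t * dt, w1, w2);
        w1 += dt * (-c * w1 + d * I1 * net);  /* Euler step of Eq. (6.9)    */
        w2 += dt * (-c * w2 + d * I2 * net);
    }
    return 0;
}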
… the output of an instar is governed by the equation

    ẏ = -ay + b net    (6.4)

where a, b > 0. The dynamic behavior of y is illustrated in Figure 6.5. We can solve Eq. (6.4) to get the output as a function of time. Assuming the initial output is zero, and that a nonzero input vector is present from time t = 0 until time t,

    y(t) = (b/a) net (1 - e^(-at))    (6.5)

The equilibrium value of y(t) is

    y* = (b/a) net    (6.6)

If the input vector is removed …

    ẇ = -cw + dIy    (6.8)

where y is the output of the instar, and c, d > 0. Notice the relationship between Eq. (6.8) and the Hebbian learning rule discussed in Chapter 1: the second term on the right side of Eq. (6.8) contains the product of the input and the output of a processing element. Thus, when both are large, the weight on the input connection is reinforced, as predicted by Hebb's theory. Equation (6.8) is difficult … y = (b/a) net. Because net = w · I, Eq. (6.8) becomes

    ẇ = -cw + dI(w · I)    (6.9)

where we have absorbed the factor b/a into the constant d. Although Eq. (6.9) is still not directly solvable, the assumption that changes to weights occur more slowly than do changes to other parameters is important; we shall see more of the utility of such an assumption in Chapter 8. Figure 6.7 illustrates the solution to Eq. (6.9) … competitive network, and a structure known as an outstar. In Section 6.2, we shall return to the discussion of the CPN.

6.1.1 The Input Layer

Discussions of neural networks often ignore the input-layer processing elements, or consider them simply as pass-through units, responsible only for distributing input data to other processing elements. Computer simulations of networks usually …

… inhibitory signals to the other units and receives the largest positive feedback from itself. We obtain Eq. (6.13) by replacing every occurrence of I_j in Eq. (6.2) with f(x_j) + net_j, for all j. The relationship between the constants A and B, and the form of the function, f(x_j), determine how the solutions to Eq. (6.13) evolve in time. We shall now look at specific cases. Equation (6.13) is somewhat easier to analyze … variables, X_i = x_i / Σ_k x_k, and one that describes the total pattern intensity, x = Σ_k x_k. First, rearrange Eq. (6.13) as follows:

    ẋ_i = -A x_i + B [f(x_i) + net_i] - x_i Σ_k [f(x_k) + net_k]

Next, sum over i to get

    ẋ = -A x + (B - x) Σ_k [f(x_k) + net_k]    (6.14)

Now substitute xX_i into Eq. (6.13) and use Eq. (6.14) to simplify the result. If we make the definition g(w) = w^(-1) f(w), then we get

    x Ẋ_i = B x X_i Σ_k X_k [g(xX_i) - g(xX_k)] + B (1 - X_i) net_i - B X_i Σ_{k≠i} net_k    (6.15)

We can now evaluate …
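As a closing check on the normalization argument behind Eqs. (6.14) and (6.15), the sketch below integrates the on-center off-surround system that underlies them. The form ẋ_i = -A x_i + (B - x_i) I_i - x_i Σ_{k≠i} I_k is our reconstruction of Eq. (6.2), which is referenced but not shown in this excerpt. At equilibrium, each relative activity X_i = x_i / x matches the relative input intensity I_i / Σ_k I_k, and the total activity stays bounded by B no matter how intense the input pattern is.

#include <stdio.h>

#define N 3

int main(void)
{
    double x[N] = {0.0, 0.0, 0.0};  /* unit activities                      */
    double I[N] = {8.0, 4.0, 4.0};  /* intense input; relative 0.5/0.25/0.25 */
    double A = 1.0, B = 1.0, dt = 0.001;
    double Isum = I[0] + I[1] + I[2];

    /* Euler-integrate x_i' = -A x_i + (B - x_i) I_i - x_i * (Isum - I_i). */
    for (int t = 0; t < 20000; t++)
        for (int i = 0; i < N; i++)
            x[i] += dt * (-A * x[i] + (B - x[i]) * I[i]
                          - x[i] * (Isum - I[i]));

    double total = x[0] + x[1] + x[2];
    for (int i = 0; i < N; i++)
        printf("X%d = %.3f   (I%d / sum I = %.3f)\n",
               i, x[i] / total, i, I[i] / Isum);
    printf("total activity x = %.3f, bounded by B = %.1f\n", total, B);
    return 0;
}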
